The Democratization of Machine Learning: What It Means for Tech Innovation
The world of high-tech innovation can change the destiny of industries seemingly overnight. Now we are on the cusp of another leap, thanks to the democratization of machine learning, a form of artificial intelligence that enables computers to learn without being explicitly programmed. This democratization is already underway, according to this opinion piece by Kartik Hosanagar (@khosanagar), Wharton professor of operations, information and decisions and a cofounder of Yodle Inc., and Apoorv Saxena (@apoorvsaxena1), a product manager at Google and co-chair of the recent AI Frontiers conference.
Last month, at the CloudNext conference in San Francisco, Google announced its acquisition of Kaggle, an online community for data scientists and machine-learning competitions. Although the move may seem far removed from Google’s core businesses, it speaks to the skyrocketing industry interest in machine learning (ML). Kaggle gives Google access not only to a talented community of data scientists but also to one of the largest repositories of datasets, which will help train the next generation of machine-learning algorithms.
As ML algorithms solve bigger and more complex problems, such as language translation and image understanding, training them can require massive amounts of pre-labeled data. To increase access to such data, Google had previously released a labeled dataset created from more than 7 million YouTube videos as part of its YouTube-8M challenge on Kaggle. The acquisition of Kaggle is an interesting next step.
Market-based access to data and algorithms will lower entry barriers and lead to an explosion in new applications of AI. As recently as 2015, only large companies like Google, Amazon and Apple had access to the massive data and computing resources needed to train and launch sophisticated AI algorithms. Small startups and individuals simply didn’t have that access and were effectively shut out of the market. That changes now. The democratization of ML gives individuals and startups a chance to get their ideas off the ground and prove their concepts before raising the funds needed to scale.
But access to data is only one way in which ML is being democratized. There is an effort underway to standardize and improve access across all layers of the machine learning stack, including specialized chipsets, scalable computing platforms, software frameworks, tools and ML algorithms.
Specialized hardware built for machine learning
Complex machine-learning algorithms require an enormous amount of computing power, both to train models and to run them in real time. Rather than relying on general-purpose processors that can handle all kinds of tasks, the focus has shifted toward specialized hardware custom-built for ML workloads. With Google’s Tensor Processing Unit (TPU) and NVIDIA’s DGX-1, we now have powerful hardware built specifically for machine learning.
Highly scalable computing platforms
Even where specialized processors are available, not every company has the capital and skills to manage the large-scale computing platform needed to run advanced machine learning on a routine basis. This is where public cloud services such as Amazon Web Services (AWS), Google Cloud Platform and Microsoft Azure come in. These services rent developers a scalable infrastructure optimized for ML at a fraction of the cost of setting up their own.
Open-source, deep-learning software frameworks
A major obstacle to the wide-scale adoption of machine learning is the sheer number of different software frameworks in use. Big companies are open-sourcing their core ML frameworks and trying to push for some standardization. Just as the cost of developing mobile apps fell dramatically once iOS and Android emerged as the two dominant ecosystems, so too will machine learning become more accessible as tools and platforms standardize around a few frameworks. Notable open-source frameworks include Google’s TensorFlow, Amazon’s MXNet and Facebook’s Torch.
Simple drag-and-drop tools
The final step in the democratization of machine learning will be the development of simple drag-and-drop frameworks accessible to those without doctoral degrees or deep data science training. Microsoft Azure ML Studio offers access to many sophisticated ML models through a simple graphical UI. Amazon and Google have rolled out similar software on their cloud platforms as well.
Marketplaces for ML algorithms and datasets
Not only do we have the on-demand infrastructure needed to build and run ML algorithms, we even have marketplaces for the algorithms themselves. Need an algorithm to recognize faces in images or to add color to black-and-white photographs? Marketplaces like Algorithmia let you download the algorithm of your choice. In addition, websites like Kaggle provide the massive datasets needed to train these algorithms further.
All of these changes mean that the world of machine learning is no longer restricted to university labs and corporate research centers that have access to massive training data and computing infrastructure.
What are the implications?
Back in the mid- and late 1990s, web development was done by specialists and was accessible only to firms with ample resources. Now, with simple tools like WordPress, Medium and Shopify, any layperson can have a presence on the web. The democratization of machine learning will have a similar effect, lowering entry barriers for individuals and startups.
Further, the emerging ecosystem of marketplaces for data, algorithms and computing infrastructure will make it easier for developers to pick up ML skills. The net result will be lower costs to train and hire talent. We think these two factors, lower entry barriers and cheaper talent, will be particularly powerful in vertical (industry-specific) use cases such as weather forecasting, healthcare and disease diagnostics, drug discovery and financial risk assessment, which have traditionally been cost-prohibitive.
Just as cloud computing ushered in the current explosion in startups, the ongoing build-out of machine learning platforms will likely power the next generation of consumer and business tools. The PC platform gave us access to productivity applications like Word and Excel and eventually to web applications like search and social networking. The mobile platform gave us messaging applications and location-based services. The ongoing democratization of ML will likely give us an amazing array of intelligent software and devices powering our world.