Artificial intelligence, the stuff of science fiction, suddenly became real in 2015. Amazon shipped the Echo, its AI-powered home assistant, to surprisingly positive reviews. The latest software from Microsoft and Google beat humans at image recognition. Tesla’s cars woke up one day with self-driving capability, delivered by an over-the-air software update. And Baidu’s latest speech recognition software is natively bilingual in English and Mandarin Chinese.
The secret recipe that made these developments possible is an artificial intelligence technique called machine learning. Machine learning takes its inspiration from the human brain. Instead of solving problems analytically (“write a perfect mathematical description of what a chair looks like”), it solves problems by learning from large amounts of data (“use these 1 million chair pictures to figure out what chairs look like”). This approach to artificial intelligence has made huge strides in the last few years, especially in the field of computer vision.
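To make “learning from data” concrete, here is a minimal Python sketch using a perceptron, one of the oldest learning algorithms and a deliberately simple stand-in for the far larger neural networks behind the systems above. The toy data set and the function names `train_perceptron` and `predict` are hypothetical. Instead of being handed the rule “label 1 means the value is above 5”, the program nudges a weight and a bias every time it misclassifies an example, until it gets every example right.

```python
def train_perceptron(examples, max_epochs=1000):
    """Learn weight w and bias b so that w*x + b > 0 exactly for positive examples.

    Classic perceptron rule: on each mistake, move (w, b) toward the correct label.
    Stops after the first full pass over the data with no mistakes.
    """
    w, b = 0.0, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in examples:
            predicted = 1 if w * x + b > 0 else 0
            if predicted != label:
                step = label - predicted  # +1 for a missed positive, -1 for a false positive
                w += step * x
                b += step
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def predict(w, b, x):
    """Classify a single value with the learned weight and bias."""
    return 1 if w * x + b > 0 else 0


# Hypothetical toy data: (measurement, label). Nobody writes down the rule
# "label is 1 when the measurement exceeds 5"; the algorithm infers a boundary.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0), (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
w, b = train_perceptron(data)
print(w, b)                                    # 2.0 -10.0
print([predict(w, b, x) for x in (2.5, 7.5)])  # [0, 1]
```

On this data the loop settles after eight passes at w = 2.0 and b = -10.0, a decision boundary at 5, and it correctly labels values it never saw during training. Image classifiers rely on the same learn-from-mistakes idea, but with millions of weights and pixels instead of one number.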
The pair of images above shows how far we’ve come. On the left is the state-of-the-art computer vision algorithm of circa 2008 trying to locate a mug. Not only did it fail to find the mug, it produced many false positives. On the right is the latest computer vision system from Microsoft, which accurately identified every object in the kitchen, including a dimly lit person in the background.
Machine learning needs two things to work really well: lots of data and tons of processing power. The rise of the internet, especially photo and video sharing sites like Facebook, Flickr, and YouTube, has allowed researchers to train algorithms on data sets of unprecedented scale. The proliferation of programmable graphics processing units (GPUs) has made it possible to process this data in days rather than weeks or months. To use an analogy favored by AI researcher Andrew Ng: if machine learning is a rocket, data is the fuel and GPUs are the engines. With both in place, machine learning is taking off.
GPUs’ main claim to fame is their exceptional performance in floating point arithmetic, the format computers use for fractional (non-integer) numbers. GPUs are especially fast at adding and multiplying large quantities of floating point numbers, a common operation in 3D graphics, computational finance, genomic sequencing, weather forecasting, and machine learning. In terms of raw hardware, GPUs are roughly 6.4x faster than CPUs in floating point performance, as shown in the chart below. Because machine learning is dominated by this kind of math, GPUs have become very popular in AI research.
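Much of the “adding and multiplying” that GPUs excel at is matrix multiplication, in which each output entry is a dot product. The Python sketch below multiplies two small matrices and counts the floating point operations along the way: an m x k matrix times a k x n matrix takes 2 * m * n * k of them. The function name `matmul_with_flop_count` is hypothetical, and plain Python is used only for clarity, not speed.

```python
def matmul_with_flop_count(a, b):
    """Multiply matrices a (m x k) and b (k x n), counting floating point operations.

    Every output entry is a dot product of length k: k multiplications and
    k additions into an accumulator, so the total is 2 * m * n * k.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    flops = 0
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply plus one add
                flops += 2
            out[i][j] = acc
    return out, flops


a = [[1.5, 2.0], [0.5, -1.0]]
b = [[2.0, 0.0], [1.0, 4.0]]
c, flops = matmul_with_flop_count(a, b)
print(c)      # [[5.0, 8.0], [0.0, -4.0]]
print(flops)  # 16, i.e. 2 * 2 * 2 * 2
```

The count grows quickly: multiplying two 4096 x 4096 matrices takes 2 * 4096^3, about 137 billion, floating point operations. Because every output entry can be computed independently, the work spreads naturally across a GPU’s many arithmetic units.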
The GPU’s hardware advantage carries over to machine learning software. When researchers used GPUs to accelerate the training of convolutional neural networks, a popular machine learning algorithm, they found that training ran up to 8x faster than an optimized CPU implementation,[1] as shown in the chart below. GPU performance also scaled better as workloads grew.
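A convolutional neural network is built from exactly this kind of arithmetic. The sketch below slides a small 2 x 2 filter across a 4 x 4 image, and every output value is a short sum of products. The function name `conv2d_valid` and the inputs are hypothetical. As in most deep learning libraries, it actually computes cross-correlation (the filter is not flipped), and the edge-detecting filter here is written by hand, whereas a CNN learns its filter weights during training.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding ("valid" mode).

    Computes cross-correlation, as deep learning libraries usually do:
    each output pixel is the sum of element-wise products under the kernel.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            total = 0.0
            for dr in range(kh):
                for dc in range(kw):
                    total += image[r + dr][c + dc] * kernel[dr][dc]
            out[r][c] = total
    return out


# Hypothetical 4x4 grayscale patch: dark left half, bright right half.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
# Hand-written vertical edge detector; a CNN would learn these weights instead.
kernel = [
    [-1.0, 1.0],
    [-1.0, 1.0],
]
print(conv2d_valid(image, kernel))
# [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The filter responds only in the middle column, where dark meets bright. Each output pixel costs one multiply and one add per filter weight; a real CNN layer repeats this across many channels, filters, and images, which is why GPUs speed up CNN training so much.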
Thanks in large part to GPUs, computer vision has made breakthrough progress in the last five years. In the ILSVRC 2010 competition, which tests how accurately computers can recognize common objects, the best algorithm answered correctly 72% of the time, as illustrated below. In 2015, the top algorithm scored 96%. Using GPUs for rapid training, the winning team was able to build larger and more sophisticated models that captured more nuances in the images. For comparison, a human who studied for and took the same test scored 95%. In the span of five years, computer vision, and image recognition in particular, went from unreliable to human-level accuracy.
Computer vision accuracy took a leap in 2012 thanks to deep convolutional neural networks trained on GPUs (Krizhevsky 2012). When given five guesses per image (“top-5 error”), the best software now has an error rate below 4%.
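The “top-5 error” metric is simple to express in code. This sketch counts an image as a miss only when the correct class is absent from the model’s five highest-scoring guesses; setting k = 1 gives the stricter top-1 error. The function name `top_k_error` and the scores are hypothetical.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes.

    scores: one list of per-class scores per example.
    labels: index of the correct class for each example.
    """
    misses = 0
    for class_scores, true_label in zip(scores, labels):
        # Class indices ordered from most to least confident.
        ranked = sorted(range(len(class_scores)), key=lambda c: class_scores[c], reverse=True)
        if true_label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Hypothetical scores for three images over seven classes.
scores = [
    [0.10, 0.30, 0.05, 0.20, 0.15, 0.12, 0.08],  # true class 3: 2nd guess
    [0.02, 0.03, 0.50, 0.15, 0.10, 0.12, 0.08],  # true class 0: last guess
    [0.05, 0.60, 0.04, 0.10, 0.06, 0.07, 0.08],  # true class 1: 1st guess
]
labels = [3, 0, 1]
print(top_k_error(scores, labels, k=5))  # 0.3333333333333333
print(top_k_error(scores, labels, k=1))  # 0.6666666666666666
```

The same predictions look much better under top-5 scoring than under top-1, which is worth keeping in mind when comparing headline accuracy figures.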
Computers can now not only recognize flowers but also correctly name their species.
Looking ahead, two technological developments will further drive adoption of machine learning and GPUs. The first is the ubiquity of GPUs in the cloud. While Amazon has offered GPU-based cloud servers since 2010, this year Baidu, IBM, and Microsoft jumped into the fray with their own offerings based on NVIDIA’s Tesla K80 GPUs. For those who want to build in-house solutions, Facebook and Google have open-sourced their machine learning technology: Facebook its Big Sur GPU server design and Google its TensorFlow software library. With easy access to GPUs via the cloud and plenty of open-source hardware and software, machine learning is now more accessible than ever.
The second major catalyst for the adoption of machine learning is NVIDIA’s upcoming “Pascal” GPU, which has three features geared specifically for machine learning. First, it uses stacked memory, which increases maximum memory capacity from 12GB to 32GB and triples memory bandwidth to 1TB/s. Second, Pascal adds support for FP16, a 16-bit, lower-precision number format that runs twice as fast as the prevailing 32-bit FP32 format. Lastly, Pascal uses a new interconnect technology called NVLink that speeds up communication between chips. Together, these technologies will let developers train larger, more accurate models and cut training time, allowing faster iteration.
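To see what moving from FP32 to FP16 trades away, the Python sketch below uses the standard `struct` module, whose `"e"` and `"f"` format codes pack a number into IEEE 754 half precision (16 bits) and single precision (32 bits). Halving the bytes per number halves the memory and bandwidth needed to hold and move a model, at the cost of precision. The 100-million-parameter model is an illustrative assumption, the helper names are hypothetical, and the sketch says nothing about the 2x speed claim, which depends on GPU hardware.

```python
import struct

# IEEE 754 storage formats understood by Python's struct module:
# "e" = 16-bit half precision (FP16), "f" = 32-bit single precision (FP32).
FORMATS = {"FP16": "<e", "FP32": "<f"}


def round_trip(value: float, fmt: str) -> float:
    """Store `value` in the given format and read it back, exposing rounding error."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def model_size_bytes(num_params: int, fmt: str) -> int:
    """Memory needed to hold `num_params` values in the given format."""
    return num_params * struct.calcsize(fmt)


# Hypothetical model with 100 million parameters (an illustrative size).
params = 100_000_000
for name, fmt in FORMATS.items():
    print(name,
          model_size_bytes(params, fmt) // 1_000_000, "MB,",
          "0.1 is stored as", round_trip(0.1, fmt))
# FP16 200 MB, 0.1 is stored as 0.0999755859375
# FP32 400 MB, 0.1 is stored as 0.10000000149011612
```

FP16 keeps only about three significant decimal digits and tops out at 65,504 (packing a larger value with `"e"` raises `OverflowError`), a trade-off that many neural network workloads can tolerate in exchange for the savings.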
After a few false starts, artificial intelligence has finally hit the mainstream thanks to the latest machine learning techniques. As apps and devices increasingly rely on artificial intelligence as their key differentiator, the GPU has emerged as the go-to processor for heavy computation. The availability of GPUs in the cloud and NVIDIA’s upcoming Pascal chip will only accelerate the adoption of machine learning in 2016 and beyond.