By now, most people know that Google has custom chips that accelerate its machine-learning workloads. These Tensor Processing Units (TPUs) were first revealed back in May 2016 at the I/O developer conference, but Google never shared much detail about them, explaining only that they were optimized for its own TensorFlow machine-learning framework. Google is now sharing more information and benchmarks about the project with the rest of the world.
Google has released a paper that details how its TPUs work. Keep in mind, however, that the numbers come from Google's own benchmarks of its own chip. That said, the TPUs are roughly 15-30x faster than contemporary CPUs and GPUs at executing Google's machine-learning workloads. Power consumption also matters in data centers, and there the TPUs deliver roughly 30-80x better TeraOps/Watt, a figure Google expects to improve further.

Importantly, these numbers are for inference, that is, running already-trained models, not for training the models in the first place.

Google also notes that most chip architects optimize for convolutional neural networks, which work well for image recognition, among other tasks. Google claims those networks make up only about 5% of its own data-center workload; most of its applications instead use multi-layer perceptrons.
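To see why multi-layer perceptrons suit a chip like the TPU, note that their inference cost is dominated by dense matrix multiplies, exactly the operation the TPU's matrix unit accelerates. The following is a minimal, generic sketch of MLP inference in NumPy; the layer sizes and weights are illustrative assumptions, not Google's actual workload.

```python
import numpy as np

def relu(x):
    # Elementwise rectified linear activation.
    return np.maximum(x, 0.0)

def mlp_forward(x, weights, biases):
    """Forward (inference) pass of a simple multi-layer perceptron.

    Each layer is a dense matrix multiply plus a bias, followed by ReLU
    on every layer except the last. The matrix multiplies are where
    nearly all of the arithmetic happens.
    """
    for i, (w, b) in enumerate(zip(weights, biases)):
        x = x @ w + b
        if i < len(weights) - 1:
            x = relu(x)
    return x

# Tiny illustrative network: 4 inputs -> 8 hidden units -> 2 outputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 8)), rng.standard_normal((8, 2))]
biases = [np.zeros(8), np.zeros(2)]
out = mlp_forward(rng.standard_normal((1, 4)), weights, biases)
print(out.shape)  # (1, 2)
```

Everything here reduces to matrix-matrix products, which is why an accelerator built around a large matrix-multiply unit pays off for this class of model.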

How It All Started

Google began exploring GPUs, custom ASICs, and FPGAs for its data centers in 2006. Back then, though, few applications could really benefit from such specialized hardware; most heavy workloads simply ran on spare capacity that already existed in the data centers. That changed in 2013, according to the authors of Google's paper, when they projected that the growing popularity of DNNs could double the computational demands on their data centers, and that meeting those demands with conventional CPUs would be prohibitively expensive. A high-priority project was therefore launched to build a custom ASIC, with the goal of improving cost-performance over GPUs by more than 10x, letting Google cut costs while still meeting the demands on its data centers.

While it isn’t likely that Google will release the TPUs for use outside of its own cloud, other companies will very likely piggyback on what Google has done. Seeing how the TPUs work and the benefits they bring, others are likely to build their own designs, eventually even surpassing Google’s TPUs and raising the bar for future accelerators.