Last week was Google Cloud’s annual developer conference, and with it came one of the most important artificial intelligence announcements of the year.
Google Cloud is Google’s cloud computing division that competes directly with Amazon Web Services and Microsoft Azure. These three giants make up about 68% of the global cloud services industry, leaving the rest for everyone else to fight over.
As you probably already guessed, the hottest topics and focus of the conference were all about AI… but there was a noticeable shift this year.
At the conference, Google unveiled Ironwood – its seventh-generation tensor processing unit (TPU) designed for artificial intelligence tasks. Shown below is an image of four Ironwood TPUs configured side by side.
Ironwood TPUs | Source: Google
TPUs differ from GPUs in that their architecture is optimized for machine learning and artificial intelligence. They are more energy efficient per unit of compute than GPUs, which are general-purpose semiconductors designed for high-performance parallel processing.
I often refer to GPUs – the kind designed by NVIDIA (NVDA) and Advanced Micro Devices (AMD) – as the workhorses of artificial intelligence. This is not because of their efficiency, but because they are general purpose. They can easily be used for any kind of AI application, whether it be training or inference.
NVIDIA and AMD in many ways fell into machine learning and artificial intelligence by way of gaming and video processing. Before the AI boom, GPUs were primarily used for high-performance video applications like video production, computer games, and computer-aided design.
These applications funded the development of more advanced GPUs and the dedicated production capacity at their manufacturing partner – TSMC.
Without this historical foothold, the industry might instead have been built entirely around AI-specific semiconductors like Google’s Ironwood TPU.
The development of Ironwood marks a major milestone for Google. It announced its first TPU nearly a decade ago at its 2016 Google I/O conference.
That was obviously years before the generative AI boom began in November 2022. Back then, the TPU was designed explicitly for machine learning applications.
Since then, the pace of development has been dizzying.
Source: Google
In the remarkable chart above, we can see that Ironwood has 3,600 times the peak performance of TPU v2, the first TPU Google made available externally to cloud service customers in 2017.
It’s a staggering performance improvement that’s hard to comprehend.
And, as is now standard in the industry, these AI-specific processing units aren’t designed for singular use. They’re designed for interconnection and optimized to work in parallel. That’s one of the things that makes the Ironwood announcement so significant.
Ironwood has been designed for a massive 9,216 Ironwood chip configuration – which Google calls a “pod” – capable of 42.5 exaflops of performance.
Compare this to the peak performance of El Capitan, the world’s most powerful classical supercomputer, capable of operating at 2.74 exaflops… Ironwood, in its 9,216 chip pod, is 15.5 times more powerful than El Capitan at a fraction of the size.
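For readers who like to check the math, here’s a quick back-of-the-envelope calculation using the figures above. Treat it as a rough sanity check rather than a strict comparison, since the two systems measure peak performance in different number formats:

```python
# Rough sanity check of the figures above -- not official specs, and the
# precision formats differ between the two systems.

IRONWOOD_POD_EXAFLOPS = 42.5      # Google's stated peak for a 9,216-chip pod
CHIPS_PER_POD = 9216
EL_CAPITAN_PEAK_EXAFLOPS = 2.74   # El Capitan's theoretical peak

# Peak performance attributable to a single Ironwood chip
per_chip_petaflops = IRONWOOD_POD_EXAFLOPS / CHIPS_PER_POD * 1000
print(f"~{per_chip_petaflops:.1f} petaflops per Ironwood chip")  # ~4.6

# How many times the pod exceeds El Capitan's peak
ratio = IRONWOOD_POD_EXAFLOPS / EL_CAPITAN_PEAK_EXAFLOPS
print(f"~{ratio:.1f}x El Capitan's peak")  # ~15.5
```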
El Capitan | Source: Lawrence Livermore National Laboratory
El Capitan is designed with 44,544 AMD Instinct MI300A processors, which are phenomenal semiconductors. Each MI300A combines 24 bleeding-edge CPU cores and a GPU engine on the same chip.
This particular design is optimized for classical supercomputing rather than for artificial intelligence, so the comparison between El Capitan and Google’s Ironwood pod isn’t apples to apples, but it is still relevant in terms of raw computing power and magnitude.
The reality is that supercomputing is being completely redefined by the needs of artificial intelligence. And those pesky complex multivariable problems that classical supercomputers have been designed to tackle are often better managed by AI supercomputers like Ironwood.
As with every successive generation of processors, Ironwood also demonstrates a measurable improvement in power efficiency per unit of compute compared to the previous TPU generation – Trillium. In this case, a 2X improvement.
Source: Google
The chart above allows us to easily visualize why demand for AI-related semiconductors remains seemingly endless year after year. When power efficiency doubles from one generation to the next, data centers can’t afford not to upgrade to the more efficient semiconductors.
Not only do the new chips speed up training and inference, but they also have a material impact on operational costs – specifically, the cost of energy (electricity).
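To put a rough number on that, here’s a simple illustration. The power draw and electricity rate below are hypothetical assumptions, not figures from Google, but they show why a 2X efficiency gain matters to a data center operator:

```python
# Illustrative only -- the power draw and electricity price are
# hypothetical assumptions, not figures from Google.

POWER_MEGAWATTS = 10     # assumed draw of an AI cluster (hypothetical)
PRICE_PER_KWH = 0.08     # assumed industrial electricity rate, $/kWh
HOURS_PER_YEAR = 24 * 365

annual_energy_cost = POWER_MEGAWATTS * 1000 * HOURS_PER_YEAR * PRICE_PER_KWH
print(f"Annual electricity bill: ${annual_energy_cost:,.0f}")  # ~$7.0 million

# Doubling performance per watt means the same workload can be run
# for roughly half the energy -- and roughly half the electricity bill.
print(f"Same workload on 2x-efficient chips: ${annual_energy_cost / 2:,.0f}")
```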
Energy is the single greatest constraint in the race to artificial general intelligence (AGI), so the more efficient semiconductors are per unit of compute, the better.
The energy constraints are so severe that data centers will go to extraordinary lengths to secure more power to achieve their goals. Musk and his team at xAI have not been able to get enough electricity from Memphis’ power grid to power Colossus, so they brought in methane gas generators to produce additional power.
xAI is now using 20 additional generators on top of the 15 units they have been permitted to use. Local regulations allow for the additional generators to be used as long as they aren’t in a fixed location for more than 364 days, giving xAI time to get formal permits in hand.
Aside from the dramatic performance improvements and power efficiency improvements, the Ironwood architecture represents a much larger shift in the industry – what Google is now calling the “age of inference.”
The team at Google Cloud was clear that the industry has evolved from focusing on the massive computational power required to train large language models to prioritizing the optimization of running artificial intelligence applications (i.e., inference).
We need to remember that Google is an advertising technology company, not a semiconductor company. It designed its TPUs specifically to run machine learning associated with Google search, Google Translate, YouTube algorithms, and ultimately its Gemini large language models… All services designed to drive advertising revenues.
It also designed its own semiconductors and associated software platform, TensorFlow, to lock developers into the Google Cloud ecosystem. Developers accustomed to working with TensorFlow are more likely to use Google Cloud for their cloud computing needs.
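To make that lock-in concrete, here’s a minimal sketch of how a developer points TensorFlow at a Cloud TPU. It assumes the code runs inside a Google Cloud TPU VM or a connected notebook; the exact setup varies by environment and TPU generation:

```python
# Minimal sketch of targeting a Cloud TPU from TensorFlow.
# Assumes a Google Cloud TPU VM or connected notebook environment.
import tensorflow as tf

# Discover and initialize the TPU attached to this environment
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# Any model built and trained inside this scope runs on the TPU cores
strategy = tf.distribute.TPUStrategy(resolver)
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```

Once a team has built its tooling and workflows around that pattern, moving to another cloud means rework – which is exactly the point.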
This shift towards inference is the story, the really big story. What Google is telling us is that its customers need cloud computing services optimized to run AI applications, not so much to train AI models anymore. This speaks to how widespread adoption of AI software has become.
It may be hard for us to sense it… but the proof is everywhere.
Capital expenditure forecasts for AI factories continue to increase and are now expected to exceed $1 trillion in annual spend by 2029.
Electricity/energy demands continue to exceed forecasts due to all of the AI factory construction and adoption of AI software.
At a conference just a few days ago, OpenAI CEO Sam Altman mentioned that 10% of the world is now using OpenAI’s artificial intelligence – somewhere between 800 million and 1 billion users.
Mass adoption. I’m afraid it’s already here.
And we’re now experiencing exponential growth in utilization.
Jeff
The Bleeding Edge is the only free newsletter that delivers daily insights and information from the high-tech world as well as topics and trends relevant to investments.