The reality is that we, the human race, are no longer at the top.
And it’s a matter of our own doing.
The confluence of raw computational power, driven by parallel processing, and advancements in software has led to the realization that artificial general intelligence (AGI) is within reach.
Not in a matter of years, but in a period that will be measured in months.
And whether or not the financiers who controlled the purse strings knew what they were investing in, it doesn’t matter…
The capital chasing the objective of achieving AGI is now a flood of investment that simply will not stop.
The wheel has been pushed up and over the top of the mountain.
It is now accelerating down the other side.
Several days ago, xAI released Grok 4, its most advanced, pre-AGI model.
I needed to sleep on it for several days, to experiment with Grok, and to think before I wrote about it. It’s simply not accurate to call Grok 4 a large language model (LLM). Doing so would be a sign of ignorance or bias.
Ignore whatever criticisms you might read about it. Politics weigh heavily, and they completely miss the point.
Something extraordinary has just happened. A shocking leap ahead by the team at xAI. And it’s a vindication of the team’s approach to both hardware and software architecture, as well as their foundational approach to building a maximum truth-seeking AI.
Grok 4 and its more powerful counterpart, Grok 4 Heavy, were both trained on xAI’s Colossus supercomputer, specifically 200,000 GPUs. What’s different from Grok 3?
The result?
Nothing short of superhuman reasoning and intelligence.
Longtime Bleeding Edge readers know that to measure machine intelligence, we look to the benchmarks. We’ve been tracking the rollouts of models for years now, as well as how they perform on a suite of “tests.”
When it comes to the most difficult benchmarks for AGI, no other company is even close to xAI right now.
Humanity’s Last Exam, which is designed to be extremely difficult even for human experts, comprises 2,500 questions across all academic disciplines. The questions require deep reasoning to solve. They’re not the kinds of questions that simply require knowledge retrieval, which would be too easy.
Grok 4 Performance on Humanity’s Last Exam | Source: xAI
Grok 4 Heavy scored 44.4%, compared to the previous highest score of 26.9% by Google’s Gemini Deep Research. That is a huge jump. Better yet, xAI was able to demonstrate – as shown above on the right – that with additional training, Grok 4’s performance improves, in this case to just over 50%.
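To put that jump in perspective, here is a quick back-of-the-envelope calculation using only the two scores quoted above. The function name `relative_gain` is my own, chosen for illustration.

```python
def relative_gain(new_score: float, old_score: float) -> float:
    """Return the improvement of new_score over old_score as a fraction of old_score."""
    return (new_score - old_score) / old_score

grok_4_heavy = 44.4    # Humanity's Last Exam score quoted above
previous_best = 26.9   # previous highest score quoted above

point_gap = round(grok_4_heavy - previous_best, 1)
percent_gain = round(relative_gain(grok_4_heavy, previous_best) * 100, 1)

print(f"Gap: {point_gap} points, a relative gain of {percent_gain}%")
```

In other words, the gap is 17.5 percentage points, which works out to roughly a 65% improvement over the prior best score.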
Not satisfied?
Here are the results of the ARC-AGI 2 benchmark, another prominent measurement of an AI’s ability to reason to solve problems. Pure large language models (LLMs) score 0% on this test.
Grok 4, shown below clearly out front on top, scored a remarkable 15.9%. This score is almost double that of the previous state-of-the-art model. Double.
ARC-AGI 2 Leaderboard | Source: ARC Prize
The ARC-AGI 2 benchmark was introduced just this May. It was already clear that existing AI models were making significant progress against the ARC-AGI 1 benchmark, indicating that it simply wasn’t difficult enough.
OpenAI’s o3 had scored 60.8% earlier this year, and Grok 4 came in at 66.7% on ARC-AGI 1.
ARC-AGI Leaderboard Breakdown | Source: ARC Prize
ARC-AGI 2 is monumental. The fact that Grok 4 scored 15.9% (roughly 16%), almost double the previous high, is nothing short of remarkable. ARC-AGI 2 is designed to test higher levels of fluid intelligence, abstract reasoning, and sophisticated generalization.
Grok 4, when given time to think, is now more intelligent than pretty much any PhD-level human in any domain of study. Period.
To some, that thought may feel threatening. To others, it may feel empowering. And to those who haven’t experimented with Grok 4, I can’t encourage you enough to start using it. It is so helpful and resourceful, it is hard to describe.
And there’s more…
Grok 4 is an agentic AI.
For those just joining us, I want to personally welcome you to The Bleeding Edge, the best place on the planet for gleaning unique insights and intel from the outer limits of high-tech development.
As a friendly reminder to those who have been following along for years, agentic AI is a trend we began following in early 2024. Here’s a bit of what I wrote…
Agentic AI, or agentic reasoning, is kind of like it sounds.
The technology, the AI, is given agency. It is given the authority or directive to solve a problem or complete a task through a series of steps.
This differs from today’s LLM technology, which provides users with a zero-shot response. When we use something like ChatGPT, we give it a prompt, and it returns a complete response. The response is based on the information from our prompt, along with its pre-trained knowledge, and is returned in a matter of seconds.
An agentic workflow is quite different. It is an iterative process, where an agentic AI uses a more human-like workflow to accomplish a task.
In Grok’s case, it has been trained to use various software tools to get its job done. These include tools for programming and tools for browsing the internet for real-time information.
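For readers who like to see the idea in code, here is a minimal sketch of an agentic loop. It is a toy under stated assumptions, not how xAI built Grok: the tools (`lookup`, `calculator`), the `toy_policy` function standing in for the model, and `run_agent` are all hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str, str]  # (tool name, tool input, observation)

def lookup(term: str) -> str:
    # Hypothetical tool standing in for a web search.
    facts = {"hle_questions": "2500"}
    return facts.get(term, "unknown")

def calculator(expression: str) -> str:
    # Hypothetical tool: only handles sums written as "a+b".
    left, right = expression.split("+")
    return str(int(left) + int(right))

TOOLS: Dict[str, Callable[[str], str]] = {"lookup": lookup, "calculator": calculator}

def toy_policy(task: str, history: List[Step]) -> Optional[Tuple[str, str]]:
    # Stand-in for the model: choose the next (tool, input), or None when done.
    if len(history) == 0:
        return ("lookup", "hle_questions")
    if len(history) == 1:
        previous_observation = history[0][2]
        return ("calculator", previous_observation + "+100")
    return None

def run_agent(task: str, policy, tools, max_steps: int = 5) -> List[Step]:
    # Iterate: decide on an action, call the tool, record the observation, repeat.
    history: List[Step] = []
    for _ in range(max_steps):
        action = policy(task, history)
        if action is None:
            break
        tool_name, tool_input = action
        observation = tools[tool_name](tool_input)
        history.append((tool_name, tool_input, observation))
    return history

steps = run_agent("Exam question count plus 100", toy_policy, TOOLS)
final_answer = steps[-1][2]
```

The point is the shape of the loop. Instead of one prompt and one response, each tool result feeds into the next decision until the task is done.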
And Grok 4 Heavy is a more powerful version of Grok 4 that takes advantage of test-time compute. An easy way to think about that is how Grok 4 Heavy can create 5, 10, or any number of hypotheses and test them all, in parallel, at the same time.
This approach naturally requires more computational horsepower, which means more electricity and therefore more cost. But it also means that complex problems can be solved, or an algorithm can be optimized, in a fraction of the time.
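Here is a rough sketch of what testing several hypotheses in parallel can look like. xAI has not published how Grok 4 Heavy does this, so treat it as a generic best-of-n illustration. The toy problem (find the number whose square is 144) and the names `score` and `best_of_n` are my own assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

def score(hypothesis: int) -> int:
    # Toy check: how close is the hypothesis squared to 144? Higher is better.
    return -abs(hypothesis * hypothesis - 144)

def best_of_n(
    hypotheses: List[int],
    scorer: Callable[[int], int],
    max_workers: int = 4,
) -> Tuple[int, int]:
    # Evaluate every hypothesis concurrently, then keep the highest scorer.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(scorer, hypotheses))
    best_index = max(range(len(hypotheses)), key=scores.__getitem__)
    return hypotheses[best_index], scores[best_index]

winner, winning_score = best_of_n([10, 11, 12, 13, 14], score)
```

More hypotheses mean more work running at once, which is where the extra compute and electricity go. The payoff is that the best answer surfaces in roughly the time it takes to check one.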
And here’s the key…
The productivity improvements are going to come faster than we can imagine.
And Grok 4 comes with a new and improved voice interface, which is ridiculously easy to speak with. It’s in the palm of your hand with a smartphone now… but it’s also on our tablets and desktops. And if you have a Tesla, it’s being rolled out and integrated into all the modern Tesla electric vehicle models, a topic we explored in Monday’s Bleeding Edge – The Omnipresent Grok.
Source: The Bleeding Edge
But the bigger question is: What’s next for Grok?
In the next few weeks, we’re going to see major improvements in Grok’s multi-modal capabilities.
Specifically, it will receive a major upgrade in how it sees, hears, and understands the real world through audio and video inputs.
That might not sound like a big deal, but it is.
As I mentioned on Monday, Grok will have access to the cameras and audio inputs on Tesla EVs. We’ll be able to speak with Grok about what we’re seeing outside the car. And when we hold up our phone, Grok can see and hear exactly what we do, and it will understand our environment in the way that we understand our environment.
And the even more obvious application is to put Grok in Tesla’s humanoid robot Optimus, so that it can better understand its surroundings, and also better interact with us humans.
xAI is catapulting ahead. It has a competitive advantage. And this is resulting in a hyper-acceleration of technological advancement.
What’s next? Take your best guess at how far xAI and Grok will advance by the end of the year. What it accomplished in the last six months was something no other company was capable of doing.
One thing I’m sure of is that the next six months will be even more astonishing…
And with each step, we are that much closer to AGI.
Every day, we accelerate…
Can you grok it, now?
Jeff
The Bleeding Edge is the only free newsletter that delivers daily insights and information from the high-tech world as well as topics and trends relevant to investments.