OpenAI just announced a brand-new model that may have crossed a major threshold in AI capabilities—and it has everyone talking.
The model in question is called o3. And yes, you read that right. They skipped o2 altogether (reportedly to avoid a trademark conflict with the telecom company O2). Confusing naming aside, o3 is a direct sequel to OpenAI’s advanced reasoning model o1.
But unlike o1, o3 just beat human performance on a notoriously challenging intelligence test, marking yet another leap forward in the race to build smarter and more capable AI.
I talked through what that means with Marketing AI Institute founder and CEO Paul Roetzer on Episode 129 of The Artificial Intelligence Show.
What Is o3?
o3 is an AI model designed to do one thing really well: think deeply about problems before responding. This “chain-of-thought” approach first appeared in o1, but o3 is built to take reasoning further, spending even more time and compute on the hardest of problems.
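If you want a feel for what that looks like in practice, here’s a minimal sketch of calling a reasoning model through OpenAI’s Python SDK. o3 itself isn’t publicly available yet, so this uses o1-preview; the model name, parameters, and sample question are illustrative assumptions, not OpenAI’s official guidance.

```python
# Minimal sketch (assumptions: the openai Python SDK v1.x is installed,
# OPENAI_API_KEY is set, and your account has access to the o1 reasoning models).
from openai import OpenAI

client = OpenAI()

# Reasoning models "think" before answering: part of the token budget is spent
# on hidden reasoning tokens rather than on the visible reply.
response = client.chat.completions.create(
    model="o1-preview",  # assumed model name; swap in o3 if/when it reaches the API
    messages=[
        {
            "role": "user",
            "content": "A bat and a ball cost $1.10 total. The bat costs $1.00 "
                       "more than the ball. How much does the ball cost?",
        }
    ],
    max_completion_tokens=2000,  # o-series models expect max_completion_tokens, not max_tokens
)

print(response.choices[0].message.content)
# The usage object reports how many tokens went to hidden reasoning.
print(response.usage)
```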
And it looks like it works.
o3 just became the first model to outperform humans on a specialized intelligence test created by prominent AI researcher François Chollet. The test is called ARC-AGI. It uses simple visual puzzles to measure the ability to learn and adapt to brand-new environments and situations—no prior knowledge required. Humans score around 75% on the test. o3 scored 76%.
That might not sound like a huge difference, but it’s stunning when you consider that GPT-4, a state-of-the-art large language model, scored close to zero on the same test.
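To make the ARC-AGI format a bit more concrete, here’s a toy puzzle in the style of the benchmark’s public data format (real tasks ship as JSON with "train" and "test" grid pairs whose cells are integers 0-9 standing for colors). The grids and the rule below are invented for illustration only:

```python
# A made-up task in the ARC style: a few input->output examples demonstrate a rule,
# and the solver must infer that rule and apply it to a new input.
# The (invented) rule here is simply "recolor every 1 as a 2."
toy_task = {
    "train": [
        {"input": [[0, 1, 0],
                   [1, 1, 1],
                   [0, 1, 0]],
         "output": [[0, 2, 0],
                    [2, 2, 2],
                    [0, 2, 0]]},
        {"input": [[1, 0],
                   [0, 1]],
         "output": [[2, 0],
                    [0, 2]]},
    ],
    "test": [
        {"input": [[1, 1, 0],
                   [0, 0, 1]]}
    ],
}

def solve(grid):
    """Apply the inferred rule: replace every 1 with a 2."""
    return [[2 if cell == 1 else cell for cell in row] for row in grid]

print(solve(toy_task["test"][0]["input"]))  # [[2, 2, 0], [0, 0, 2]]
```

Each task uses a different, previously unseen rule, which is why pattern-matching on memorized training data gets a model almost nowhere.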
Chollet himself, who has historically been skeptical of AI hype, called o3’s performance “a surprising and important step function increase in AI capabilities.”
Why It Matters
For context, beating human performance on ARC-AGI isn’t about memorizing facts or data. It’s about reasoning. It’s about understanding patterns in unfamiliar territory—something AI has historically struggled with.
According to Chollet, o3 is “doing something fundamentally different” than its predecessors. (Though o3 still whiffs some puzzles that humans solve easily. And there’s already a harder version of ARC-AGI in the works to challenge it further.)
So, does that mean we’re on the doorstep of AGI? Probably not yet. Chollet himself says that beating humans on this test doesn’t magically equal AGI.
But o3’s performance suggests that AI is making more meaningful progress on capabilities once thought to be purely human.
But Can It Do Your Job?
It may not even matter if o3 is a precursor to AGI, says Roetzer. What matters is how its very real capabilities impact your day-to-day work.
“These evaluations are nice to talk about, but the thing that actually matters to all of us is—are these models superhuman at the tasks we do every day?” he says. That’s the question that will determine how reasoning models actually affect your job.
In other words, it’s one thing for o3 to crush a reasoning puzzle in a lab. It’s another thing entirely for it to handle your specific tasks, in your specific industry, with your specific constraints. And no big AI lab is running official tests on how well o3 can handle, say, product merchandising in retail, or compliance review in healthcare.
As 2025 (and beyond) unfolds, though, these models are almost certainly going to become “superhuman” at more and more tasks—and that’s not just talk. People are already seeing glimpses of advanced reasoning in o1, which many are using heavily through ChatGPT Pro’s hefty $200/month plan.
According to Sam Altman, OpenAI originally set that price thinking it would remain profitable because usage would be limited. Instead, the model has been used so intensively that it’s costing OpenAI money at the current price point—suggesting these tools have serious, tangible value for power users.
When o3 finally becomes available—and no date is officially set—it may well handle strategic planning, creative workflows, and other complex tasks more efficiently (or more expertly) than many professionals.
That’s what you need to watch out for.
What Happens Next
OpenAI hasn’t shared an exact release timeline for o3. For now, we only have its performance on a handful of difficult benchmarks to go on. And that performance is eye-opening.
But as Roetzer points out, the real question is whether o3 (and subsequent models) become superhuman at the job tasks that make up the backbone of the economy.
“The evaluations that are used to test these are not necessarily representative of the impact on the economy and the workforce,” he says. “They’re trying to come up with extremely complicated things that only the elite minds in the world can solve.”
But if an AI system can handle the 25-30 tasks you do daily, and do them significantly faster and better, that’s when we’ll see a real impact.
“We’re going to start seeing a lot of those tasks where these models can do it better than you,” says Roetzer. “And you’ll find other stuff to do.”
After all, for many professionals, there’s no shortage of higher-value activities to focus on if AI can handle the tedious or time-consuming parts.
But one thing’s for sure: With o3 surpassing humans on a major reasoning benchmark, we’re witnessing yet another leap forward in AI’s capabilities. Whether that’s the first step toward broader human-level AI or just an incremental milestone, it’s another clear signal:
These models are growing more powerful by the day—and they may soon be the best “thinkers” in the room.