
We Tested o1 on Real-World Business Use Cases (With Some Surprising Results)



OpenAI's new o1 reasoning model has generated significant buzz for its ability to think through problems step-by-step. But how does it actually perform in real-world business scenarios?

At Marketing AI Institute, we put it to the test against GPT-4o—with some surprising results. On Episode 127 of The Artificial Intelligence Show, founder/CEO Paul Roetzer and I broke down what we learned from our hands-on testing of o1.

The Real-World Test Results

To test o1, Roetzer and I put both models head-to-head on real-world business tasks we were trying to accomplish in our own work.

“What we were trying to get at was: What are the use cases for business?” says Roetzer. “Whenever I’m testing things, I’m trying to use real-life situations where I can assess whether or not this would actually make a difference in my life.”

We weren’t trying to solve complex math or science problems, which o1 also excels at. (Primarily because neither of us knows how to effectively judge the outputs that o1 produces for these types of problems.)

Roetzer tested o1 against GPT-4o on a complex problem he was trying to solve related to pricing for one of our education products.

“I gave it a couple of things I'm thinking about doing, and then I said basically analyze this for me, this is my goal, how would I best achieve this, the outcome I'm looking for, and I told it, ask any clarifying questions that you need,” he says.

I performed similar tests across use cases like: answering a strategy question about our podcast, creating a content strategy based on performance data, and producing a complex strategic brief based on data collected during one of our workshops.

Roetzer and I came to similar conclusions about o1’s performance vs. GPT-4o.

"o1 asked way more complex and nuanced questions than 4o," says Roetzer. "Immediately, you could see that it was more deeply understanding and considering what I was asking of it, based on the questions that came back to me."

The quality difference was noticeable across several dimensions: o1 asked more sophisticated questions, provided richer explanations, came up with better thoughts and scenarios, exhibited deeper reasoning and, ultimately, produced more valuable context and insights.

"Side by side, o1 crushed this one over 4o," says Roetzer, though he notes that GPT-4o was still quite useful for the tasks he was trying to accomplish.

That doesn’t mean o1 is perfect. Far from it. Right now, you can’t upload documents or spreadsheets, which is a huge limitation. (I hacked my data analysis use cases by copying and pasting unstructured data into o1’s prompt window.)

Also, if you don’t take the time to structure complex problems for o1 to solve, you may not unlock its full capabilities. So it definitely takes some trial and error to really get it to shine.

o1’s Impact on Knowledge Work

It’s impossible to use o1 and not immediately think of the impact on human professionals.

The outputs were as good as, if not better than, ones Roetzer and I have spent years of our careers either producing or assessing from the work of others.

"If we got this from a human strategist, I would be like, yeah, this person did a fantastic job," says Roetzer. "From an entry-level employee, you'd be like, this person's moving up fast."

Roetzer believes the model is already doing the quality of work you’d see from a solid B-player on your team across many knowledge work tasks. And that’s just after initial tests. As the model gets better—and we get better at integrating it with other tools and outputs—you can really start to see how disruptive this could be to knowledge work.

"If the AI models keep getting better at thinking, reasoning, understanding, and imagination...if it does those things better than the average human who would otherwise do the job in your business, in your industry—then we’ve got some problems," says Roetzer.

Going into 2025, he expects more and more people to wake up to the fact that these models are highly capable of doing valuable knowledge work.

"I just think it's going to become a reality for a lot more people next year,” he says. “They're going to realize how capable these things already are of doing a lot of knowledge work at or above average human level."
