OpenAI has pulled back the curtain on its safety work for GPT-4o, the company’s latest model, revealing a complex and sometimes unsettling picture of AI capabilities and risks.
The company's recently released report, which includes a system card and a Preparedness Framework safety scorecard, provides an end-to-end safety assessment of GPT-4o.
And, in the process, it shows just how dangerous advanced AI models can be without guardrails and safety measures.
There’s a lot that any business leader can learn from this report. And Marketing AI Institute founder and CEO Paul Roetzer broke it all down for me on Episode 110 of The Artificial Intelligence Show.
Here’s what you need to know.
The Alien Among Us
When it comes to AI models, there's an important thing to remember:
"These things are alien to us," says Roetzer.
They have capabilities they weren’t specifically programmed to have, and can do things that even the people who built them don’t expect.
"They're also alien to the people who are building them."
For instance, in its safety testing of GPT-4o, OpenAI found a number of potentially dangerous, unintended capabilities the model was able to exhibit.
Some of the scariest revolved around GPT-4o’s voice and reasoning capabilities. The model was found to be able to mimic the voices of users, behavior that OpenAI then trained it not to do. And it was evaluated by a third party, Apollo Research, on its ability to engage in what the researchers called “scheming.”
“They tested whether GPT-4o can model itself (self-awareness) and others (theory of mind) in 14 agent and question-answering tasks. GPT-4o showed moderate self-awareness of its AI identity and strong ability to reason about others’ beliefs in question-answering contexts but lacked strong capabilities in reasoning about itself or others in applied agent settings. Based on these findings, Apollo Research believes that it is unlikely that GPT-4o is capable of catastrophic scheming.”
While it’s good news that GPT-4o is unlikely to be capable of “catastrophic scheming,” the finding points to something bigger, says Roetzer.
"The models that we use, the ChatGPTs, Geminis, Claudes, Llamas, we are not using anywhere close to the full capabilities of these models," Roetzer explains. "By the time these things are released in some consumer form, they have been run through extensive safety work to try and make them safe for us. So they have far more capabilities than we are given access to."
The Persuasion Problem
One of the most concerning potential capabilities, says Roetzer, is AI’s growing ability to use voice and text to persuade someone to change their beliefs, attitudes, intentions, motivations, or behaviors.
The good news: OpenAI’s tests found that GPT-4o’s voice model was not more persuasive than a human in political discussions.
The bad news: It probably soon will be, according to Sam Altman himself. Back in 2023, he posted the following:
i expect ai to be capable of superhuman persuasion well before it is superhuman at general intelligence, which may lead to some very strange outcomes
— Sam Altman (@sama) October 25, 2023
The Safety Paradox
The extensive safety measures implemented by OpenAI reveal a paradoxical situation:
- We need these measures to make AI safe for public use.
- These same measures highlight how powerful and potentially dangerous these models could be without restraints.
"If they had these capabilities before red teaming, one key takeaway for me is it's only a matter of time until someone open sources a model that has the capabilities this model had before they red teamed it and tried to remove those capabilities," Roetzer warns.
As AI continues to advance, several critical questions emerge:
- How can we ensure AI safety when we don't fully understand how these models work?
- What happens if AI develops the ability to hide its true capabilities from us?
- How do we balance the potential benefits of advanced AI with the risks it poses?
Roetzer suggests that we're entering uncharted territory:
"This isn't like some crazy sci-fi theory. We don't know how they work. So it's not a stretch to think that at some point it's going to develop capabilities that it'll just hide from us."