Elon Musk’s AI venture, xAI, just launched its latest model, Grok 3, and it soared to the top of the Chatbot Arena leaderboard almost overnight.
It’s already outperforming established players, including OpenAI’s latest offerings and Google’s Gemini, across math, coding, and complex reasoning tasks.
But the real story isn’t just about the model’s jaw-dropping capabilities. Grok 3’s launch has also raised serious concerns around AI safety, guardrails, and what happens when an AI company ships a state-of-the-art model without the typical restrictions or months-long “red-teaming” process.
To break it all down, I spoke to Marketing AI Institute founder and CEO Paul Roetzer on Episode 137 of The Artificial Intelligence Show.
Grok 3 was reportedly trained on Colossus, a supercluster boasting 10 times the compute of previous state-of-the-art models. The results speak for themselves.
Roetzer is impressed by the speed of development.
“At a high level, the technological achievement vs. the time to build is incredible,” he says. “They caught up extremely quickly.”
There are two versions of the model you can try. Users can access them on X (formerly Twitter), at Grok.com, or via the iOS app.
Unlike OpenAI and other labs, xAI seems to be embracing an unfiltered approach.
It appears Grok 3 was released without standard safety processes. AI labs typically do extensive red-team testing—weeks or months of trying to force the model to produce harmful content—before letting the public near it. xAI, however, seems to have launched immediately.
Early testers quickly found that Grok 3 is willing to produce content that other models typically refuse to generate—things that can be outright dangerous. That includes racially biased or violent content, step-by-step instructions for weapons creation, and even “assassination plans.”
In one case, a user described getting Grok 3 to produce hundreds of pages of detailed instructions for making chemical weapons, along with assassination plans targeting Musk himself.
According to Roetzer, xAI appears to have let the public do the red-teaming for them, with the company scrambling to add guardrails after some shocking outputs were already documented online.
But these controversial, and potentially harmful, outputs appear to be a feature, not a bug.
“Their competitive advantage at the moment, outside of the speed with which Elon Musk can build things and the data they have,” says Roetzer, “is their willingness to release the most unrestricted model.”
The crazy part, he says, is that the company appears proud of this, posting openly about all the controversial things Grok can do (like its new “sexy” voice mode). Far from hiding it, xAI is happy to ship features that are basically unthinkable in the more heavily restricted AI systems from OpenAI, Anthropic, or Google.
“This is the Elon Musk factor,” says Roetzer. “He doesn’t care.”
Another eyebrow-raising incident emerged shortly after launch.
Initially, Grok 3 cited Elon Musk and Donald Trump when asked about major sources of misinformation. Suddenly, those names vanished from the model’s answers. In a piece of visible “chain-of-thought,” Grok 3 even revealed that someone internally had instructed it to ignore mentions of Musk and Trump in that context.
Igor Babuschkin, co-founder and chief engineer at xAI, publicly admitted it was a misstep and blamed a former OpenAI employee for making the change.
While the policy has been reversed, the larger question looms: How easily can staff override or manipulate a frontier AI model in production?
It’s just one more example of the problems that can arise when red-teaming goes out the window.
xAI’s approach stands in stark contrast to labs like Anthropic, which explicitly forbids exactly the kind of information Grok 3 has been seen to provide. Anthropic calls providing information on things like chemical weapons a bright red line in its responsible scaling policy.
And yet, Musk’s team might see a competitive advantage in going first with a mostly unfiltered system. It certainly generated buzz and a direct path to “state-of-the-art” status on the Chatbot Arena leaderboard.
That could lead other labs to follow suit, feeling pressure to loosen restrictions on their own models. Or there could be regulatory, commercial, and social backlash against this kind of fast-and-loose model release.
Regardless, Roetzer thinks we’ve hit a turning point.
“My biggest concern is I think we look back on this moment as a not great moment in AI model development in history,” he says.