
Google Bard Now Surpasses GPT-4



Google Bard just made a stunning leap in capabilities…

It just beat GPT-4 on a top leaderboard that evaluates AI models.

The leaderboard, called Chatbot Arena, comes from the Large Model Systems Organization. It now shows Google Bard (powered by Google's Gemini Pro model) in 2nd place.

The leaderboard takes into account 200,000+ human votes on which models users prefer. 

It also assigns each model an "Elo" rating, a rating system originally developed to rank players in zero-sum games like chess based on their head-to-head results.
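For the curious, here's a minimal sketch of how a classic Elo update works. The K-factor of 32 and the starting ratings are illustrative assumptions, not Chatbot Arena's actual parameters (the Arena's published methodology uses a more sophisticated statistical fit over all votes, but the basic Elo update below conveys the core idea):

```python
def expected_score(r_a, r_b):
    # Probability that model A beats model B, per the Elo model:
    # a 400-point rating gap implies ~10:1 odds.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(r_a, r_b, score_a, k=32):
    # score_a: 1.0 if A wins the head-to-head vote, 0.0 if B wins, 0.5 for a tie.
    # k (the "K-factor") controls how much a single result moves the ratings;
    # 32 is a common illustrative choice, not Chatbot Arena's setting.
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# Two evenly matched models start at 1000; A wins one matchup.
print(update_elo(1000, 1000, 1.0))  # → (1016.0, 984.0)
```

Each human vote in the Arena is effectively one such "match," and a model's rating drifts up or down as thousands of these results accumulate.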

Bard still trails behind GPT-4 Turbo, but now surpasses other versions of GPT-4 and other popular models like Claude and Mistral.

What should you do now that Bard is climbing the rankings?

In Episode 81 of The Marketing AI Show, I got the answer from Marketing AI Institute founder/CEO Paul Roetzer.

Here’s what you need to know…

This Is a Trustworthy Leaderboard

Chatbot Arena isn’t just a random online ranking site, says Roetzer. It’s the real deal.

It’s trusted by some of the top players in AI, including Andrej Karpathy, a leading AI researcher at OpenAI. (In fact, Karpathy says it’s one of only two evaluation sites he trusts.)

It Works By Pitting Models Against Each Other

The human evaluation component of Chatbot Arena works by having you pit two models against each other for the same prompt. (Hence the name.) 

For instance, you can give Bard (powered by Gemini Pro) and GPT-4 the same prompt, get two different outputs, and rate which one is best.

When pitted against several versions of GPT-4, Bard comes out the winner. However, it still falls short when matched against GPT-4 Turbo, the latest version of OpenAI’s most advanced model.

Not to mention, Gemini Pro, which now powers Bard after a December 2023 update, isn’t even the most powerful version of Google’s new models.

Gemini Ultra is the most powerful version of Google’s family of advanced models—and Google plans to incorporate it into its services and AI tools moving forward. Which means Ultra may be an even bigger leap forward.

Your Company Needs to Have Its Own “Chatbot Arena,” Too

This doesn’t mean you should drop all your other tools and switch to Bard, says Roetzer. 

AI tools improve at an insanely fast pace. As Bard shows us, a tool that was lagging behind can quickly become a leader, almost overnight.

“This is why it is so hard to make bets on which platform to use and which ones to integrate into your workflows,” says Roetzer. “Because they keep evolving as to which is best for which use cases.”

“You have to constantly be testing different tools.”

Roetzer recommends having one or more team members test different tools against your core AI use cases (blog writing, summarization, script writing, etc.) every 30-90 days—or whenever the leaderboards see a significant change.

“Go in and run those use case tests against the different systems and see if someone has made a leap forward that changes the kind of technology the rest of your team should be using.”
