ChatGPT Can Now See, Hear, and Speak (And It's a Big Deal)

OpenAI just dropped two big upgrades that bring ChatGPT closer to becoming a true multimodal AI assistant.

First, ChatGPT can now analyze and respond to images. You drop a photo or other image into the tool, then prompt it to interact with that image.

Early use cases for this functionality are stunning.

Second, ChatGPT can now talk to you like Siri or Alexa.

Just speak to it via the app, and it responds.

Tech reporter Kevin Roose had it read him a bedtime story, help him analyze a dream, and chat with him about some work-related stress.

(Notably, he said it was much, much better than Siri or Alexa.)

Right now, most users don’t have access to these features. But they’ll soon be rolled out to ChatGPT Plus and ChatGPT Enterprise users.

Why It Matters

ChatGPT is now the latest and greatest example of a multimodal AI assistant: an AI assistant that can work across different media, like text, voice, and images, to perform actions and produce outputs.

This unlocks huge new use cases for marketers and businesspeople using these tools.

Connecting the Dots

On Episode 66 of The Marketing AI Show, Marketing AI Institute founder and CEO Paul Roetzer walked me through why this is such a big deal.

  1. This makes ChatGPT insanely powerful. You can now do things like have it analyze marketing data charts, write product descriptions from photos, caption social posts, identify product defects from photos, build apps and sites from a simple drawing, and more.
  2. Yet most people still don’t truly comprehend its capabilities. “They’re not really thinking about the depth of the capabilities it has,” says Roetzer. Nor are they really pushing it and experimenting with it fully. Most people are still using a few basic prompts to do rudimentary tasks. There is much, much more to the tool.
  3. Domain experts stand to gain the most from the tool. “This is the kind of tech that becomes extremely powerful when you put it in the hands of people with domain expertise,” says Roetzer. “The professionals who figure out how to use these tools to do this kind of stuff will be able to just leap their peers.”
  4. And don’t sleep on ChatGPT as a voice assistant. OpenAI’s Whisper transcription technology is already incredible. We expect the voice functionality in ChatGPT to similarly excel. It also confirms that the future of all AI assistants will be multimodal, says Roetzer.

What to Do About It

  • Get moving on testing this out. “The leaders in different industries who get there first and test this technology, they're going to find the ways to reinvent their own industries,” says Roetzer. There’s no excuse not to have a ChatGPT Plus license moving forward.
