Marketing AI Institute | Blog

Microsoft's New AI Makes Deepfakes from a Single Photo

Written by Mike Kaput | Apr 23, 2024 3:30:45 PM

Microsoft just debuted an AI model that can deepfake you from a single photo.

The model is called VASA-1. In a research paper, Microsoft detailed how it creates a deepfake video from one photo. 

VASA-1 uses the photo and an audio track to create a realistic deepfake of someone talking or singing.

Reports Ars Technica:

“The VASA framework (short for "Visual Affective Skills Animator") uses machine learning to analyze a static image along with a speech audio clip. It is then able to generate a realistic video with precise facial expressions, head movements, and lip-syncing to the audio.”

Microsoft sees value in using the technology for AI avatars, writing:

"It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors."

However, Microsoft says it's not releasing the model's code right now because of the potential for misuse.

But the simple fact remains:

We now have the technology to deepfake anyone who has even a single public photo online.

What does this AI technology mean for you and your business?

I got the answers on Episode 93 of The Artificial Intelligence Show from Marketing AI Institute founder/CEO Paul Roetzer.

Deepfakes are improving faster than you think

Deepfakes are one area where AI technology has rapidly progressed. And not enough people understand what's now possible, says Roetzer.

“We’re just in a different place and most of society is blissfully unaware that this stuff can now happen," he says.

In his 2022 book, Marketing Artificial Intelligence, Roetzer wrote the following:

"Vision can also be applied to produce deepfake videos in which a person in an existing image or video is replaced with someone else’s likeness. The prevalence and impact of deepfake videos is just beginning, and having an understanding of the underlying technology will help you prepare your brand for its potential impact.

"I did a fair amount of crisis communications planning early in my career. Basically, you envision different scenarios of what could go wrong, then put strategies in place for how the organization will react. Then, you hope none of it actually happens. Never did I imagine a day in which brands would be planning for deepfake videos of executives doing and saying things that never happened in real life. But, here we are.

"AI has made it possible—and relatively easy with the right resources—to create fake videos of people that appear and sound real. According to Siwei Lyu, who works for the Defense Department developing software to detect and prevent the spread of deepfakes, 'it only takes about 500 images or 10 seconds of video to create a realistic deepfake.'

"That means all those social media photos and YouTube videos your company shares could be used against your brand. So the next time you meet with the PR team to talk about crisis communications, make sure to put deepfake videos on the agenda."

In 2022, it took about 500 images, or 10 seconds of video, to create a realistic deepfake.

Now, it takes one.

What a difference a couple of years make.

Most individuals and companies are still woefully unprepared for this reality.

And they're going to be everywhere

And that reality, with realistic deepfakes everywhere, may arrive sooner than you think.

If Microsoft is working on this, so are other AI labs. Just because Microsoft isn't releasing the technology doesn't mean someone else won't. 

And that's assuming no one creates an open-source version first.

“So if we hear about this tech from Microsoft or from OpenAI or whomever, assume that within six months it will be open sourced by someone and this stuff will be all over the web," says Roetzer.

Once that happens, others will release their own versions of the technology in order to compete.

“It just takes one person to put it out in the world and it’s like ‘Oh OK, someone else did it, now we can go do it.’”