Microsoft's Big AI Announcements

Written by Mike Kaput | May 28, 2024 2:30:49 PM

Microsoft just wrapped up a huge week of AI announcements.

Based on the announcements, it seems like the tech giant is betting on three big trends:

Multimodal AI, on-device processing, and the rise of AI agents.

What do you need to know about the announcements?

I got the answers from Marketing AI Institute founder and CEO Paul Roetzer on Episode 99 of The Artificial Intelligence Show.

Get ready for new AI-powered PCs

Microsoft kicked things off on May 20th with the unveiling of Copilot+ PCs.

These Windows machines (built by partners like Dell and Samsung) are designed from the ground up for AI.

That's true at both the hardware and software level.

First, the machines are powered by new high-performance chips called neural processing units (NPUs).

Second, they leverage both the latest large language models (LLMs) through Microsoft Azure and the company's powerful small language models (SLMs) that run locally on the device.

The standout feature of these new PCs is Recall, an AI-powered feature that tracks and stores everything you see and do on your computer.

According to Microsoft, this data is kept entirely on your device for privacy. But it raises important questions about the tradeoffs we're willing to make for AI-powered convenience.

"Am I going to allow my computer to take screenshots of everything I'm doing every five seconds?" asks Roetzer. "As an employee, if we go over to the corporate side of this, you're not going to have the choice, I'm guessing. This is something IT is going to turn on or not."

Individual users who are uncomfortable with the feature may miss out on some important AI benefits. Employees may not get a choice about whether they are recorded all the time.

“I think it’s going to get messy,” says Roetzer.

AI that can "see, hear, speak and help in real time"

The day after announcing Copilot+ PCs, Microsoft kicked off its annual Build developer conference.

At the event, Microsoft made a ton of announcements linked by a common thread...

Multimodal AI is the future.

We saw that play out through a few major announcements:

Microsoft announced Phi-3-vision, a multimodal version of its Phi-3 small language model, which is small enough to work locally right on a device.
The company also talked up GPT-4o, which is now available in Microsoft Azure and has vastly enhanced multimodal capabilities, including much better voice chat.
Not to mention, during the Copilot+ PC announcement, the company's newly acquihired AI leader Mustafa Suleyman posted that Copilot will "see, hear, speak and help in real time" to help you do your work.

This multimodal focus aligns with moves from Google, Anthropic, and other AI leaders. It sets the stage for AI that can seamlessly understand and interact with the world the way humans do.

"They're all trying to build AI that can see and create all these different modalities," says Roetzer.

"That is very apparent in what Microsoft is doing. They're doing it on the back of OpenAI's models, but also increasingly their own models."

Meet your new AI coworker

Perhaps most intriguing, the company talked a lot about AI agents coming to Copilot.

These are AI assistants that can take independent actions. And Microsoft says you'll be able to build them yourself in Copilot later this year.

One agent Microsoft talked up was Team Copilot, which will be able to manage meeting agendas and notes, moderate team chats, assign tasks, and track deadlines using Microsoft apps.

Microsoft appears to be leaning heavily on positioning agents as colleagues and coworkers, not replacements. It's a strategy that could make sense, says Roetzer.

"I think that there's some branding power here that we saw already with Google going this direction," says Roetzer.

"It's a more digestible way to think about agents. People don't think about it as a replacement to them if it's messaged to them as a teammate."

On-device, all the time

Woven through all of Microsoft's announcements is a clear focus on bringing AI directly to devices without the need for a constant cloud connection, thanks to smaller, more powerful models and better hardware.

"12 to 18 months from now, many of the things that we currently need to go to the cloud to do, we're going to be able to do on our devices, whether it's your PC, your tablet, your phone."
