
[The AI Show Episode 138]: Introducing GPT-4.5, Claude 3.7 Sonnet, Alexa+, Deep Research Now in ChatGPT Plus & How AI Is Disrupting Writing



AI is getting smarter—and more emotionally aware. 

This week, Mike and Paul highlight the biggest AI news and releases, with a major focus on how artificial intelligence is evolving to understand emotions. They break down the latest updates on GPT-4.5 and Claude 3.7 Sonnet, Amazon’s Alexa+, the significance of OpenAI's Deep Research becoming more widely available, and the impact of AI on writing. Plus, don’t miss our rapid-fire roundup covering even more developments in AI.

Listen or watch below, and read on for the show notes and the transcript.

Listen Now

Watch the Video

Timestamps

00:04:55 — GPT-4.5

00:19:58 — Claude 3.7 Sonnet

00:28:20 — Amazon’s New Alexa

00:40:13 — Apple Siri in 2027 

00:46:02 — ChatGPT Deep Research Now Available to All Paying Users, & Voice Mode for All

00:52:22 — Agency > Intelligence

01:00:22 — Meta Plans to Release Standalone Meta AI App 

01:04:33 — Robots in the Home and Workplace

01:08:35 — Lmarena.ai Prompt-to-Leaderboard

01:11:43 — David Perell on How Writing Is Changing Thanks to AI

01:17:01 — AI’s Impact on the Future of HubSpot

01:21:15 — Listener Questions

  • How do you handle the known issues with AI hallucinations? Any practical tips?

01:23:43 — AI Text to Voice Releases

Summary 

GPT-4.5

OpenAI has just unveiled GPT-4.5, their “largest and best model for chat yet,” according to a company announcement.

OpenAI says of the model:

“Early testing shows that interacting with GPT‑4.5 feels more natural. Its broader knowledge base, improved ability to follow user intent, and greater 'EQ' make it useful for tasks like improving writing, programming, and solving practical problems. We also expect it to hallucinate less.”

The model demonstrates impressive factual accuracy compared to its predecessors. In internal testing on what OpenAI calls SimpleQA—a benchmark measuring factual knowledge—GPT-4.5 achieved a 62.5% accuracy rate, significantly outperforming GPT-4o's 38.2%. Similarly, it reduced hallucination rates from 61.8% to 37.1%.

Human testers also showed a clear preference for GPT-4.5 over GPT-4o, particularly for creative tasks, professional queries, and everyday conversations. The model's responses are notably more succinct and conversational, with an intuitive understanding of when to provide brief, empathetic answers versus detailed information.

However, Sam Altman also notes some obvious flaws with the model at the moment. It’s, in his words, a “giant, expensive model.” And it’s only available at the moment to ChatGPT Pro users, who pay $200 a month for that license. 

Claude 3.7 Sonnet

Anthropic has also released Claude 3.7 Sonnet, its most intelligent AI model to date and what they're calling the first "hybrid reasoning model" on the market. 

What makes Claude 3.7 Sonnet unique is its dual-mode approach. Users can choose between standard mode for quick responses or an extended thinking mode where the model performs step-by-step reasoning that's made visible to the user. 

This is a significant departure from other reasoning models in the market, as Anthropic believes that: "Just as humans use a single brain for both quick responses and deep reflection, we believe reasoning should be an integrated capability of frontier models rather than a separate model entirely."

In early testing, Claude 3.7 Sonnet has shown particularly impressive results in coding and front-end web development. 

Along with the model release, Anthropic has also introduced Claude Code, a command-line tool for "agentic coding" available as a limited research preview. This tool enables developers to delegate substantial engineering tasks to Claude directly from their terminal. 

Claude Code can search and read code, edit files, write and run tests, commit and push code to GitHub, and use command line tools—keeping the human developer informed at each step. 

Claude 3.7 Sonnet is now available on all Claude plans—including Free, Pro, Team, and Enterprise—as well as through the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. However, the extended thinking mode is only available on paid tiers. 
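For developers reaching the model through the Anthropic API, extended thinking is exposed as a request parameter rather than a separate model. Below is a minimal sketch of assembling such a request in Python, assuming the `anthropic` SDK; the model ID and the shape of the `thinking` parameter follow Anthropic's documented Messages API format, but treat the specifics (token budgets, prompt) as illustrative assumptions rather than a definitive integration:

```python
from typing import Any, Dict, Optional

def build_request(prompt: str, thinking_budget: Optional[int] = None) -> Dict[str, Any]:
    """Assemble Messages API parameters for Claude 3.7 Sonnet.

    Passing a `thinking_budget` opts into extended thinking mode, where the
    model emits visible step-by-step reasoning capped at roughly that many
    tokens; omitting it corresponds to the standard quick-response mode.
    """
    params: Dict[str, Any] = {
        "model": "claude-3-7-sonnet-20250219",  # assumed dated model ID
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking_budget is not None:
        # Extended thinking is enabled per-request, not per-model.
        params["thinking"] = {"type": "enabled", "budget_tokens": thinking_budget}
    return params

# With the SDK installed and an API key configured, the call would look like:
#   import anthropic
#   client = anthropic.Anthropic()
#   response = client.messages.create(**build_request("Plan a refactor.", 2048))
```

The point of the dual-mode design shows up here: the same model ID serves both modes, and the only difference is whether the request carries a thinking budget.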

Amazon’s New Alexa

Amazon has just unveiled Alexa+, a complete reimagining of its voice assistant powered by generative AI technologies. 

This major overhaul transforms Alexa from the stilted, single-question interactions users are familiar with into a genuinely conversational assistant capable of understanding context, remembering preferences, and taking meaningful actions. 

Alexa+ is designed with an impressive range of capabilities that extend far beyond simple queries. Amazon says the new assistant can answer personalized questions about your life and activities, drawing on information from a customer's Amazon account.

A standout feature is Alexa's new visual understanding capabilities. Through a device's camera, it can process a video feed and respond to questions about what it sees. Amazon also highlighted Alexa+'s ability to understand tone and environmental context, adjusting its responses accordingly. 

Beyond basic assistance, Alexa+ introduces powerful productivity features. Users can upload files, documents, and emails that Alexa will parse and reference in future conversations. For example, a user could ask, "I forwarded a work schedule, are there any interesting events I need to be aware of?" and Alexa will highlight key items from the document. The assistant can even take actions based on this information, such as adding text from a document to a calendar or creating reminders from specific details.

The integration with Amazon's broader ecosystem appears seamless. Alexa+ works with Echo Show smart displays to power personalized content feeds and provides a new "For You" panel with timely updates based on user interests. It can control smart home devices, play music from Amazon Music on connected speakers, and even direct Fire TV devices to skip to particular scenes in movies or shows.

Alexa+ will cost $19.99 per month but will be available free for Amazon Prime members. The rollout begins in the coming weeks with an early access period prioritizing owners of Echo Show 8, 10, 15, and 21 devices, followed by a wider release over the coming months.


This episode is brought to you by our AI for Writers Summit:

Join us and learn how to build strategies that future-proof your career or content team, transform your storytelling, and enhance productivity without sacrificing creativity.

The Summit takes place virtually from 12:00pm - 5:00pm ET on Thursday, March 6. There is a free registration option, as well as paid ticket options that also give you on-demand access after the event.

To register, go to www.aiwritersummit.com 


This episode is also brought to you by our 2025 State of Marketing AI Report:

Last year, we uncovered insights from nearly 1,800 marketing and business leaders, revealing how AI is being adopted and utilized in their industries.

This year, we’re aiming even higher—and we need your input. Take a few minutes to share your perspective by completing this year’s survey at www.stateofmarketingai.com.

Read the Transcription

Disclaimer: This transcription was written by AI, thanks to Descript, and has not been edited for content. 

[00:00:00] Paul Roetzer: These models already are superhuman at persuasion. It's just red teamed out of them, persuasion is the ability to convince people to change their beliefs, attitudes, intentions, motivations, behaviors. It uses advanced reasoning, it uses emotional appeals, and so I think persuasion starts to become like a truly concerning area of development.

[00:00:25] Paul Roetzer: Welcome to the Artificial Intelligence Show, the podcast that helps your business grow smarter by making AI approachable and actionable. My name is Paul Roetzer. I'm the founder and CEO of Marketing AI Institute, and I'm your host. Each week I'm joined by my co-host and Marketing AI Institute Chief Content Officer Mike Kaput, as we break down all the AI news that matters and give you insights and perspectives that you can use to advance your company and your career.

[00:00:54] Paul Roetzer: Join us as we accelerate AI literacy for all.[00:01:00] 

[00:01:01] Paul Roetzer: Welcome to episode 138 of the Artificial Intelligence Show. I'm your host Paul Roetzer, on with my co-host Mike Kaput. It is AI for Writers Summit week, presented by Goldcast. So Mike and I are doing this Monday, March 3rd, 11:00 AM Eastern time. We are recording. We will be live for the Writers Summit on March 6th.

[00:01:24] Paul Roetzer: So that is coming up. So if you're listening to this on March 4th, 5th, or maybe even the morning of March 6th, and you want to join us for the virtual AI for Writers Summit, you can do that still. So it is coming up from noon to five Eastern time on Thursday, March 6th. This is the third annual summit.

[00:01:41] Paul Roetzer: Last year was more than 4,500 people. I think it was 90 countries, yeah, whatever the number was. So you can go to AIWritersSummit.com. You can also find it on the Marketing AI Institute website. The event has, gosh, there's about six sessions, so I'm gonna run through [00:02:00] real quick. So we got the State of AI for Writers and Creators.

[00:02:02] Paul Roetzer: That's my opening keynote. We have a panel discussion on AI, copyright, and IP, what writers and creators need to know. That's always a fan favorite, and I'm very much looking forward to the updates from Jen Leonard and Rachel Dooley on that one. That is a, I think that's actually a fireside chat, if I remember correctly. We have Andy Cina doing Mastering AI Prompting: Harnessing Creative Potential.

[00:02:26] Paul Roetzer: We've got Mike Kaput doing AI-Powered Research: Transforming How Writers Discover and Create. We have a relaxation exercise with Tamara Roski, our Director of Partnerships, always popular. We have an amazing conversation with Mitch Joel on the future of creativity, AI storytelling, and the writer's evolution.

[00:02:47] Paul Roetzer: And then it wraps up with an ask us anything on navigating AI for writers and creators with myself, Mike, Rachel, and Andy. And then I'll have some closing remarks. So it actually is gonna wrap up about 4:30 Eastern time, so noon to 4:30 [00:03:00] Eastern on March 6th, AIWritersSummit.com. Again, a big thank you to Goldcast, who's our presenting sponsor for the event.

[00:03:07] Paul Roetzer: We use Goldcast for our virtual summits. We have three annual virtual summits now, I think. One of the standout features for us is their AI-powered content lab, which takes all the event recordings and instantly turns them into ready-to-use video clips, transcripts, and social content, which saves our team a ton of manual work and hours.

[00:03:27] Paul Roetzer: So if you're running virtual events and wanna maximize your content in an effortless way, check out Goldcast. You can go to goldcast.io to learn more. And if you join us for the AI for Writers Summit on Thursday, you'll get to experience Goldcast for yourself. Also just a quick note again, on the State of Marketing AI Report.

[00:03:46] Paul Roetzer: We are currently collecting responses for the 2025 survey and report. You can go to that at StateOfMarketingAI.com. Last year it was over 1,800 people. Mike, I [00:04:00] think we're probably closing in on close to a thousand responses, I would imagine, by this point. Yeah, 

[00:04:03] Mike Kaput: we're, we're tracking really well so far, but yeah, awesome.

[00:04:06] Mike Kaput: Don't, don't hesitate to go take the survey. 

[00:04:08] Paul Roetzer: Yeah, we'd love as much feedback as possible from as diverse a group as possible. So please check that out: StateOfMarketingAI.com. While you're there, you can not only click the link and take the survey, you can download the 2024 report and kind of see where we were last year.

[00:04:26] Paul Roetzer: And then when the new report's ready, you will get an email alerting you and you'll be able to download the new report, which will be coming out spring, summer, I think is what we said. Yep. Yeah, spring, summer. Yeah. Okay. All right. So that's it. We had a big week last week with model news, with GPT-4.5, Claude 3.7, some new Alexa stuff to talk about, and not much Siri to talk about, which I'll explain why that's relevant in a few minutes here.

[00:04:55] GPT-4.5

[00:04:55] Paul Roetzer: All right, Mike. So GPT-4.5 kicks us off this week. 

[00:04:59] Mike Kaput: [00:05:00] Yes, it does. So OpenAI has unveiled it. GPT-4.5 is out in the wild. They say it is their quote, largest and best model for chat yet. They say of the model, quote: Early testing shows that interacting with GPT-4.5 feels more natural. Its broader knowledge base, improved ability to follow user intent, and greater EQ make it useful for tasks like improving writing, programming, and solving practical problems.

[00:05:28] Mike Kaput: We also expect it to hallucinate less. That sentiment was echoed by Sam Altman, who also posted: good news, it is the first model that feels like talking to a thoughtful person to me. I've had several moments where I've sat back in my chair and been astonished at getting actually good advice from an AI.

[00:05:47] Mike Kaput: The model demonstrates impressive factual accuracy compared to predecessors. In internal testing on what OpenAI calls SimpleQA, which is a benchmark measuring factual knowledge, 4.5 achieved a [00:06:00] 62.5% accuracy rate, significantly outperforming GPT-4o's 38.2%. Similarly, it reduced hallucination rates from 61.8% to 37.1%.

[00:06:15] Mike Kaput: Human testers, according to OpenAI, also showed a clear preference for 4.5 over 4o, particularly for creative tasks and everyday conversations. The model's responses are notably more succinct and conversational. It has a more intuitive understanding of when to provide brief, empathetic answers versus detailed information.

[00:06:38] Mike Kaput: Now, Altman and OpenAI also note that there are some obvious flaws and limitations with 4.5 at the moment. Altman says it is quote, a giant expensive model, and it is only available at the moment to ChatGPT Pro users, the ones who pay $200 a month for that license. [00:07:00] Says Altman, quote: We really wanted to launch it to Plus and Pro at the same time, but we've been growing a lot and are out of GPUs.

[00:07:08] Mike Kaput: We will add tens of thousands of GPUs next week and roll it out to the Plus tier then, hundreds of thousands coming soon, and I'm pretty sure y'all will use every one we can rack up. He also makes it very clear this is not a reasoning model. It will not quote, crush benchmarks. He says quote, it's a different kind of intelligence and there's a magic to it I haven't felt before.

[00:07:31] Mike Kaput: So Paul, this seems almost like they kind of optimized a frontier model, almost like for vibes, which is weird to say, but seems like what they were going for here. What are your initial thoughts so far on 4.5? Do any of these pros and cons of the model particularly jump out to you? 

[00:07:48] Paul Roetzer: I think it's more a sign of what's coming versus being some obvious leap forward in capabilities and performance.

[00:07:55] Paul Roetzer: I've personally been using it. I was using it this morning, [00:08:00] as I was kind of getting ready for the podcast, and I was experimenting with some prompts. I think you need to have like an arsenal of specific applications or prompts that you test these things on. Like Ethan Mollick, you know, does a great job with this.

[00:08:15] Paul Roetzer: Yeah. He's got these, like same prompts he uses every time and it's like, okay, yes, I can see and feel the difference. I don't think the average user will feel the difference, or, or, you know, if you just start using it, start to see like these outputs where you're just like, oh my gosh, this is such a massive leap over four.

[00:08:33] Paul Roetzer: And I don't think that's the point. So a couple of notes: they say it does have access to updated information, including search. It supports files and image uploads. It can use Canvas for writing and coding, but it does not support multimodal features like voice. So you can't go into advanced voice; even if you have the Pro account, you're not gonna get to talk to 4.5 yet.

[00:08:55] Paul Roetzer: Video and screen sharing, those aren't in there yet. That'll, you know, kind of come later on. [00:09:00] There's a few things I think are very noteworthy. Like, as I started spending more time thinking about this this morning in preparation, a couple of things jumped out at me. So first, this ongoing debate about scaling laws.

[00:09:13] Paul Roetzer: And, you know, there's the two methods now. There's throw more Nvidia chips, more, you know, more compute and more data at these things and, you know, let them learn and get smarter. And then there's the reasoning, like the test-time compute, where you give them more time to think. So this is the former: the unsupervised learning.

[00:09:32] Paul Roetzer: You know, giving it more compute, giving it more data, 10 times, probably more than GPT-4 is the belief. And, and see what happens. See what kind of comes out the other side. And so what they claim is by doing this, by giving it roughly 10 x more pre-training, compute, these things start to recognize patterns better.

[00:09:53] Paul Roetzer: They draw connections, they generate more creative insights without reasoning. And then GPT-5 is [00:10:00] where we'll get this merger of the models and it'll now have the reasoning abilities as well. So the reason you may not experience some dramatic feeling in terms of the difference of the output is because it's sort of all just this underlying broader knowledge, deeper understanding of the world.

[00:10:16] Paul Roetzer: I thought Andrej Karpathy, who we've talked about many, many times on this show, but he was at OpenAI for a couple of stints. He had a great tweet that sort of like gave his personal perspective, and I thought I'd read that real quick, or excerpts of it. It was a pretty long tweet, because I think it sort of sets the stage here.

[00:10:35] Paul Roetzer: So he said, I've been looking forward to this for two years, ever since GPT-4 was released, because this release offers a qualitative measurement of the slope of improvement you get out of scaling pre-training compute, which means simply training a bigger model. So he's the one that's saying each 0.5 in a version is roughly 10x pre-training compute.

[00:10:55] Paul Roetzer: So that's just more Nvidia chips being applied to this stuff, basically. So he [00:11:00] said, for context, recall: GPT-1 barely generates coherent text. GPT-2 was a confused toy, in his words. They skipped 2.5, went right to three, which was interesting. And Mike, if I'm not mistaken, GPT-3 was what was in the world when you and I wrote the Marketing Artificial Intelligence book.

[00:11:17] Paul Roetzer: Yes. So there was a section I wrote where I said, what happens when machines can write like humans? That section was written in, I think I wrote that in early 2022. And it would've been projecting out, like, what we were seeing already happening. And we knew we were gonna enter this phase where these things could write like humans.

[00:11:39] Paul Roetzer: So this is before the ChatGPT moment, but we were already seeing this enough that we wrote about it in our book as, like, sort of an inevitable outcome. So then Andrej Karpathy continues: GPT-3.5 crossed the threshold where it was enough to actually ship as a product and sparked OpenAI's ChatGPT moment.

[00:11:57] Paul Roetzer: GPT-4 in turn also [00:12:00] felt better, but I'll say it definitely felt subtle. I remember being part of a hackathon trying to find concrete prompts where GPT-4 outperformed GPT-3.5. So again, like, this is someone who is sitting in these labs having this same debate back from 3.5, which was the first version of ChatGPT in November '22, to GPT-4, which came out in March '23.

[00:12:25] Paul Roetzer: And so they were having this same battle internally, like, we're trying to find the subtleties, trying to find... It's just smarter. It just feels different. It feels better, but it's hard to, like, explain. So then he goes on to say, we do actually expect to see an improvement in tasks that are not reasoning, or, this is actually going back to, um.

[00:12:42] Paul Roetzer: Yeah, yeah. This is still Andrej: improvement in tasks that are not reasoning heavy. And I would say those are tasks that are more EQ as opposed to IQ related, and bottlenecked by, for example, world knowledge, creativity, analogy [00:13:00] making, there we go, general understanding, humor, et cetera.

[00:13:03] Paul Roetzer: So these are tests that I was most interested in doing during my vibe check. So for me, I started focusing in on this EQ versus IQ concept because I think this is very, very fundamental to understand where these things go. And that's why I'm saying I see 4.5 more as a prelude and honestly like, I think it gives us a few months, not much more than that.

[00:13:27] Paul Roetzer: 'cause five is coming, to grapple with what it means when these models become high in EQ. So, some context here. So in the GPT-4.5 post, they highlight right toward the beginning: combining deep understanding of the world with improved collaboration results in a model that integrates ideas naturally in warm and intuitive conversations that are more attuned to human collaboration.

[00:13:56] Paul Roetzer: GPT-4.5 has a better understanding of [00:14:00] what humans need and interprets subtle cues or implicit expectations with greater nuance and EQ, emotional quotient, right? That's what EQ stands for. Emotional 

[00:14:08] Mike Kaput: quotient. Yeah. It's like a, it's like emotional intelligence, I guess. Yeah. And yeah, it'd be emotional intelligence quotient whenever.

[00:14:14] Mike Kaput: Yeah. 

[00:14:14] Paul Roetzer: Yeah. All right. So GPT-4.5 also shows stronger aesthetic intuition and creativity. It excels at helping with writing and design. So to me, the EQ part is what really matters here, because it moves models more into the realm of skills, traits, and even professions that we perceive to still be uniquely human or, like, safe.

[00:14:35] Paul Roetzer: So IQ provides the foundation for solving intellectual, technical, analytical challenges. EQ is all about navigating social complexities, communicating clearly, handling emotional nuances. So when we think about what is the impact of EQ as these models, whether it's Claude or Gemini or in this case GPT-4.5, as they [00:15:00] become higher emotional intelligence, it enables interactions that start to feel way more natural.

[00:15:06] Paul Roetzer: It gives the AI a feeling of empathy; it can seem more empathetic and it can seem more human-like. It then becomes better at task performance because it helps to discern, like, the subtleties of intentions behind the user's request, 'cause it actually sort of understands humans a little bit better.

[00:15:25] Paul Roetzer: This leads to better supporting complex tasks like writing and customer service and things like that. It does then reduce misunderstandings and errors; like, hallucinations just naturally fall because it starts to understand the intent behind prompts more. So I think that as we start to get this emotional intelligence, it starts to change the way we interact with these models.

[00:15:49] Paul Roetzer: It starts to change the use cases in a business environment for the models. And it starts to probably creep more into these professions that we [00:16:00] thought were maybe safer from AI. And so that kind of led me to think about, well, what are the ramifications of this? Like, as the emotional intelligence increases, what do we now have to face, both in business and society?

[00:16:15] Paul Roetzer: And so a couple of things that came to mind. One is manipulation risks. So AI could be subtly manipulating the user by appealing directly to their emotions, which enables it to start affecting decisions and behaviors. Then privacy and data. So these AI systems have to analyze and understand deep emotional cues, often requiring access to sensitive data.

[00:16:38] Paul Roetzer: So this is where, you know, Sam has alluded to this, that the future of their models, and certainly we've heard this with other model companies, is memory and personalization are the keys. It wants to remember every interaction. It wants to personalize the experience to you. So EQ is a path to true personalization.

[00:16:56] Paul Roetzer: And if you have something that can talk in a very [00:17:00] natural way to you and be empathetic to you and truly understand your emotions and your needs, or at least perceive that it does, now you start dealing with these emotional bonds and dependencies that people develop with their AI, which we're already starting to see with models that don't have high emotional intelligence.

[00:17:19] Paul Roetzer: And this leads to maybe the biggest concern of all, which is... Earlier last year on the podcast, I shared a tweet from Sam where he said he thought these machines would be superhuman at persuasion before they were superhuman at anything else. And so in the AI exposure key that we talked about when I was sharing the JobsGPT stuff that I created last year, one of the key exposures is level eight, which is persuasion abilities.

[00:17:48] Paul Roetzer: And as I've said before, these models already are superhuman at persuasion. It's just red teamed out of them. Like, persuasion is the ability to [00:18:00] convince people to change their beliefs, attitudes, intentions, motivations, behaviors. And it uses advanced reasoning. It uses emotional appeals. It uses the ability to understand and influence people's, you know, emotional intelligence.

[00:18:14] Paul Roetzer: And so I think persuasion starts to become, like, a truly concerning area of development. So, again, just to recap: are you gonna go into 4.5 if you're paying the 200 bucks a month and, like, feel the difference? I don't know, maybe for some prompts or use cases you might. But I think the underlying thing here is OpenAI is putting this into the world roughly three months before they launch GPT-5, which will not only have higher emotional intelligence, because go back to Karpathy's tweet: 10x.

[00:18:47] Paul Roetzer: So if my math is doing this right, from GPT-4 to GPT-5 is a hundred x increase in compute: 10x times 10x. So you're gonna not only have a much more powerful model, you're [00:19:00] gonna have reasoning layered over that model. And you're probably gonna see a massive leap in the emotional intelligence once you layer reasoning over an already more powerful model.

[00:19:11] Paul Roetzer: So I think it's probably just very important that people don't gloss over this release as, like, eh, it's the same, I don't really see the difference. That's not the point. I think the point is to prepare us for GPT-5, which will likely be a leap of sorts over what you're used to, and it will have the reasoning capabilities baked into it.

[00:19:31] Paul Roetzer: And I am very, very confident in saying that, like, no one is really prepared for that. Like, in business. Again, Mike, you and I sit in these meetings all the time. We run workshops, we do talks. You just show people, like, the most fundamental things, like image generation. Yeah. And they're just like, jaws on the floor, blown away.

[00:19:48] Paul Roetzer: This is possible. They're not thinking about like where these things are going and what they're truly gonna be capable of. 

[00:19:58] Claude 3.7 Sonnet

[00:19:58] Mike Kaput: In another big topic [00:20:00] this week, we have another major model release, because Anthropic has also released Claude 3.7 Sonnet, which is its most intelligent AI model to date, and what they're calling the first quote, hybrid reasoning model on the market.

[00:20:15] Mike Kaput: What makes this model so unique is its dual-mode approach. Users can choose between a standard mode for quick responses or an extended thinking mode, where the model performs step-by-step reasoning that's made visible to the user. So Anthropic says this is because they believe that quote, just as humans use a single brain for both quick responses and deep reflection, we believe reasoning should be an integrated capability of frontier models rather than a separate model entirely.

[00:20:47] Mike Kaput: In early testing, Claude 3.7 Sonnet has shown particularly impressive results in coding and web development. Some major tech companies, according to Anthropic, have already noticed [00:21:00] improvements. AI programming assistant company Cursor found Claude to be best in class for real-world coding tasks. Vercel highlighted its exceptional precision for complex agent workflows.

[00:21:13] Mike Kaput: Replit has reportedly succeeded in using the model to build sophisticated web apps from scratch where other models stall. Now, along with this model release, Anthropic also introduced Claude Code, which is a command-line tool for agentic coding available as a limited research preview. This tool enables developers to delegate substantial engineering tasks to Claude directly from their terminal.

[00:21:40] Mike Kaput: Claude Code can search and read code, edit files, write and run tests, commit and push code to GitHub, and use command line tools, keeping the human developer informed at each step. So if you're trying to use Claude 3.7 Sonnet, that's now available on all Claude plans, as [00:22:00] well as through the Anthropic API.

[00:22:02] Mike Kaput: However, the extended thinking mode is only available on the paid tiers. So Paul, you had just alluded to this. This seems like a preview of where we're headed with GPT-5: models that bake thinking right into a single model. What do you think of their hybrid reasoning approach, and also kind of their justification for it as the way the human brain works?

[00:22:26] Paul Roetzer: This reminds me so much of their fall launch of computer use, where they presented it like this groundbreaking thing that only they had solved. And this is not a knock on Anthropic, by the way. This is just how they're doing their marketing right now and communication stuff. So they're presenting this as, like, we've cracked that reasoning should be part of these models.

[00:22:45] Paul Roetzer: Like, everybody's doing this. They just literally put out a 3.7 to be first to market with some early version of an LLM plus reasoning. But it's literally what Gemini is doing. It's what OpenAI is gonna do with ChatGPT, things like that. [00:23:00] So yeah, all the response has been really positive that I've seen.

[00:23:04] Paul Roetzer: I have not personally tested 3.7, but everything I've seen about it is that it's a very strong model. I thought it was interesting in their, I think it was their system card, where they said that they've optimized less for math and computer science competition problems, which you kind of highlighted as, like, where it actually seems really good, like really good.

[00:23:23] Paul Roetzer: And instead they shifted their focus towards real-world tasks that better reflect how businesses actually use LLMs. But I couldn't actually find any reports of what those were. Like, yeah, it just alluded to it, but it didn't show those. So, it's good. Like, that's what we were saying a couple episodes ago, like, that's what we want.

[00:23:40] Paul Roetzer: It's like, focus on actual use cases. So that's great. If they're doing that, I would love to see that research. One thing jumped out to me: they shared this timeline in the post. I dunno if this is the system card post or the original post. In this timeline, they show Claude assistants, that's [00:24:00] 2024, then they show Claude collaborators, which is 2025, where it's like helping you do extensive work, you know, in a much shorter time period.

[00:24:08] Paul Roetzer: So Claude does hours of independent work for you on par with experts, expanding what every person or team is capable of. So that's like 3.7. Then I assume Claude 4 Opus, 'cause again, this isn't even their biggest model. Yeah. The thing with Anthropic that's kind of odd is Opus is their biggest model, and they just keep releasing Haiku and Sonnet.

[00:24:28] Paul Roetzer: I think Haiku's still a thing. Haiku's like their mini, then Sonnet is like their medium, and then Opus is the big model. Right. And that's the one that we've been sitting here waiting for, for like 12 months. So my guess is at some point we get like 4 Opus, or maybe it's 4 Sonnet. I don't know what they're gonna do, but 3.7 is obviously very much this intermediate step before the 4.

[00:24:55] Paul Roetzer: Then on their timeline, 2026 isn't present. [00:25:00] They just go right to 2027. It says, Claude pioneers: Claude finds breakthrough solutions to challenging problems that would've taken teams years to achieve. So they're very much following, and the graph is representative of, scaling laws. If you look at this, I think it is intentionally showing what a scaling law graph might look like. And based on Dario Amodei's comments that we talked about on the podcast a couple weeks ago, they seem to very much be positioning themselves as, like, Claude 5, I'm guessing, in their world, is like AGI.

[00:25:36] Paul Roetzer: And so this is again a step toward it. This is the first one to combine reasoning and traditional LLMs, and I think it is a prelude to these much bigger things that Dario has already alluded to. Which again tells me, you know, we talked about the two kinds of scaling laws upfront here with GPT-4.5.

[00:25:58] Paul Roetzer: You have the traditional [00:26:00] unsupervised training, more compute, more data, and they just get, you know, better and smarter. And then you have the reasoning, which is the test time compute, which is like, give them more time to think. And when you combine those two scaling laws, the assumption from all these major labs seems to be we get AGI, we enter the phase where these things are just now better than humans at basically all cognitive tasks.

[00:26:21] Paul Roetzer: And the 2027 that they marked here seems to be around where everybody is kind of centering on when we will have it, which isn't very far away, Mike.

[00:26:33] Mike Kaput: Yeah, no kidding, right? It's kinda snuck up on me. You know, it's interesting too, because Anthropic comes out with this model, and it seems to get pretty good press and reviews.

[00:26:45] Mike Kaput: The Wall Street Journal has just reported they're trying to raise another $3.5 billion at a $61.5 billion valuation. I mean, despite the competition, it still seems like Anthropic is a major player here. Obviously not as [00:27:00] well-funded or as big as some of the others, but it might not matter if they make AGI first, I guess.

[00:27:06] Paul Roetzer: Yeah. And again, I think there's still a chance Anthropic manages to stay independent and achieve their research and business missions and, you know, get to AGI and maybe become one of those top three model companies. Yeah. I also think there's probably a greater probability that they get acquired, or, honestly, maybe their play is to be a wildly profitable lab and product company that just infuses their models into other people's

[00:27:36] Paul Roetzer: technology and distribution channels. Because, like we talked about last week, they have nothing other than that. They maybe have great models. Well, they definitely have great models, I'll give them that. They have an amazing research team. They have, apparently, a greater focus on safety and alignment than other people.

[00:27:50] Paul Roetzer: And they're doing some interesting things there. But they have no data, like, nothing of their own. They have no products that they would get data from. They have no distribution other than the [00:28:00] app itself. So they're kind of a little bit behind compared to an Amazon or Google, Meta, and increasingly OpenAI.

[00:28:12] Paul Roetzer: But yeah, that's just the challenge. They have brilliant researchers, great models, no data, no distribution.

[00:28:20] Amazon’s New Alexa

[00:28:20] Mike Kaput: Our third big topic this week: Amazon has just unveiled Alexa Plus at an event in New York, a complete re-imagining of its voice assistant powered by generative AI. Amazon's devices and services chief called it a complete re-architecture of the AI assistant.

[00:28:38] Mike Kaput: This major overhaul transforms Alexa from those kind of stilted, single-question interactions that users are familiar with into a genuinely conversational assistant capable of understanding context, remembering preferences, and taking meaningful actions. The company demonstrated flowing, natural conversations that represent a pretty [00:29:00] significant departure from the command-based interactions that have defined Alexa up until this point.

[00:29:05] Mike Kaput: So Alexa Plus has an impressive range of capabilities that go way beyond just simple queries as well. Amazon says the new assistant can answer personalized questions about your life and activities, things like: how many books have I read this year? Drawing on information from a customer's Amazon account, it can proactively notify users when concert tickets become available, or help with complex tasks like booking dinner reservations.

[00:29:32] Mike Kaput: There's also a standout feature in the form of Alexa's new visual understanding capabilities. Through a device's camera, it can process a video feed and respond to questions about what it sees. Now, beyond basic assistance, it also has some powerful productivity features. Users can upload files, docs, and emails that Alexa will parse and reference in future conversations.

[00:29:55] Mike Kaput: Now, the integration with Amazon's broader ecosystem appears to be a pretty [00:30:00] big advantage here. Alexa Plus works with Echo Show smart displays to power personalized content feeds and provides a new For You panel with timely updates based on user interests. It can control smart home devices, play music from Amazon Music on connected speakers, and even direct Fire TV devices to skip to particular scenes in movies or shows.

[00:30:23] Mike Kaput: Now, in one particularly impressive demo, Amazon showed how Alexa Plus could summarize footage from Ring security cameras, describing what's happening in a scene and pulling up specific moments. Apparently, Alexa Plus will cost $19.99 per month, but will be available free for Amazon Prime members. Now, the rollout's starting in the coming weeks with an early access period prioritizing owners of Echo Show devices, followed by a wider release over the coming months.

[00:30:55] Mike Kaput: So Paul, Amazon touches on so many areas of people's [00:31:00] consumer and content consumption habits. Like, how big a deal is this if it works as advertised?

[00:31:06] Paul Roetzer: Are you an Alexa user? I do. I don't. We have one that we barely use. I'll be honest, I unplugged mine like seven years ago because my kids, when they were little, just kept asking it crazy questions, and I was like, oh my gosh, this thing's gonna drive me nuts. And I don't use it otherwise.

[00:31:20] Paul Roetzer: I have not personally used an Alexa device in, yeah, probably seven years. I don't even know. So, yeah, I think this is what Siri was supposed to be, even what Google Assistant was supposed to be. So the vision here is big, and if they execute, that's a really big deal.

[00:31:40] Paul Roetzer: One quick note that's really interesting: if you read the post announcing this from Amazon, they mentioned nothing about Anthropic in it. And yet Anthropic, I'm sure with Amazon's permission, tweeted: Claude will help power Amazon's next-generation AI assistant, Alexa Plus. Amazon and Anthropic have [00:32:00] worked closely together over the past year to help Amazon get the full benefits of Claude's capabilities.

[00:32:06] Paul Roetzer: So, as I alluded to earlier, maybe it isn't their own data and distribution that matters, it's their models living within places that do have data and distribution, which Amazon qualifies for as much as anyone. And as we've talked about on the show before, Amazon has invested $8 billion into Anthropic to date.

[00:32:24] Paul Roetzer: That is no small amount. If they're gonna be valued at $61.5 billion, assuming Amazon, you know, carries forward their ownership and stake in it, I'm guessing Amazon probably owns somewhere in that 20% range of Anthropic. That's, again, a big deal. You know, so again, if you're looking for suitors to potentially acquire Anthropic, Amazon sure

[00:32:45] Paul Roetzer: fits that bill. So, a couple notes here. The article where Amazon announced all this says they have 600 million Alexa devices. So we talk about distribution, you know, put that in context. Now, how many of them are unplugged like [00:33:00] mine? I don't know. But let's assume some fair percentage of those 600 million are actually in use in people's homes.

[00:33:08] Paul Roetzer: One of the things that jumped out to me is they refer to it as her, and she. Like, they're very much personifying these technologies, I guess. So they say she is more conversational, smarter, more personalized. I'm gonna highlight a couple of excerpts from the post from Amazon about this.

[00:33:28] Paul Roetzer: 'Cause I think there's some very fundamental things here that are part of the bigger story. So, the first, and this is my own way of describing this, is this is the everything AI assistant, as you kind of alluded to. Some of these, now quoting from them: she keeps you entertained, helps you learn, keeps you organized, summarizes complex topics, and can converse about virtually anything.

[00:33:49] Paul Roetzer: Alexa Plus can manage and protect your home, make reservations, help you keep track of, discover, and enjoy new artists. She can also help you search, find, or buy virtually any item online [00:34:00] and make useful suggestions based on your interests. Alexa Plus does all this and more. All you have to do is ask. So I think it's interesting, because when they came out with, what are they, skills?

[00:34:10] Paul Roetzer: Is that what they call them, Alexa Skills? Yeah. Yeah. And the thing I always struggled with with Alexa is, there were like 10,000 skills and I don't know what any of 'em are. I just know it can do weather and sports scores, and beyond that, like, I don't know. So it's like this new age of the everything AI assistant.

[00:34:25] Paul Roetzer: But now just through conversation. Like, you don't have to know the skills. You just have to talk to it and assume it can help you with anything. The next is this whole emotional intelligence thing. And again, they didn't call it out; this is me looking at it saying, okay, here we are.

[00:34:41] Paul Roetzer: We're carrying on this emotional intelligence play. So what they said, quote, is: conversations with Alexa Plus feel expansive and natural, whether you're speaking in half-formed thoughts, using colloquial expressions, or exploring complex ideas. Alexa Plus understands what you mean and responds like a [00:35:00] trusted assistant.

[00:35:01] Paul Roetzer: It feels less like interacting with technology and more like engaging with a thoughtful or insightful friend. So again, we're gonna start to feel this emotional intelligence coming through in all of our devices, all of our software. Then they get into agents. You know, you can't talk about anything with AI without getting into the agentic side.

[00:35:20] Paul Roetzer: So they say at the foundation of Alexa's state-of-the-art architecture are powerful language models available on Amazon Bedrock, which is kind of where you go and get access to all their models. But that's just the start. Alexa Plus is designed to take action and is able to orchestrate across tens of thousands of services and devices, which, to our knowledge, has never been done at this scale.

[00:35:41] Paul Roetzer: Again, this is quoting them: to achieve this, we created a concept called experts, groups of systems, capabilities, APIs, and instructions that accomplish specific types of tasks for customers. They also go on to say, Alexa Plus introduces agentic capabilities, [00:36:00] which will enable Alexa to navigate the internet in a self-directed way to complete tasks on your behalf, behind the scenes.

[00:36:07] Paul Roetzer: Let's say you need to get your oven fixed. Alexa Plus will be able to navigate the web, this is a big deal, navigate the web, use Thumbtack to discover the relevant service provider, authenticate, arrange the repair, and come back to tell you it's done. There's no need to supervise or intervene. I didn't watch the announcement itself, but this seems like it's being underplayed, if this is actually going to work like this.

[00:36:34] Paul Roetzer: Oh, right. This is the big deal stuff. Another one: memory and personalization. And this now gets into the question of, will you give up the data you need to give up to get the benefit? The new Alexa is highly personalized and gives you opportunities to personalize further. She knows what you've bought, what you've listened to, the videos you've watched, the address you ship things to, and how you like to pay.

[00:36:59] Paul Roetzer: You [00:37:00] can also ask her to remember things that will make the experience more useful to you, to you. You can tell her things like family recipes, important dates, facts, dietary preferences and more. And she can apply that knowledge to take useful action. For example, if you are planning a dinner for your family, Alexa Plus can remember that you love pizza, your daughter is a vegetarian, and your partner is gluten-free to suggest a recipe or restaurant.

[00:37:25] Paul Roetzer: So Mike, I'll stop there for a second, because I wanna explore this: the amount of personal data. So we are all gonna have access to the same device. If you're a Prime member, you're gonna get it baked in; otherwise it's $19.99 a month. Imagine all of these capabilities at your fingertips in any device. And they're also gonna have a standalone Alexa Plus app that'll function just like the ChatGPT app.

[00:37:51] Paul Roetzer: they're gonna have a new alexa.com website where you can interact just like you would interact on ChatGPT dot com. How much knowledge, like how much are you [00:38:00] giving up, how much are you like guiding your family members to give up when you're, you know, someone's mom starts talking, oh, I heard Alexa Plus can do this.

[00:38:08] Paul Roetzer: I'm gonna start giving all of our family history so I can, are you there? Like I don't know if I'm there are what? 

[00:38:14] Mike Kaput: that, you know, suspect I will be there through a slippery slope. Right. Right. Because I feel like if this works as advertised, I'm not going to be rushing to give every single thing, but I will give enough that I bet you there'll be 2, 3, 4, like killer use cases, right?

[00:38:33] Mike Kaput: Yeah. This, I can't live without this. And then from there I suspect you turn a little bit 

[00:38:37] Paul Roetzer: on a time. Yeah. My sh will be in the dark. This. Alright. I'll let it have access to all my photos. 

[00:38:44] Mike Kaput: Yep, yep, yep. Yeah, it'll be interesting to see the extent though. That's kinda what I was thinking here because this feels like a slam dunk to me.

[00:38:51] Mike Kaput: For my commercial life, but I'm not sure it's gonna make sense for me to be using Alexa as the overall assistant for, [00:39:00] you know, docs, photos, stuff like that. I feel like I'm already using other tools to try to process those things, but the same concern remains, so I'm not sure. 

[00:39:09] Paul Roetzer: Yeah. And if anything, this illuminates to me the opportunity Google has, because for me, all of this already lives in Google.

[00:39:16] Paul Roetzer: They've got my calendar, they've got my email. I don't have my photos in Google. I'm still an Apple person when it comes to like photos and stuff, but you could imagine all the things Google has access to. Like, I'm not moving that stuff to Amazon for these experiences, I guess is what I'm saying.

[00:39:34] Paul Roetzer: Right. 

[00:39:34] Mike Kaput: That's what I'm getting at too. Yeah. 

[00:39:36] Paul Roetzer: And so, like, OpenAI can't touch this stuff. This is the thing: Anthropic's not gonna build this. This is data and distribution, the two things we keep hammering back to people. They have the data about your personal life, and Amazon owns Whole Foods.

[00:39:50] Paul Roetzer: Like they, they have offline data too. Like, so you have data from all these sources. Apple's got it too. anthropics not gonna get it. Opening [00:40:00] eyes' not gonna get it like that. I think that that's done like the race for ownership of our data lives within three or four major companies, basically.

[00:40:08] Paul Roetzer: . And whether they build their own models and enable it or they use somebody else's models. 

[00:40:13] Apple Siri in 2027 

[00:40:13] Paul Roetzer: And that kind of leads, Mike, to my last note here, and we'll timestamp this conversation separately for people. It's sort of one and the same, but it continues on. There was a big article about Apple and Siri last week, and this is from Bloomberg's Mark Gurman, who's like an insider when it comes to, you know, Apple information and news that we follow closely.

[00:40:36] Paul Roetzer: So he writes: Amazon's Alexa Plus, which was announced this past week, is essentially a version of ChatGPT's voice mode with knowledge of who you are, who the people in your life are, your interests, and the context of your home and surrounding environment. He goes on to talk about there being one area where Apple has an edge.

[00:40:55] Paul Roetzer: Amazon lacks an ecosystem of outside-the-home products and a native app ecosystem [00:41:00] that can make Alexa Plus more powerful. It has smart speakers and other gadgets, but nothing like Apple's billions of well-integrated mobile devices. But that only makes the Apple Intelligence situation even more disappointing.

[00:41:12] Paul Roetzer: And Mike, was it last week I said, like, Apple Intelligence still sucks? Yep. I'm not being overly harsh on Apple; they know it's bad. And I'll continue on, because I think this is really important context for what's going on and who the winners might end up being. So Gurman continues.

[00:41:27] Paul Roetzer: Apple could have melded advanced AI with its ecosystem to create something powerful and magical. He then said the next version of Siri will be a test of whether Apple can mount a comeback. The software will likely be released in May, a full 11 months after they introduced it. The current iOS 18 version of Siri essentially has two brains.

[00:41:47] Paul Roetzer: One that operates traditional Siri commands, that's stuff like, what's the weather, what's the sports score, where are my stocks at. And the other that handles advanced queries, which, if you've used it, is basically like talking to Siri always has been: [00:42:00] it usually doesn't have the answer.

[00:42:01] Paul Roetzer: If it requires anything to explain something to you, it now connects to ChatGPT. That's basically what Siri does now. And he says, for iOS 19, Apple's plan is to merge those systems together. He expects they'll introduce this as part of their Worldwide Developer Conference in June, with a launch in spring of '26.

[00:42:23] Paul Roetzer: My God. So he's talking about, like, another full year before we start to get this merged system. He said the new system, dubbed LLM Siri internally, was supposed to introduce a more conversational approach in that same release, but that's now running behind and might not arrive till June of '26.

[00:42:43] Paul Roetzer: So 

[00:42:44] Paul Roetzer: anyway, people within Apple's AI division now believe that a true modernized, conversational version of Siri won't reach consumers until iOS 20, at best, in 2027, when Anthropic thinks we're gonna have AGI.[00:43:00] 

[00:43:00] Mike Kaput: Yeah, we're gonna get AGI before we get an upgraded Siri is what we're saying. 

[00:43:04] Paul Roetzer: Yes. So when I saw this, I was literally dumbfounded. I know it's bad. We've commented how bad it is, but this is freaking Apple. Like, you're five years after the introduction of ChatGPT in November of '22, like, five years it's gonna take to get a workable version of Siri.

[00:43:26] Paul Roetzer: They bought the Siri technology in 2011. Like, they had a 10-year headstart on everybody when it came to this technology. And here they are, just completely flailing. So the only other thing I noted on this one, Mike, 'cause I'm gonna treat this more as like a rapid fire: the likelihood of them doing a massive deal with OpenAI or Google is skyrocketing right now.

[00:43:52] Paul Roetzer: . Like they compete with Google no doubt. But they have done business with them on search. They've done business on maps. I, [00:44:00] man, I gotta think that the executives at Google and Apple are having deep conversations right now. I have zero knowledge of this, like no inside knowledge whatsoever. I just cannot fathom that Tim Cook is going to wait until 2027 when OpenAI Google and Anthropic all think a GI will be in the world to finally give us a working version of cert.

[00:44:19] Paul Roetzer: Like it just cannot possibly be true. 

[00:44:23] Mike Kaput: Yeah. Yeah. And it's interesting, because I feel like we often see, given how fast AI moves and all the hype, you know, something new will come out and people will be like, OpenAI's cooked, or, ChatGPT's dead. I think you have to really accept that Apple is in trouble here based on that.

[00:44:43] Mike Kaput: Yeah. I think at this stage, 

[00:44:44] Paul Roetzer: yeah, they are just, again, like I said this on last show or two shows ago, their stock's up 35% like this year. Like Apple as a company is fine. It just seems like they are incapable of [00:45:00] doing their own AI models to the extent they need to, to fix what should be their primary function, which is sur and voice.

[00:45:07] Paul Roetzer: That is, like, everything is going to personal AI assistants with deep knowledge of your actions, your interests. Apple has all of that, billions of devices, and they can't figure this out. You have to go do a deal, like, tomorrow with one of these companies that has it. It's not gonna be Anthropic; they're not gonna do that

[00:45:27] Paul Roetzer: deal with Amazon being already deep with Anthropic. So you've got the beginnings of a deal with OpenAI. I don't know. That's it. Google is the most logical play here. It's not gonna be Grok 3. Like, they're not doing a deal with Elon. I can't see Tim and Elon partnering up on this.

[00:45:46] Paul Roetzer: Yeah. But man, especially as someone who's been an Apple investor for my entire adult life, I would be very, very happy if they made a deal with Google on models and just baked Gemini right into [00:46:00] this thing. 

[00:46:02] ChatGPT Deep Research Now Available to All Paying Users, & Voice Mode for All

[00:46:02] Mike Kaput: Let's dive into some more rapid fire for this week. So, another big piece of news we're tracking: OpenAI has begun rolling out deep research to all ChatGPT Plus, Team, Education, and Enterprise users.

[00:46:16] Mike Kaput: If you recall, deep research is this agentic research assistant that can think for extended periods of time. It can go research things for up to 30 minutes, using the web to gather information about topics, and actually end up doing pro-level research for you online, completely autonomously. And it delivers an incredible final result in the form of a comprehensive research brief that often totals dozens of pages.

[00:46:43] Mike Kaput: So since this became available to pro users last month, it's definitely been wowing a lot of knowledge workers for its ability to do in minutes a level of high quality, in-depth research that used to take hours or even days. I think you can consider us as [00:47:00] some of those people wowed by this because we're using it quite often.

[00:47:03] Mike Kaput: It's actually interesting: in a publication called Understanding AI on Substack, tech journalist Timothy Lee conducted an evaluation of deep research by basically showing its output to 19 different experts across different professional fields. Seven out of the 19 said that the responses were already at or near the level of an experienced professional in their fields, and a majority estimated it would take at least 10 hours of human labor to produce comparable reports.

[00:47:33] Mike Kaput: What's more, in a head-to-head comparison with Google's Deep Research, same name, similar product, which was released in December, 16 out of 19 of these people preferred OpenAI's responses. Now, I will say this as you dive in and hopefully get excited about using OpenAI's deep research in your ChatGPT account.

[00:47:55] Mike Kaput: You only get 10 queries per month to start if you do not have a Pro account. [00:48:00] One other note: there are some additional good news items coming out for ChatGPT users. OpenAI is also rolling out Advanced Voice Mode, powered by GPT-4o mini, to all free ChatGPT users, so you can actually start trying that out for yourself as well.

[00:48:19] Mike Kaput: Paul, I'm kind of curious to see how much this becomes a mind-blowing moment for knowledge workers. I feel like anytime I've actually seriously showed this to someone that doesn't know what's possible, they are pretty impressed. There are obviously lots of issues, you still have to check everything, but it's been literally a month and I feel like it's just incredible

[00:48:38] Mike Kaput: we even have this capability. 

[00:48:40] Paul Roetzer: Yeah, and if you wanna know where OpenAI's GPUs are going, why they don't have the capacity to roll out 4.5 to everyone, this is where they're going. Like, deep research is insane. Giving voice mode to all users, yeah, that burns up your GPUs. I agree. Anybody who's been listening to the show the last month knows how [00:49:00] we feel about these deep research products.

[00:49:01] Paul Roetzer: They are transformational. I'm not over-exaggerating here. I use it once or twice a day myself. I happily pay the $200 a month to have access to this technology, like, unlimited. I have personally demoed this twice in real time in the last two weeks. One was a crisis communications instance, where something was occurring that I used to do as a job.

[00:49:27] Paul Roetzer: Like, when I had my agency, you and I, Mike, did crisis communications work. Right. And I happened to be there for a different reason, and something came up where a PR firm needed to get involved. And I'm just sitting there in the meeting, and I'm like, boom, boom, boom, boom, boom.

[00:49:43] Paul Roetzer: Like, I'm building a crisis communications plan, doing all my research in deep research as I'm sitting there, you know, and in seven minutes I had a plan that I could have thrown a team of five PR people at seven years ago, and it would've taken them all day to come back to me with the plan. Yep. So [00:50:00] I did it there. And then I was actually on a call last week with somebody who was looking at it more for, like, fundraising, you know, just looking for some guidance on AI and fundraising.

[00:50:07] Paul Roetzer: Like, well, let me show you something real quick. And I went in and built, in deep research, AI for fundraising and, you know, advancement and stuff like that. Just real time, as I'm talking, I'm like, here, let me build this while we're talking. Built the plan, sent it to the person. So it is truly like, if you don't know what this technology's capable of, it can change the way you work if you're often doing research and strategic planning.

[00:50:34] Paul Roetzer: it's, yeah, invaluable in my opinion. I know you use it all the time too, Mike. Like it's, 

[00:50:39] Mike Kaput: yeah. Dan, just as another plug at Writer Summit, my talk is on how to use tools like this and how to get ahead of this disruption. So it's really, really gonna be great to accelerate your own learning if you're a little behind on this.

[00:50:55] Paul Roetzer: And Mike, this is one of the ones where I sit there and think, how is this only the first version? [00:51:00] 

[00:51:00] Mike Kaput: Like, oh my God, I know. Yeah. Like 

[00:51:01] Paul Roetzer: the way it is now. Like, if it never got better, it would change the way knowledge work is done. Yeah. And we know that this was just, like, thrown together, a first version. You know, Google had a more powerful version than what we're using with OpenAI.

[00:51:14] Paul Roetzer: Google has the ability to make stuff really cheap or free if they want to whenever they want to, which is always, you know, one of the things that they've got over everybody else is the ability to probably make this stuff cheaper than everybody else. So, yeah. Yeah, 

[00:51:27] Mike Kaput: yeah. 

[00:51:27] Paul Roetzer: Yeah. I think 

[00:51:28] Mike Kaput: there's one takeaway here.

[00:51:29] Mike Kaput: If you're not a Pro subscriber, like, this should be number one on your list this week if you're a Plus: test it if you have use cases. 

[00:51:36] Paul Roetzer: So if you, yeah, if you look at this and think, ah, these guys are hyping this, it can't possibly be this good: pay the $200 once. Like, do it for a month. But you have to have a use case.

[00:51:45] Paul Roetzer: Don't go spend 200 bucks if you're just gonna sit there and not use it all month. Yeah. But, for the Writer Summit, here's my plug: there's a free registration, thanks to Goldcast, so you can [00:52:00] join for free. Just watch Mike's session, and then that'll make the case for you

[00:52:07] Paul Roetzer: . On to test deep research. Test it once for one month. Don't commit to the ongoing 200. And like I promise you, if you have used like, you will find insane opportunities for this and knowledge work, like it's, it's crazy. 

[00:52:22] Agency > Intelligence

[00:52:22] Mike Kaput: Next up, former Tesla AI director and OpenAI founding member Andrej Karpathy, who we talked about in a previous segment, has sparked a pretty important conversation about what really matters in our AI-driven future.

[00:52:35] Mike Kaput: In a recent post on X, he made a surprising claim, saying that agency is significantly more powerful and scarce now than intelligence. And he actually wrote: I had this intuitively wrong for decades, I think due to a pervasive cultural veneration of intelligence, various entertainment/media obsession with IQ, etc.

[00:52:57] Mike Kaput: He basically says that we all assume that [00:53:00] raw intelligence is the ultimate asset. In the age of AI, that is starting to change. Now, he defines this idea of agency as a separate kind of attribute from intelligence. It's, quote, an individual's capacity to take initiative, make decisions, and exert control over their actions and environment.

[00:53:19] Mike Kaput: This is about being proactive rather than reactive. People with high agency don't just let life happen to them. They actively shape it. They combine self-efficacy, determination, and ownership over their path. Now, the idea here is that with AI, everyone's going to get more of this type of agency by default.

[00:53:38] Mike Kaput: But also, as AI handles increasingly complex cognitive tasks, intelligence becomes a commodity. It's basically on tap. So really, the only true differentiator, he would argue, becomes agency. As a result, we need to be prioritizing agency in everything we do. And he poses several provocative [00:54:00] questions along these lines, saying, are we hiring for agency?

[00:54:04] Mike Kaput: Are we educating for agency? Are you acting as if you had 10x agency? So Paul, when I read this, I thought this concept is just something really important for knowledge workers right now to take to heart. I still have to explore it a bit more, but it feels to me like at least one directionally correct way to really give yourself the best chance of, for lack of a better phrase, becoming AI-proof, right?

[00:54:31] Mike Kaput: And just building an incredible competitive advantage by optimizing for exhibiting as much agency as possible. Like what did you think? 

[00:54:40] Paul Roetzer: Yeah, I loved this tweet. I, you know, when I flagged it last week, I was like, man, we should probably talk about this. And I feel like I could probably spend a full main topic on this one.

[00:54:50] Paul Roetzer: Yeah. So I'll try and be like, concise here. so I have seen this play out time and time again throughout my career. Many of the best producers I've [00:55:00] hired, many of the best leaders I've seen, certainly, most of the best entrepreneurs that I know, have all been like average students. Like they, they didn't come from the top Ivy League schools.

[00:55:13] Paul Roetzer: They were just insanely resourceful and resilient. And they didn't fear failure. They just found ways through things; they viewed failure as, like, part of the journey. So one of the books this made me think about, which I read very early in my career before I even started my own agency, was called Will and Vision.

[00:55:30] Paul Roetzer: And in that book, the authors, Tellis and Golder, define will (now, they talk in company terms, but you can apply the same thing at an individual level) as an unwavering determination and commitment to achieve a specific vision, demonstrating a strong resolve to overcome obstacles and execute a strategy even in the face of challenges.

[00:55:51] Paul Roetzer: Essentially, it represents a driving force behind a company's ability to achieve market leadership despite being a latecomer to the market. So theirs is all about, like, what [00:56:00] makes market leaders. This touches on education. I get asked all the time, like, what should my kids major in? And I think about it myself with my kids.

[00:56:08] Paul Roetzer: And the thing I'm fairly confident in, at least in my own belief system, is that liberal arts degrees matter greatly. Yeah. I don't know if you should go into computer science on its own. I don't know if programming is a thing 10 years from now, but I know it's part of, like, the thing.

[00:56:28] Paul Roetzer: So I think going to a university still matters. I do think the life experience of a college experience is relevant. I don't think it's essential. I don't think you have to have it in the future. But I think it matters. And if you're going to do it, I think liberal arts is a really good choice, because I feel like the best talent moving forward, the people with the most agency, are gonna have elements of philosophy, psychology, sociology, history, science, business, fine arts, political science, [00:57:00] computer science.

[00:57:00] Paul Roetzer: Like, all of that helps: a diversity of experience and perspective. And so, when I think about even our own hiring plans, I don't care where people went to college. I actually don't even care what their GPA was. The only time I remember looking at GPAs in the early days of building my agency (my actual marketing agency, not a different kind of agency) is when I would see someone with like a 4.3.

[00:57:27] Paul Roetzer: This is one of my favorite questions I ask. I would say, what are you going to do when you're not the smartest person in the room anymore? Because so often, people who are just brilliant have never struggled. They've never really known what it's like to fail in class, or to be in a room where they don't feel like the smartest person in the room.

[00:57:50] Paul Roetzer: And when you get into the real world and real life experience starts to matter, that GPA of 4.3 means nothing. Now [00:58:00] you're dealing with ramifications of decisions and unknowns that you can't go study in a book. And so the way I always think about it is, IQ matters to a degree.

[00:58:09] Paul Roetzer: Like you have to be able to understand complex topics. You have to be able to learn things, you have to be able to, you know, take tests well, basically in real life. But what I'm more interested in is the emotional intelligence that we talked about. Are you a problem solver? Are you a hard worker? Are you confident but humble about your confidence?

[00:58:27] Paul Roetzer: Are you resourceful and resilient? Are you curious? Are you a fast learner? Do you have an insatiable desire to keep learning? Like, that was always one of the things I was looking for: do you read books outside of work? Are you doing things I'm not asking of you to become better at what you're doing?

[00:58:41] Paul Roetzer: Which gets down to intrinsic motivation. Are you proactive? Are you persistent? Are you passionate? Do you understand people? Do you understand machines? Like, I think that all of this matters, and I think it fits into this agency umbrella that, you know, he talks about. It's this idea that [00:59:00] you can achieve anything.

[00:59:02] Paul Roetzer: And if I can give any advice to parents, to employers: instill a belief that anything is possible, that the only limitation is what you put on yourself. Because it doesn't matter what school you came from or what your GPA was. Once you get into the real world, all that matters is that you work hard and separate yourself and you create value consistently.

[00:59:22] Paul Roetzer: And honestly, it's actually kind of easy to do. Like, we were having this conversation with some family friends a couple weeks ago, and I was saying, how easy is it to stand out when you get into the professional world? You don't have to be the smartest anymore, you just gotta be all these other things.

[00:59:40] Paul Roetzer: You differentiate yourself fast in the real world when you can do those things. So yeah, I'm a hundred percent on this topic. Like I said, I could talk for like 30 minutes about this topic. I think it's very, very important though, and I think it actually fits in the bigger theme of what matters in the future when we do have GPT-5 and [01:00:00] GPT-6, and yeah, they have reasoning and they have emotional intelligence.

[01:00:03] Paul Roetzer: Like, what else actually is left? I think this is the answer. Like, I think these are the things that remain fundamental, and we'll figure the rest out. If you have all these basic traits and skills and emotional abilities, you'll solve the rest of it. But if you don't, and you're just book smart, it's not gonna go well.

[01:00:22] Meta Plans to Release Standalone Meta AI App

[01:00:22] Mike Kaput: Hmm. In some other news, Meta plans to release a standalone Meta AI app in the coming months. According to some reporting from CNBC, they intend to launch a dedicated app during the second quarter of this year. This would elevate Meta AI from being just a feature embedded in Facebook, Instagram, and WhatsApp to becoming one of the company's major applications.

[01:00:48] Mike Kaput: So Meta AI, if we recall, first launched in September 2023 as a generative AI powered digital assistant that can provide conversational responses and [01:01:00] do things like create images. In April, the company took the step of replacing the search feature across Facebook, Instagram, WhatsApp, and Messenger with the chatbot, bringing it to the forefront of the user experiences on these platforms.

[01:01:16] Mike Kaput: Now, according to Meta's finance chief, Meta AI currently has approximately 700 million monthly active users. But analysts are finding it difficult to directly compare these figures with competitors like ChatGPT, since Meta AI is not currently its own individual app. There are some estimates that suggest Meta AI's standalone website generates less than 10 million views per month, which is way below ChatGPT and Gemini.

[01:01:46] Mike Kaput: So now, interestingly, shortly after CNBC published this report, Sam Altman responded with a, you know, probably a bit of a trolling post on X stating, okay, fine. Maybe we'll do a social app. Now, [01:02:00] Paul, we can't count out Zuckerberg. We definitely can't ignore his distribution. I guess my question is, like, do you even really use Meta AI?

[01:02:07] Mike Kaput: Like, I don't use it at all. This seems just like a panic response to Grok. Like, I'm just deeply skeptical I would somehow start using this, but I could be wrong.

[01:02:17] Paul Roetzer: Yeah, I don't see it. I think, like, any adoption numbers we've gotten are just fake adoption numbers. Like they're, yeah. Yeah. Just 'cause I went into Meta AI in Facebook to try and find someone.

[01:02:28] Paul Roetzer: Does that mean I used Meta AI? Like, I was just searching for someone's name. Like, so because like how everyone

[01:02:34] Mike Kaput: uses Threads apparently, right? Like, because they've infused this into your experience, you have

[01:02:38] Paul Roetzer: no way to not use it, like, it's baked right into it. And I think that with their strategy, you know, Zuckerberg's been very straightforward: he loves AI content, and, like, people engage with it.

[01:02:51] Paul Roetzer: And so we're gonna, like, let it go, and we're gonna flood our social platforms with all this AI generated video and image and text, and it's gonna be [01:03:00] great. And it's like, no, it's not. I'm sorry. That isn't gonna end well. People are gonna get tired of all this slop. AI slop, you know, is what we call it.

[01:03:08] Paul Roetzer: And so I don't know. Like, when you look forward to the future of what is Facebook and what is Instagram when AI has just taken over, I don't think it's good. And I do think the DeepSeek moment really hurt the ego. Like, the fact that they were positioning themselves as the open source play, and Llama was gonna dominate open source, and then DeepSeek shows up and just steals all their thunder and gets to the top of the charts ahead of Instagram, ahead of Facebook. And yeah,

[01:03:34] Paul Roetzer: I gotta imagine there's some competitive fire burning within Zuckerberg at the moment. I don't know that they can find their way into, like, the top five. I mean, at this point, Claude's got a better chance, I think, of getting adoption than a standalone Meta app. But I don't know, maybe they can change some perceptions about what they're for. Or their history is to just buy it.

[01:03:55] Paul Roetzer: Like, maybe they show up and buy Anthropic. I didn't think about that one. That would be [01:04:00] interesting. Yeah. Zuckerberg has the history of, like, they don't innovate very well within Facebook, Meta; they just buy everything. Uh huh. God, that would be a weird mix of cultures.

[01:04:15] Paul Roetzer: Yeah. 

[01:04:18] Mike Kaput: The safety and alignment might not be aligned.

[01:04:18] Paul Roetzer: They're gonna need to buy somebody. I don't know. I think that would be the question: who does Meta buy to actually be relevant in this space? Because I don't think they're gonna do it on their own.

[01:04:28] Mike Kaput: I dunno how many options they have. Yeah. I gotta think about that one now.

[01:04:33] Robots in the Home and Workplace

[01:04:33] Mike Kaput: All right, next up. Robotics startup Figure, which we've talked about quite often, is making waves with a couple of significant announcements. So they have a breakthrough improvement to their AI system for package handling. And even more surprisingly, they plan to begin testing humanoid robots in homes much sooner than expected.

[01:04:54] Mike Kaput: CEO Brett Adcock, who is quite active on X, revealed that Figure will start [01:05:00] alpha testing its Figure 02 humanoid robot in home settings later this year, which is a timeline that's been accelerated by approximately two years. This kind of unexpected shift is attributed to rapid advancements in the company's recently announced Helix AI system.

[01:05:18] Mike Kaput: Helix is their internally designed vision-language-action model that unifies perception, language understanding, and learned control. We talked about this a bit last episode. Quote, this is advancing faster than any of us anticipated, said Adcock, which is accelerating our timeline into the home. Now, they had previously been focused mostly on industrial applications for the current moment.

[01:05:40] Mike Kaput: Just last year, they began piloting robots at a BMW manufacturing plant in South Carolina. They've been simultaneously refining their technology, both with robots in the home, but also staying focused on their commercial side, specifically commercial logistics. They also outlined this [01:06:00] week significant improvements in Helix's low-level control system, known as System 1, which handles visuomotor control and essentially governs how the robot sees and moves.

[01:06:11] Mike Kaput: So in logistics testing, improvements to Helix have translated into really impressive results. Figure's robots can now handle packages at speeds exceeding those of the human demonstrators they learn from. Paul, this timeline seems quite ambitious. It sounds like they have made some type of breakthrough with Helix, but it seems real fast.

[01:06:35] Mike Kaput: I mean, it's clear we're making progress, but do you really expect to see humanoid robots in homes beginning this year? 

[01:06:41] Paul Roetzer: No. Well, I mean, maybe really, really rich people's homes. Like, I could see Kim Kardashian pulling up in her, yeah, Tesla Cybertruck with her, you know, robot in the front seat and, like, you know, doing some nice social posts and stuff.

[01:06:54] Paul Roetzer: But no, I don't think this is reality. Figure seems like a super innovative company [01:07:00] making a ton of progress. They have a history of really, really impressive demonstrations and really impressive sounding tweets that don't actually, like, change anything right away. Hmm. So I do not believe that anyone needs to think they're gonna go over a friend's house this holiday season and, like, run into the robot.

[01:07:21] Paul Roetzer: I do think there's real advancements being made on the hardware and the software side of robotics. I do believe it is a massive investment opportunity. I think it's going to be enormous, if you have the stomach for it, like, to wait out a Tesla stock. You know, I think Optimus is gonna be, like, the future of that company.

[01:07:45] Paul Roetzer: I just personally think we're three to five years away from any real, like, economic impact of these things, or like any real distribution of them in the consumer side. I just don't see it. And that's [01:08:00] just having studied the space for the last five years and watched closely. I just haven't seen anything that's like the ChatGPT moment of these things.

[01:08:09] Paul Roetzer: And again, ChatGPT took a couple years, even once it was, you know, really out in the world, to start getting adoption. We still don't have, you know, widespread enterprise adoption. I think you'll have something like this, like, two, three years out. You'll have that ChatGPT moment for robotics on the consumer side.

[01:08:24] Paul Roetzer: And then like three to five years later, you may actually see like this adoption. So you could easily be in the end of this decade, early next decade before it starts to really be widespread. 

[01:08:35] Lmarena.ai Prompt-to-Leaderboard

[01:08:35] Mike Kaput: Next up, there's a new tool out there that might be helpful to you if you're trying to evaluate how to use AI for your own use cases.

[01:08:43] Mike Kaput: So this is called Prompt-to-Leaderboard, or P2L, and it's a new thing that has been created and added to lmarena.ai, which is the website that runs the chatbot arena we've talked about quite a bit. That's a chatbot leaderboard that tries to rank how good each [01:09:00] model is across the board. But now, what P2L does is they've actually trained a language model to predict which AI will perform best for any specific prompt.

[01:09:10] Mike Kaput: So what that means is it can basically help identify model strengths and weaknesses across domains. And what that means for you is, if you go to lmarena.ai, you can click on Prompt-to-Leaderboard and literally type in any prompt, anything you can think of. P2L will then generate a leaderboard for you that tells you which model, based on all the things it knows it's good and bad at, is best for that prompt.
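[Editor's note: For readers curious about the mechanics, the LMArena team describes P2L as training a language model to output prompt-specific Bradley-Terry coefficients, and the per-prompt leaderboard falls out of those. Here's a rough, hypothetical sketch of that last step; the model names and coefficient values below are made up for illustration, not real P2L output.]

```python
import math

# Hypothetical prompt-specific Bradley-Terry coefficients that a
# P2L-style router might predict for one prompt. Higher means the model
# is expected to win more head-to-head matchups on this prompt.
coeffs = {"model-a": 1.4, "model-b": 0.9, "model-c": 0.2}

def win_probability(theta_a: float, theta_b: float) -> float:
    """Bradley-Terry model: P(a beats b) = 1 / (1 + exp(theta_b - theta_a))."""
    return 1.0 / (1.0 + math.exp(theta_b - theta_a))

# The per-prompt leaderboard is just the models sorted by coefficient.
leaderboard = sorted(coeffs, key=coeffs.get, reverse=True)
print(leaderboard)  # ['model-a', 'model-b', 'model-c']
print(round(win_probability(coeffs["model-a"], coeffs["model-b"]), 3))  # 0.622
```

[The point being that the ranking is conditioned on the prompt: a different prompt would produce different coefficients, and potentially a completely different ordering.]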

[01:09:37] Mike Kaput: It will rank them for you based on anything. They even offer some just very niche examples, like which model is really good at translating Harry Potter into Chinese, and they extrapolate from there and tell you, here's a ranking of who does it best. So Paul, this seems like exactly the type of thing we've been needing in some of our previous discussions.

[01:09:59] Mike Kaput: Like a way to [01:10:00] actually evaluate AI models on the many, many valuable tasks that fall outside these traditional benchmarks and evaluations. Like, I don't know about you, but I like looking at the benchmarks, but I'm not like doing a lot of coding or science or math. So even just being able to type into these, like writing a blog post about X is pretty helpful if we can trust it, obviously.

[01:10:24] Mike Kaput: What do you think? 

[01:10:25] Paul Roetzer: Yeah, I like the direction, the idea of it. I just tested it real quick on two quick ones. I did "write a research report" and "transcribe an audio file," like, the two, three things that popped in my head. Yeah, I think that's the catch: is this really reliable? So for "write a research report," o1-mini is number one, and then Qwen 2.5 Plus, and then a third one. I have no idea if that's right or not.

[01:10:49] Paul Roetzer: And, like, how many examples, what's the sample size here? How many people actually, like, tested this? So yeah, I don't know. I think this is probably more interesting [01:11:00] and directional. We're not telling people, go play with this and take it as, you know, fact, and just use it for everything, every use case.

[01:11:06] Paul Roetzer: But I think seeing more things like this is probably a really good sign, and I think it'll continue. Like, we heard that with the Anthropic release, that they were trying to do more business use cases. Right? I think that the labs get it, that the average users who drive adoption aren't gonna care about all their evaluations that are, you know, PhD level and above.

[01:11:24] Paul Roetzer: Yeah. It's like I just need something that does what I need it to do. 

[01:11:27] Mike Kaput: I will say, if you are struggling historically with a use case, like you're like, wow, I've tried a bunch of models for this thing, this could be really helpful to get some ideas to kind of try to crack that. But yeah, totally, kind of buyer beware here of the data you're getting.

[01:11:43] David Perell on How Writing Is Changing Thanks to AI

[01:11:43] Mike Kaput: Alright, so next up, a former writing coach named David Perell, who's pretty popular online, a very popular internet writer. I've followed his work for a few years. He has sparked a really interesting conversation about the future of nonfiction writing in an era of [01:12:00] AI. So in a candid social media post, he actually said he made the decision to shut down his writing education business after six years,

[01:12:10] Mike Kaput: Concluding that the skills he's been teaching are rapidly becoming obsolete in the face of advanced language models. He writes, quote, it has only been four months since I shut down my business, but I can no longer imagine teaching writing in a way that resembles anything close to the way I taught in the past.

[01:12:27] Mike Kaput: The reason is simple. The world of nonfiction writing has fundamentally changed, and many of the skills I've developed and built my career on are becoming increasingly irrelevant. He gives a pretty blunt assessment here. He says, if you do a great job prompting things like OpenAI's deep research, you can now produce content superior to what he could create in a full day's work on most topics.

[01:12:50] Mike Kaput: Now, what's interesting is he doesn't say that nonfiction writing is dead, but he does conclude that you have to start thinking along these lines: [01:13:00] the more a piece of writing draws from personal experience, the less likely it is to be overtaken by AI. Personal narratives, memoirs, and biographies

[01:13:10] Mike Kaput: contain data that language models don't have access to: the lived experiences of individuals. He also says that writing that presents truly unique perspectives, what Peter Thiel might call important truths few people agree with you on, maintains its value. He basically says the more humanity, the more personality you can put into this stuff, the better chance you're going to have of actually standing out.

[01:13:35] Mike Kaput: Now, he says, for aspiring writers, his message is mixed. The bar has undeniably been raised, and writers are just competing with AI at this point. But at the same time, these tools can be really powerful aids, offering instant feedback and helping to refine ideas. Now Paul, this is exactly the reason why we've hosted an AI for Writer Summit every year [01:14:00] and why we're doing so again this week.

[01:14:01] Mike Kaput: Like, AI is changing what we do as writers, but I don't think enough people are coming together to really explore what that means. That's why I really liked David Perell's post.

[01:14:10] Paul Roetzer: Yeah, there was one part in particular that I thought really, you know, resonated. He said, there's things to love, there's things to hate, there are things to be excited about and things to be dejected about.

[01:14:22] Paul Roetzer: I'm neither hopeless nor Pollyannish, but my job as your teacher is to point you towards the truth of what's happening so you can see it clearly and make a game plan no matter how uncomfortable it makes you feel. Like, this is what I talk about all the time when I give talks. I can't make this better for people.

[01:14:40] Paul Roetzer: Like, I can't say that the AI's not gonna be able to do your job on par with you or better than you. That's impossible to say at this point. And I think more and more people are gonna have these moments where, like we talked about with the screenwriter of Taxi Driver, you know, a few episodes ago, and I forget the guy's name right now, [01:15:00] but that moment where you're like, oh man, it's better than me at the thing I do.

[01:15:05] Paul Roetzer: Yep. And so, you know, what does that mean for writers and creators? Like, I don't know. I got, like, 48 hours to figure it out, because that's my opening keynote, the state of AI for writers and creators. And every year I try and synthesize, like, where are we and what does it mean? I would put myself in that category.

[01:15:22] Paul Roetzer: I'm a writer, I'm a creator by trade. And I sometimes sit back and wonder, like, you know, what is unique? What matters still? And the things I'm very bullish on are unscripted conversations and presentations. Like, I want someone just up on stage giving me a presentation where I know it's just them and their perspective and their, you know, unique context and experiences that are what I'm hearing, not something ChatGPT wrote for them.

[01:15:47] Paul Roetzer: Right. Fireside chats, I love for the same reason. I want an unplanned question being asked, and I wanna see that they have deep knowledge of their topic. It's not some faux stuff you just get from LinkedIn feeds where people are just, [01:16:00] like, you know, it's not their original thoughts. Like, it maybe sounds like them, but maybe it's not them.

[01:16:04] Paul Roetzer: Like, I don't want that. I don't want, like, AI avatar podcasts. I want real people talking to each other, live and in-person events where it's not fake, personal stories. All of that, to me, matters. 'Cause moving forward, what we've done is we've democratized the ability to create content and create stories.

[01:16:23] Paul Roetzer: But the way I think about it is, only the true artists, experts, and storytellers can really bring that stuff to life with their own experiences and knowledge and context, and give those stories meaning, and make them matter to people. Because I just think it's gonna be so easy to get lost in this AI generated content world where you don't know what's real.

[01:16:46] Paul Roetzer: So like, I think, mediums and events and places where, you know, it's just real and it's people, I think that stuff's gonna do very well moving forward. And I would think about that from a brand perspective and a storytelling perspective. Hmm. 

[01:17:01] AI’s Impact on the Future of HubSpot

[01:17:01] Mike Kaput: Next up some news about HubSpot. So HubSpot has unveiled some ambitious projections for its partner ecosystem, and they're forecasting a $30 billion market opportunity by 2028 with AI expected to drive a third of that growth.

[01:17:19] Mike Kaput: So this comes from a recent analyst brief by IDC, which highlights how the convergence of AI and unified customer data is creating unprecedented opportunities for businesses building on HubSpot. So HubSpot's ecosystem has become increasingly central to their business model. 90% of their customers use at least one app from their marketplace.

[01:17:41] Mike Kaput: More than half are using five or more. The integration of these apps is profitable for partners as well, with HubSpot Solutions Partners projecting a median revenue increase of 44% from 2024 to 2025. Now, here's [01:18:00] where AI kind of fits in. They project that there's a $10.2 billion opportunity specifically tied here to AI-first solutions.

[01:18:09] Mike Kaput: They describe an emerging trend towards what they're calling Agentic solutions, which is a convergence of services and applications where partners can build AI agents or agent components that function within HubSpot's ecosystem. This could range from complete AI agents addressing common business needs to modular agent skills that can be combined for custom solutions.

[01:18:32] Mike Kaput: Now, at the core of all this is data integration. HubSpot emphasizes that AI is only as good as the data that trains it, and they position their unified data strategy as their competitive advantage. They point out that approximately 80% of customer data is unstructured; that's information contained in emails, calls, support tickets, and other communications.

[01:18:57] Mike Kaput: Their strategy involves making this data [01:19:00] as actionable as structured data, and they've recently acquired companies like Frame AI to accelerate this capability. Now, what is your read, Paul, on HubSpot's AI opportunity and the opportunity that they're kind of outlining here for the partner ecosystem?

[01:19:17] Mike Kaput: Obviously we've talked about a bunch of times for anyone that's newer to the podcast, you started HubSpot's first ever partner marketing agency. So you have potentially the top opinion here on what the past, present and future of HubSpot could look like here. 

[01:19:33] Paul Roetzer: Yeah, so I had started my agency back in 2005, and then became HubSpot's first partner in fall of 2007.

[01:19:41] Paul Roetzer: And then I sold the agency in 2021. So yeah, I spent, like, the better part of 14 years running a HubSpot partner agency. And then I actually had the privilege of being on their worldwide ecosystem kickoff event last week with Nicholas Holland, the head of AI there, for an opening keynote for that event.[01:20:00]

[01:20:00] Paul Roetzer: And so we got to talk about, like, the implications to agencies, and, you know, I think it's big. Like, I think the main thing I left people with in that interview was, I think a lot of agencies are gonna go away. I think a lot of agencies are gonna be disrupted and struggle to evolve. And I think a bunch of other agencies are gonna figure this stuff out and build amazing businesses.

[01:20:22] Paul Roetzer: They're gonna solve the pricing model issues, they're gonna solve a new service mix. And when you look specifically at agents, the agents need humans to set the goals, plan and design them, connect them to data sources, integrate supporting applications, oversee and manage their performance, iterate on 'em. Like, there's a huge role for humans in this agentic future.

[01:20:44] Paul Roetzer: And I think, you know, there's gonna be a lot of these solutions partners that not only solve the service side, but start building AI-first companies and applications that can enhance what HubSpot does with their expansive reach and customer base. And so, yeah, I think it's [01:21:00] exciting to be, you know, a partner, not just in the HubSpot ecosystem, but the Salesforce ecosystem, you know, all these other service provider systems,

[01:21:07] Paul Roetzer: if you get past the fear and anxiety and you actually are proactive about doing something about it. 

[01:21:15] Listener Questions

[01:21:15] Mike Kaput: Alright, so this week we're going to continue a new weekly segment. We've been running the past few weeks called Listener Questions. We get tons of questions each and every week about AI through a bunch of different channels.

[01:21:29] Mike Kaput: So we wanted to start answering some of those to the best of our ability on the pod. Also, if you have a question for us, just reach out to Paul or myself. Go to marketingaiinstitute.com, click contact us. There are a bunch of ways to get in touch. So this week's question, Paul, is how do you handle the known issues with AI hallucinations?

[01:21:51] Mike Kaput: Do you have any practical tips? 

[01:21:54] Paul Roetzer: Yeah, you gotta be very aware of them. You have to know that they exist. And if you're using AI in a [01:22:00] higher risk situation where accuracy and fact matters, then it might not be the right use case for these things. Like, you know, I think that's the biggest balance: when you're using them for brainstorming and creative outputs and first drafts that are gonna be, you know, reviewed and edited heavily.

[01:22:18] Paul Roetzer: and someone's gonna check names and places and data points and dollar amounts and all these things, great. But if you're writing a research brief, let's go back to that example again. Let's say you're an agency and you're creating a research brief for a client, and you have deep research write it, and you skim it and it looks great, everything seems right, and you just turn that thing in.

[01:22:39] Paul Roetzer: If it ends up having wrong data in it, that's on you. You're the agency in this situation; you own the output. And so I think it's just understanding that the models do make mistakes, and that's normal. So do humans. But understand what you're allowing them to do without your oversight. And so as long as you understand the importance of the human in the loop [01:23:00] and you use them in use cases where it's okay if they make some mistakes, it's just part of the process.

[01:23:06] Paul Roetzer: Great. Yeah, just don't think you can go in, have these things output something, and not review it, approve it, or double-check it for accuracy. That's not gonna work well for you. 

[01:23:16] Mike Kaput: Yeah, I'd also emphasize there that the prompting does still matter, especially with the deep research tools. You know, every word can count.

[01:23:26] Mike Kaput: So if you're getting outputs that are just, like, horrendously wrong all the time, you may also wanna look at your prompts. You know, there's no guaranteed way through prompting to avoid hallucinations, but you can be more specific, more detailed, more contextual, and specify what sources and types of places you want it to draw information from.

[01:23:37] Mike Kaput: These things can also help.

[01:23:40] Paul Roetzer: This is true. Yep. Good points.

[01:23:43] AI Text to Voice Releases

[01:23:43] Mike Kaput: Alright, Paul, we're gonna wrap up here with a bunch of different product updates. There's been a ton of, like, AI voice technology updates, so I'm gonna kind of run through these rapid fire. [01:24:00] Obviously chime in if you feel particularly passionate about any of these.

[01:24:04] Mike Kaput: But otherwise I'm just gonna kind of give people a sense of all the stuff that happened this week in voice. So one of the things getting the most buzz online at the moment is something called Sesame, an AI startup led by Oculus VR co-founder and former CEO Brendan Iribe. It's come out of stealth mode with a voice assistant that a reporter at The Verge described as "the first voice assistant I've ever wanted to talk to more than once."

[01:24:31] Mike Kaput: Now, often, you know, experiences with Alexa, Gemini, and the other assistants we've talked about are hampered by lag, misunderstandings, and stilted responses. But Sesame appears to be very, very good at conversational fluidity. It's able to handle interruptions and course corrections mid-conversation, and it has a bunch of natural-sounding pauses that mimic human speech patterns.

[01:24:55] Mike Kaput: And what's really cool here is they're not just building a better voice assistant. They're [01:25:00] developing companion AI glasses designed to be worn all day, giving you high-quality audio and convenient access to your companion, who can observe the world alongside you. Now, at the same time, HeyGen has partnered with ElevenLabs to integrate voice generation capabilities into their avatar creation platform.

[01:25:19] Mike Kaput: So this collaboration addresses what HeyGen describes as one of the biggest challenges for creators using the platform, which is finding voices that match the custom avatars they generate. Now you can generate tailored voices by specifying age, gender, language, accent, and descriptive style prompts.

[01:25:38] Mike Kaput: Hume AI, which we've talked about in the past, has been busy with their release of Octave, which they're calling the first LLM really built for text to speech. Now, unlike conventional text-to-speech systems that simply convert text into spoken words, Hume claims that Octave represents a fundamental shift in approach.

[01:25:59] Mike Kaput: It's a [01:26:00] speech-language model that actually understands what words mean in context, enabling it to add appropriate emotional inflection, timing, and expressiveness. It can actually interpret the meaning behind text. For instance, if you give it sarcastic dialogue, it naturally adopts a sarcastic tone. Now, using Octave, you can do something called voice design, which allows users to create custom AI voices from text prompts.

[01:26:26] Mike Kaput: You can also give acting instructions, which let you give directions to modify how the text is read. Last but certainly not least, ElevenLabs itself has unveiled something called Scribe, which they're positioning as the world's most accurate speech-to-text model. While much of the industry focus has been on generating realistic speech, Scribe tackles the reverse challenge: actually transcribing spoken content into text across 99 languages.

[01:26:56] Mike Kaput: According to ElevenLabs, Scribe consistently outperforms leading [01:27:00] models like Gemini 2.0 Flash, Whisper, and Deepgram in benchmark tests. It achieves particularly impressive accuracy rates in Italian and English, and it demonstrates huge improvements in traditionally underserved languages like Serbian, Cantonese, and Malayalam.

[01:27:19] Mike Kaput: Now, beyond basic transcription, Scribe offers structured outputs with word-level timestamps. It can identify who's speaking and can even tag non-speech audio events like laughter, and it's available through their API. Alright, Paul, that's a hugely packed week in AI. Tons of developments going on. Thanks for breaking everything down for us.

[01:27:40] Paul Roetzer: Sure. By like Wednesday last week, I think I tweeted something like, "Is it text-to-voice week and I didn't get the memo?" Like, I assume there must be some voice AI summit going on where all this is getting announced. Yeah, exactly. It was wild to see all the voice tech coming out at the same time.

[01:27:53] Paul Roetzer: Alright, good stuff as always. Thanks, Mike, and we will be back with everyone next week. [01:28:00] Thanks for listening to The AI Show. Visit marketingaiinstitute.com to continue your AI learning journey, and join more than 60,000 professionals and business leaders who have subscribed to the weekly newsletter, downloaded the AI blueprints, attended virtual and in-person events, taken our online AI courses, and engaged in the Slack community.

[01:28:24] Paul Roetzer: Until next time, stay curious and explore AI.
