[The AI Show Episode 88]: Meet Devin, the “First AI Software Engineer,” The Rise of Humanoid Robots, and OpenAI’s Sora Interview

By Claire Prudhomme on March 19, 2024

Podcasts

After last week's predictions about the future of AI, we are already seeing glimpses of those innovations in AI agents, robotics, and more. In Episode 88 of the Artificial Intelligence Show, hosts Paul Roetzer and Mike Kaput discuss Cognition's AI software engineer, Devin; the significance of Figure’s humanoid robots; and OpenAI CTO Mira Murati's questionable responses to questions about Sora's training data in a WSJ interview.

Listen or watch below—and see below for show notes and the transcript.

Listen Now

Watch the Video

Timestamps

00:03:27 — Cognition releases Devin, the first “AI software engineer”

00:14:52 — The Significance Behind Figure’s Humanoid Robots

00:22:08 — OpenAI CTO Questioned on Sora AI Model's Data Sources in WSJ Interview

00:29:13 — European Union’s Artificial Intelligence Act approved by the European Parliament

00:33:46 — Suno AI, music generation based on text prompts

00:39:04 — Grok is now open source

00:43:26 — The Top 100 Consumer GenAI Apps from venture capital firm Andreessen Horowitz

00:46:58 — Apple Inc. in talks to build Google’s Gemini AI into the iPhone

00:50:06 — Midjourney introduces feature to maintain consistency in image creation

Summary

Devin, first “AI software engineer,” is released/demoed by Cognition

An AI startup called Cognition is making waves with a demo of its new product, called Devin, that it labels as “the first AI software engineer.”

The demo video shows a user typing in a prompt giving Devin a task to complete using code. Within seconds in the demo, Devin executes a complex task using code and even builds a website that visualizes the results of the project.

Cognition appears to have come out of almost nowhere (it didn’t even formally exist as a corporation until 2 months ago) and has $21M in funding from notable investors including Founders Fund.

The demo itself is getting incredible buzz online. Tech leaders like the CEOs of Stripe and Perplexity have weighed in with positive feedback.

Ethan Mollick, a leading AI voice we talk about often, got early access to Devin and said that, while it is slow and breaks often, it does make “a plan and executes it autonomously, doing research, writing code & debugging, without you watching.”

A notable builder in the AI education space, Mckay Wrigley, showed off a much more extensive 27-minute live demo using the tool that he said blew him away.

His conclusion:

“The era of AI agents has begun.”

The Importance of Figure’s Humanoid Robots

Figure is a robotics company building physical, humanoid robots. They just raised $675M in Series B funding and announced a partnership with OpenAI to incorporate the company’s multimodal models into their robots.

The company released a demo video showing the results of Figure’s robots running OpenAI models. This video showed off exactly why we are encouraging people to pay attention:

In the video, the robot uses OpenAI’s models to understand the words being spoken to it by a human and the world around it.

Then it uses Figure’s AI to translate that understanding into robotic actions. During the video, the robot responds verbally to a human demonstrator and follows his commands, including giving him an apple, cleaning up trash on a table, and putting away plates.

The robot is able to communicate what it is doing and why, as well as understand the world around it.

Figure isn’t the only company working on humanoid robots. This same week, it was announced that car company Mercedes is looking to trial humanoid robots to perform physically challenging manual labor in partnership with a robotics company called Apptronik.

OpenAI's Mira Murati Faces Scrutiny Over Sora AI Model's Data Sources in WSJ Interview

OpenAI’s CTO Mira Murati is coming under some fire for a recent interview with the Wall Street Journal. She sat down with a WSJ reporter to talk about Sora, the company’s new AI video generation model.

The interview contained plenty of useful discussion about Sora and its capabilities, but it’s drawing attention for one controversial segment.

The WSJ reporter asked Murati what data the model used for training, in order to be able to generate such realistic video.

Murati responded, saying “We used publicly available data and licensed data,” but stumbled when pressed by the Journal. When asked “So, videos on YouTube,” Murati responded with: “I’m actually not sure about that.”

She also said she wasn’t sure if videos from Instagram and Facebook were used to train the model, either.

Instead, she repeated that the data used was publicly available or licensed, and shut down further questions about the specific websites or sources used.

Links Referenced in the Show

Devin, first “AI software engineer,” is released/demoed by Cognition
Advancements in humanoid robots
OpenAI CTO Questioned on Sora AI Model's Data Sources in WSJ Interview
- OpenAI's Sora Made Me Crazy AI Videos—Then the CTO Answered (Most of) My Questions - WSJ
- OpenAI’s Sora text-to-video generator will be publicly available later this year - The Verge
The EU’s Highly Anticipated Artificial Intelligence Act
Grok is now open source
- Open Release of Grok-1
Suno.AI
- A ChatGPT for Music Is Here. Inside Suno, the Start-up Changing Everything - Rolling Stone
Andreessen Horowitz Updates Top Generative AI Consumer Apps
- The Top 100 Consumer GenAI Apps from a16z
Apple to Build Google Gemini AI into the iPhone
- Apple Is in Talks to Let Google Gemini Power iPhone AI Features
Midjourney releases feature to uphold consistency across images
- Midjourney debuts feature for generating consistent characters across multiple gen AI images - VentureBeat
- Midjourney bets it can beat the copyright police - TechCrunch

This week’s episode is brought to you by our Marketing AI Conference (MAICON).

From September 10-12 this year, we’re excited to host our 5th annual MAICON at this pivotal point for our industry.

MAICON was created for marketing leaders and practitioners seeking to drive the next frontier of digital marketing transformation within their organizations. At MAICON, you’ll learn from top AI and marketing experts, while connecting with a passionate, motivated community of forward-thinking professionals.

Now is the best time to get your MAICON ticket. Ticket prices go up after Friday, March 22. Visit www.maicon.ai to learn more.

Read the Transcription

Disclaimer: This transcription was written by AI, thanks to Descript, and has not been edited for content.

[00:00:00] Paul Roetzer: If it's fair use to take copyrighted material to train these models as they say it is, if they believe that, then why don't they just say where the data came from?

[00:00:10] Paul Roetzer: Like, Welcome to the Artificial Intelligence Show, the podcast that helps your business grow smarter by making AI approachable and actionable. My name is Paul Roetzer. I'm the founder and CEO of Marketing AI Institute, and I'm your host. Each week, I'm joined by my co host, and Marketing AI Institute Chief Content Officer, Mike Kaput, as we break down all the AI news that matters and give you insights and perspectives that you can use to advance your company and your career.

[00:00:40] Paul Roetzer: Join us as we accelerate AI literacy for all.

[00:00:48] Paul Roetzer: Welcome to episode 88 of the Artificial Intelligence Show. I'm your host, Paul Roetzer, along with my co host, Mike Kaput, who is coming to us today from

[00:00:57] Mike Kaput: From Greenville, North Carolina.

[00:00:59] Paul Roetzer: Greenville, [00:01:00] North Carolina. All right. So we are recording on Monday, March 18th, at noon, Eastern time, which is relevant.

[00:01:08] Paul Roetzer: Because some of the things we're going to talk about today happened this morning. So, it is a fast moving place to be in, as we all know. So we're going to get into it. The episode today is brought to us by the Marketing AI Conference, or MAICON. This is our fifth annual MAICON, I believe, September 10th to the 12th in Cleveland.

[00:01:30] Paul Roetzer: This is our flagship event that we created in 2019. So the first time we held it, 300 people from 12 countries attended. Then we took a hiatus for, I guess, two years. We didn't have one. We went virtual. But we came back and it was a small gathering back in 2022.

[00:01:53] Paul Roetzer: And then in 2023 things really took off. We had over 700 people in Cleveland in 2023, and we're [00:02:00] expecting, I don't know, man. It's so hard to project right now, but I think 1500 is our goal, but we're ahead of pace, I would say, at the moment to the 1500. So, this is an event built really for marketing leaders and practitioners.

[00:02:16] Paul Roetzer: There's going to be an applied AI track that's all about actionable tools and recommendations and frameworks, and there's going to be a strategic leader track that's going to be all about the business of AI and how we're going to apply it to our talent, our technology, our strategies, our budgets. So, yeah.

[00:02:31] Paul Roetzer: If you're at the point where you're really ready to move forward with this stuff and be a part of kind of the next generation of what's happening in the marketing industry, be amongst your peers, we would love to have you join us in Cleveland for MAICON. So you can go to MAICON.AI, that's MAICON.AI. The ticket prices go up on Friday, March 22nd, so that's coming up fast. So if you want to take advantage of the best prices available, again, check out MAICON.AI to learn [00:03:00] more about the event and get your tickets today. All right, Mike, interesting on the heels of episode 87, where we talked about a theoretical timeline of what could be happening and when, we had some things happen like the day the podcast came out and then the day after that, were highly relevant to what we talked about last week.

[00:03:23] Paul Roetzer: So I'm going to turn it over to you and let's get going talking about those things.

[00:03:27] Cognition releases Devin, the first “AI software engineer”

[00:03:27] Mike Kaput: Alright, Paul, so first up, we have news that an AI startup called Cognition is making waves with a demo of its new product called Devin that it labels as quote, the first AI software engineer. So the demo video that has been released by the company showing Devin in action shows a user typing in a prompt, giving Devin a task to complete.

[00:03:54] Mike Kaput: using code, and then Devin develops a plan to tackle the [00:04:00] problem. It, quote, builds the whole project using all the tools that a human software engineer would use, according to Cognition's CEO, Scott Wu. And then from there, Devin uses its own command line, code editor, and browser to go and complete coding projects on its own, using the same type of reasoning and problem solving that a human engineer would.

[00:04:22] Mike Kaput: So the result is that in the demo video, Devin's able to execute a complex project using code and even build a website that visualizes the results of that project. Now, Cognition as a company kind of feels like it came almost out of nowhere. It didn't even formally exist as a corporation until a couple of months ago.

[00:04:45] Mike Kaput: It has 20 million, 21 million in funding from notable investors that include Founders Fund. And this demo itself is kind of blowing up the AI corners of the internet. Like tech leaders, like the CEOs of [00:05:00] Stripe and Perplexity have weighed in with pretty positive feedback. Ethan Mollick, a leading AI voice who we talk about all the time, got early access to Devin and said that while it is slow and breaks often, it does make a plan and execute it autonomously, doing research, writing code, and debugging without you watching.

[00:05:21] Mike Kaput: Another notable builder in the AI education space named McKay Wrigley showed off a much more extensive 27 minute live demo of him using the tool and he said the capabilities blew him away. His conclusion, quote, the era of AI agents has begun. Now on that point, Paul, during episode 87, you literally talked about an AI agent explosion that you pegged in about the 2025 to 2027 range of happening.

[00:05:53] Mike Kaput: Has that come early, do you think?

[00:05:55] Paul Roetzer: Yeah, the timing was just hilarious. I mean that, with the podcast dropped [00:06:00] Tuesday morning and this was like around 9am Tuesday, this started kind of taking off. So no, I mean, this is exactly what I was projecting when I was in, you know, kind of said what I said last week.

[00:06:11] Paul Roetzer: And basically I think we're just going to see these increasingly impressive demonstrations. The 27 minute experimentation, from McKay that you've mentioned was, was impressive. I watched the whole thing. Kind of was like trying to analyze what was actually going on. But like I said last week, I think what we're going to see is a bunch of like GPT 1, GPT 2 kind of agents.

[00:06:35] Paul Roetzer: So these things that are really impressive, but not, not like what we're experiencing from a GPT 4 kind of experience. So it's still going to, as Ethan Mollick highlighted, it's still going to break. It's still going to do things wrong a bunch of times. It's going to need you to interject kind of, so lots of human oversight.

[00:06:54] Paul Roetzer: And it's going to have errors. It's going to have all these, these issues. So while it's super [00:07:00] impressive, it does not mean life is changing next week because of this demo. So, as you mentioned, like a lot of times what we do when something like this occurs is like, Go check your network and like, what are the people who you trust in this space saying?

[00:07:13] Paul Roetzer: And you and I, Mike, aren't programmers. So like, we're not able to like truly analyze this thing and go use it ourselves and kind of figure out what's going on from a coding perspective. So you do have to sort of rely on what other people who understand the technology better than us are saying.

[00:07:30] Paul Roetzer: And so you had mentioned, Patrick Collison is the CEO of Stripe.

[00:07:33] Paul Roetzer: He tweeted, these aren't just cherry-picked demos. Devin is, in my experience, very impressive in practice.

[00:07:39] Paul Roetzer: We had the co founder of Coinbase, Fred Ehrsam.

[00:07:43] Paul Roetzer: He said, first time I've seen an AI take a complex task, break it into steps, complete it, and show a human every step along the way, to the point where it can fully take a task off a human's plate, all built in just a few months.

[00:07:56] Paul Roetzer: Aravind [00:08:00] Srinivas, I think is how we say his name, CEO of Perplexity. He said, this is the first demo of any agent, leave alone coding, that seems to cross the threshold of what a human, what is human level and works reliably. It also tells us what is possible by combining LLMs and tree search algorithms.

[00:08:17] Paul Roetzer: You want systems that can try plans, look at results, replan, and iterate till success.

[00:08:23] Paul Roetzer: And it just went on and on, all this, like, amazing feedback. And honestly, like, if you just went and looked at the buzz online, you would think a ChatGPT moment had happened in agents. Like there was no hedging by any of these people.

[00:08:36] Paul Roetzer: It was just like, this is insane. My take, though, is much more along what Ethan Mollick had highlighted, which is like, hey, this is great, but it's slow. It breaks a bunch. You can't, like, remove the human from the loop. If you don't know code, you have no idea if it's doing things correctly or not. And so, you know, I think my thoughts on this are, when you look at advancements like this, to [00:09:00] determine when a task or a job is going to be transformed by AI, you have to consider variables such as, well, how reliable is it?

[00:09:08] Paul Roetzer: What is the risk of it being wrong? How repetitive and predictive is the task that it's taking on? How much human oversight is needed? How complex is the task? Does it need reasoning capabilities, mathematics, common sense, intuition? And so. Like I was thinking a lot about this one throughout the week. And the, you know, the example that came to mind is you'll often see these demonstrations where they're trying to show agents like booking a flight.

[00:09:32] Paul Roetzer: And I think what you're going to see this year, what you're going to see with Devin is these really impressive experiments where it's able to go book a flight, but then when you step back and you say like, okay, I'm going to North Carolina, I need a flight, I need a hotel, and it's able to like go do these things, but what the agent doesn't know, it doesn't know the nuances of an individual

[00:09:52] Paul Roetzer: person's preferences, like a deeper understanding of how flights work. Which airports would I rather [00:10:00] transfer through based on which clubs I have access to? So, you know, if I have an Amex Platinum card, I have access to their clubs, and it's like, it's not going to know all these things. So it's not like we're just going to have these agents that we just turn on

[00:10:13] Paul Roetzer: and things are just done for us. There's going to be a long period of time where we are observers, trainers, mentors, and managers of these agents. And this is going to be extremely true in business and marketing and sales and service. We're not going to just all of a sudden have agents like Devin, flip a switch, and this thing just watches us do our job once and learns everything from there.

[00:10:35] Paul Roetzer: So I think what's going to happen, as we talked about last week, is jobs are just going to start evolving over the next few years, where the agents are going to be a part of what we do, but your domain experience and expertise, your intuition and common sense, those things are going to be needed to make these agents economically viable.

[00:10:56] Paul Roetzer: And so I think throughout the year, we're going to see really impressive [00:11:00] demonstrations from Google, Microsoft, OpenAI, Adept, HyperWrite, MultiOn, like they're all working on these exact same things. And what we know about this space is people follow fast. Like you're going to see demonstrations of this.

[00:11:11] Paul Roetzer: Heck, we might see stuff this week from NVIDIA with their GTC conference. We just don't know. So my, my overall take here is I think what we're going to see is agents that are built for specific tasks because the less general they are, the more they're trained to book flights or send your emails or, or manage your social media, the more focused they are, the quicker you can reduce errors.

[00:11:37] Paul Roetzer: The less human oversight will be needed. It becomes more reliable and it can maximize its impact. Like we talked about last week with Klarna. So I think you're going to have agents that are trained in specific domains that are capable of reasoning and planning, have the ability to search like from a database to verify processes and data and decisions, and then they're connected to applications and tools and really perform these [00:12:00] actions.

[00:12:00] Paul Roetzer: But I don't think we're going to see these massive general agents like GPT 4, you know, from a language model perspective, that just do everything out of the box. But I think you're going to see smaller agents that are faster and cheaper, like a social media manager AI, an email manager AI, and an ad buyer AI.

[00:12:17] Paul Roetzer: And I could actually start to see, and I think Ethan Mollick referred to this, is like these things almost become a part of your org chart where, I don't know if you personify them with names, but I do think we're just going to look at the org chart in the future and start thinking, okay, well, the AI plugs in here and it's going to do the job of four people doing this, but it's going to need a trainer or a mentor or someone who's actually overseeing it and training it and constantly evolving it and things like that.

[00:12:44] Paul Roetzer: So. Cool. Yeah, I mean, long story short, really, really impressive. Like go watch the demos. It's crazy, but don't think that, you know, I think it was the, what was the metric I gave last week? Your life noticeably changes as a result of like an advancement. I [00:13:00] don't think anyone's work or life is going to noticeably change because of this demonstration, but this is exactly the kind of stuff we're going to start seeing more and more of this year.

[00:13:11] Mike Kaput: That makes a ton of sense. Yeah. It reminds me time and time again of something we often say here and talk about often offline. You know, the reason our mission is AI literacy for all is because we have to arm everyone with the knowledge of what this stuff is starting to be able to do. So you can go figure out in your own career, what does AI agent mean for your particular business or function?

[00:13:37] Mike Kaput: So I love that idea of potentially having one of these things as your co worker.

[00:13:42] Paul Roetzer: Yeah, and I think, Mike, so much of what we try and do is, like, cut through the noise and the hype and say, is this real yet? And like I said, like, I mean, if you just looked at, on Tuesday, all the people that were, like, retweeting this video, like, the demo video from them, and then [00:14:00] the, you know, once people started getting their hands on it and showing, like, actual real world stuff.

[00:14:04] Paul Roetzer: If you don't know what you're looking at, like you really could think we just had the ChatGPT moment of agents and like life just changed. And so that is from everything we can determine and everything we can look at. That is not what happened. And in reality, it's probably not even a massive technical breakthrough that the other AI labs don't already have the ability to do.

[00:14:27] Paul Roetzer: It just had its moment and the demo was awesome. And, you know, I think it's going to just keep making this part of the conversation moving forward. People are realizing like this stuff is becoming very real and it will start having an impact, but I still feel like 2025 is a more realistic time where it starts to, you know, maybe you and I are looking at a social media AI to infuse into the Institute.

[00:14:52] The Significance Behind Figure’s Humanoid Robots

[00:14:52] Mike Kaput: Our next big topic today is somewhat related to that future timeline that we see kind of coming relatively [00:15:00] fast. You wrote this week on LinkedIn, if you haven't been paying attention to the advances in humanoid robots, now would be a good time to start. You wrote that in response to a demo video from a company called Figure that we covered last week and a couple other weeks in different contexts, which is a robotics company building physical humanoid robots.

[00:15:23] Mike Kaput: So they just raised 675 million in Series B funding and announced a partnership with OpenAI to incorporate the company's multimodal models into the physical robot. So this demo showed off exactly why you're encouraging people to kind of pay attention to this thing that could be coming down the line, because the video, which I'd recommend everyone go watch, shows the results of one of Figure's robots running an OpenAI model.

[00:15:52] Mike Kaput: So in the video, the robot uses OpenAI's model to understand words being spoken to it by a human and [00:16:00] view things in the world around it and make connections between all these inputs. Then it uses Figure's AI to translate that into robotic actions. So during this video, the robot will respond verbally to a human demonstrator, follow his commands.

[00:16:17] Mike Kaput: It includes giving him an apple, cleaning up trash on a table, putting away plates. And during all this, the robot is able to communicate what it's doing and why, as well as understand how the world around it, the immediate environment of a table and objects, has changed. This is not the only company working on humanoid robots.

[00:16:38] Mike Kaput: This same week it was announced the car company Mercedes is actually looking to trial humanoid robots to perform physically challenging manual labor in partnership with a company called Apptronik. So my first question for you, Paul, is very similar to the one about AI agents. You [00:17:00] predicted a robotics explosion anywhere from 2026 to 2030.

[00:17:05] Mike Kaput: Has your timeline changed at all on that?

[00:17:08] Paul Roetzer: So far, it's looking like I had some idea that these things were coming last week. I did not. I, like, it just, just so happened. This is the topic we were on last week, but yeah, I mean, so last week, what I said was 26 to 30, robotics explosion, lots of investment going into humanoid robots, OpenAI, Tesla, Figure, we mentioned them, seeing major advancements in the hardware.

[00:17:30] Paul Roetzer: And then the multimodal LLMs are the brains. Again, I didn't know OpenAI and Figure, I don't think at that time I knew that they were working together. But that's exactly what this is. So this is an early demonstration of what happens when you take a multimodal LLM, like OpenAI has created.

[00:17:49] Paul Roetzer: And you're able to put that, embody it in a robot. The robot now not only understands language and can generate language, it has reasoning capabilities. It has memory of [00:18:00] previous conversations in theory. It understands things through image recognition. So, with the apple example, if you watch the video, the crazy thing, the food apple, not Apple the company, all the person said was, I'm hungry.

[00:18:15] Paul Roetzer: And the robot determined that the only thing it had in front of it that the person could eat was an apple and it picked up the apple and handed it to the person. So again, this like reasoning, it was given a goal, like this, this human is hungry, and then it had to think through a process of what do I have that I could give this human?

[00:18:34] Paul Roetzer: And it knew that an apple could be eaten and it picked it up, and that's crazy stuff. Like a robot couldn't do that a year or two ago. So the fact that it can now see things, understand things, that's, you know, really what's going on here. So, you know, I think as we alluded to last week, Tesla Optimus is their robot.

[00:18:53] Paul Roetzer: I assume Tesla will have Grok, G R O K. We will talk more about Grok in a couple [00:19:00] minutes here, but I think we're just going to keep seeing this. So again, really, really impressive. Definitely noteworthy. And again, you should watch it to see it for yourself. It's just like two and a half minutes long.

[00:19:12] Paul Roetzer: But it's the intelligence being embodied in the robot. That's the real unlock here. Not exclusive to Figure. Other people are going to be working on this. But the fact that they teamed up with OpenAI accelerates their ability to build intelligence into it. And there's a great thread that we'll put the link to on X from a guy named Corey Lynch, who's a senior AI engineer of robot manipulation at Figure.

[00:19:37] Paul Roetzer: Prior to this, he spent seven years as a senior research scientist of robotics at Google and on the Google Brain team. And he gives a really nice like technical overview, but it's not overly technical, if that makes sense. He's just basically saying like, Hey, here's what's going on in this video.

[00:19:52] Paul Roetzer: So it's, you know, the robot can describe its visual experience. It can plan for future actions. It can reflect on its memory [00:20:00] and it can explain its reasoning verbally, which is really cool. And so, you know, he goes into the behaviors are learned. So it learns by watching the videos at normal speed.

[00:20:10] Paul Roetzer: There is no one teleoperating. So there were some issues with, I think it was Optimus, where people questioned whether or not the robot was actually doing what it was doing, or if humans were operating it remotely. So he's saying we just feed images from the robot's cameras, and text transcribed from speech captured by the onboard microphones, to the LLM that's trained by OpenAI.

[00:20:33] Paul Roetzer: That model is able to understand both images and text. The model then processes the history of the conversation, including past images, to come up with language responses, which are spoken back to the human via text to speech. The same model is responsible for deciding which learned behaviors to run the robot through to fulfill the command.

[00:20:53] Paul Roetzer: So really impressive stuff. And again, like the agent one, this doesn't change my timeline. I fully expected this stuff [00:21:00] to be happening. This is what's being talked about by Dr. Jim Fan and all these labs. So again, if you're, if you're following the right people, it's obvious all of these things are going to happen.

[00:21:09] Paul Roetzer: I just think most people probably aren't quite as in tune to that. And so like things like this may look as like, Oh my gosh, like did everything just change? No, but really cool. And it's showing you how quickly innovation is going to happen. I talk to people all the time who, you know, think things are decades away.

[00:21:27] Paul Roetzer: And what I tell people is you can't say anything is decades away right now, because we don't comprehend what an exponential growth curve feels like. And this is the kind of stuff that is a demonstration to us of how fast technology can, can evolve and these capabilities. And when you start to try and look out a few years out, you can realize that the timeline we talked about last week probably isn't that crazy.

[00:21:54] Mike Kaput: Yeah, I love that. And if people have not reviewed that timeline, I'd highly [00:22:00] recommend they do just to get a sense of just how quickly all this is moving. Super relevant. Love that breakdown.

[00:22:08] OpenAI CTO Questioned on Sora AI Model's Data Sources in WSJ Interview

[00:22:08] Mike Kaput: In our third big topic today, we have some human beings that maybe aren't as good on video as some of the robots that we've seen.

[00:22:17] Mike Kaput: We had this scenario where OpenAI's CTO, Mira Murati, is coming under some fire for a recent interview with the Wall Street Journal. She sat down with a Journal reporter to talk about Sora, which is the company's new AI video generation model. The interview actually contained plenty of really useful discussion about Sora, its capabilities, but it's getting all the attention because of one controversial segment.

[00:22:45] Mike Kaput: So the Journal reporter asked Murati what data the model used for training in order to be able to generate all the realistic videos that we've seen in Sora demos. Murati responded saying, quote, we used [00:23:00] publicly available data and licensed data. But then when the Journal pressed her about where this data was coming from, the Journal reporter asked, so does that mean videos on YouTube?

[00:23:11] Mike Kaput: Murati responded with, I'm actually not sure about that. She also said she wasn't sure if videos from Instagram and Facebook were used to train the model. Instead, she kind of kept repeating that the data used was publicly available. And then at one point during the interview, just shut down further questions about the different websites and sources being used.

[00:23:35] Mike Kaput: So Paul, we both watched this interview. I mean, how bad is this? We both come from the PR world. This seems like it should have been a pretty preventable PR issue.

[00:23:47] Paul Roetzer: Yeah, I mean, from a PR communications perspective, it's just a total miss on OpenAI's part. It'll go away. It's fine. Like, I mean, this stuff happens all the time.

[00:23:56] Paul Roetzer: It's shocking how often tech companies [00:24:00] allow this stuff to happen because they don't put enough value, I think. And we saw this in our own careers, Mike. Like, the PR and communications function isn't understood or often appreciated, I think. And especially in like a fast moving tech company. And yeah, like how you sit down for an interview with the Wall Street Journal without an FAQ that would, this would be question number one, I would have put. Like, again, we've done training on this stuff.

[00:24:24] Paul Roetzer: We used to run workshops, teaching people how to prep for media interviews. And this would have been like one A of the questions I would have prepared this person for. So this was obviously they, they just didn't, like, take it seriously or really think about what might be asked within this environment.

[00:24:41] Paul Roetzer: So yeah, certainly from that perspective, it was just a miss and hopefully they fix that moving forward. But at a more important level, the fact that they can't answer that question, like even if they had prepared, what would they have prepared her to say? Because here's the thing that is [00:25:00] really

[00:25:00] Paul Roetzer: kind of challenging to understand with all of these companies that are building these models. If it's fair use to take copyrighted material to train these models as they say it is, as they're basically claiming in the lawsuits that are against them right now, if they believe that, then why don't they just say where the data came from?

[00:25:20] Paul Roetzer: Like, if you think you're allowed to take videos from YouTube to train these models, then why don't you just say we used YouTube videos? So obviously, they're not confident in their own stance. There's no way the CTO of the company doesn't know whether or not they used YouTube videos. So that to me is the bigger issue here.

[00:25:41] Paul Roetzer: It's that if these companies truly believe they're legally allowed to be taking this data to train their models, then just say that's what you're doing. But they're not saying it. So it's a, it's, it's a much larger issue than a PR issue. That being said, I will suggest, like, [00:26:00] the video's worth watching from the other perspective, so it's only about a 10 minute video.

[00:26:05] Paul Roetzer: But there was a few things that I made note of. One was, kind of like we were seeing with the agent conversation we just had, the demos of Sora are mind blowing. Like, if you go watch all these videos, and they keep releasing them on Instagram, they're incredible. But the Wall Street Journal had them create

[00:26:24] Paul Roetzer: four videos specifically for this interview, and every one of those videos had a bunch of flaws. Like, for example, Bull in a China Shop. The bull, which looked a lot like the Ferdinand Bull from, was it a Disney movie, I think, was like stepping on all the china, and none of it was breaking. So that was a flaw.

[00:26:44] Paul Roetzer: There was, there was just flaws in everything. So, we're seeing the best examples of Sora being released by OpenAI, but they still have problems. She was asked about the speed, and she says it does take a couple minutes to generate the examples we're seeing, and that these are much more [00:27:00] expensive right now than ChatGPT outputs or even DALL-E image generation.

[00:27:04] Paul Roetzer: But by the time they release this, that they want it to cost roughly what it costs to do DALL-E. She was asked about audio, and she said they're not working on it yet, which I thought was kind of interesting. I would have assumed they would be working on that, but they're not working on building audio into these, and that the model is basically in red teaming right now.

[00:27:22] Paul Roetzer: They're working on researching watermarking and content provenance. Those were actually two of the things that Mira said they think a lot about and are slowing down the release of this. And then the reporter asked specifically about the release timeline. And she said this year, probably a few months.

[00:27:40] Paul Roetzer: And so again, that's kind of what we've said is, you know, I could see this by summertime, probably being built into ChatGPT, maybe for an additional fee. And it sounds like that's probably a realistic timeline to look at based on what Mira was saying.

[00:27:53] Mike Kaput: Yeah. I think that point about the quality or kind of cherry-picked nature of the demos [00:28:00] is so, so important, because I feel like Sora is one of those things, let me know if you agree.

[00:28:05] Mike Kaput: That was the thing where I had people reaching out to me being like, wait, what's happening here when we've covered that? So I think it is easy to take a look at the demo and be like, oh my god, the world is going to change completely the moment this thing becomes publicly available. That does not seem to be the case, but we'll see.

[00:28:23] Paul Roetzer: Yeah, and they, she asked her more about that and like there was an example where like there was a person standing on a corner and a car was a cab, like a yellow taxi cab as it was coming through the screen and then it passes the person and all of a sudden it becomes like a gray sedan. And so things like that were just completely inconsistent.

[00:28:41] Paul Roetzer: And so she asked her about the possibility of like editing that and she said, well, we're working on that. That'll give people more control of the, of the outputs and the ability to edit the output so that they become more reliable. So yeah, again, like, I mean, she, outside of the faux pas around the [00:29:00] training question, there was actually quite a bit of interesting information about Sora that I had not previously seen publicly, so valuable interview other than the training data problem.

[00:29:13] European Union’s Artificial Intelligence Act approved by the European Parliament

[00:29:13] Mike Kaput: All right, let's dive into some of our rapid fire topics this week. First up, the European Union's long awaited Artificial Intelligence Act was finally approved by the European Parliament this week. According to the Parliament, the Act aims to protect fundamental rights, democracy, the rule of law, and environmental sustainability from high risk AI while boosting innovation and establishing Europe as a leader in the field. The regulation establishes obligations for AI based on its potential risks and level of impact.

[00:29:49] Mike Kaput: Those obligations include some pretty, it seems, direct measures like banning certain AI applications that the EU says threaten [00:30:00] citizens, so examples given are like untargeted scraping of facial images to create facial recognition databases. It's going to forbid emotion recognition technology in the workplace and in schools.

[00:30:14] Mike Kaput: Interestingly, types of AI that manipulate human behavior as well. So there's also going to be a bunch of new transparency rules and risk assessments required for AI systems by the EU as well. This suite of regulations sounds like it's rolling out progressively over the next few years, and according to the Wall Street Journal, it applies to AI products that are in the EU market, regardless of where they were developed, and the EU could levy fines up to 7 percent of a company's overall revenue for violation of the Act.

[00:30:50] Mike Kaput: So, as a formality, the law still needs to be approved by European member states, but they think that's going to happen very easily since the member states have already [00:31:00] agreed on what to put in this act that went through the parliament. Now, kind of at the same time, we're also getting some parties in the U.S.

[00:31:08] Mike Kaput: government, it sounds like, increasingly worried about AI. We got, in this same past week, a report commissioned by the State Department that says the U.S. government must move quickly and decisively to avert substantial national security risks stemming from AI, which could in the worst case cause an extinction level threat to the human species.

[00:31:30] Mike Kaput: So, doesn't appear to be pulling its punches there. According to Time, reports like this also have recommended a set of sweeping and unprecedented policy actions that could disrupt the AI industry. Again, these are all things recommended in a report advising the State Department, but some of them are pretty extreme.

[00:31:50] Mike Kaput: Things like Congress making it illegal to train AI models that use more than a certain amount of computing power. So really, what we've got here [00:32:00] is an initial piece of legislation that has finally moved through to start defining, for better or for worse, what AI is within the EU, how AI can be used, but also a lot of movement and rumblings from other governments about how to deal with AI.

[00:32:15] Mike Kaput: So, Paul, I'm curious, like, what was your reaction initially to the AI Act being approved in the EU?

[00:32:24] Paul Roetzer: We knew it was coming. I mean, I think we have to go back and look at what episodes we dove deeper into this, but what we had originally heard was by the end of 2023, this would likely pass. And then there was the formalities of, of actually the final votes and everything.

[00:32:39] Paul Roetzer: So, yeah, like no surprises. I think it's been adapted slightly since maybe where it was in the fall. Some slight changes made around the foundation models. So yeah, I guess no shock. Very little chatter about it online from the networks I follow. Like I don't know about you, but I didn't really see [00:33:00] anybody talking about this.

[00:33:00] Paul Roetzer: So I think everyone just kind of assumed it was coming. The impact to U.S.-based companies that do business over there is, it seems, unknown. Again, there was a lot of talk about it last summer and into the fall, but I just haven't seen much about it since. There hasn't been any commentary from OpenAI or even Yann LeCun at Meta.

[00:33:20] Paul Roetzer: Like I just haven't seen much. So I guess to be continued. We'll, you know, keep an eye on it. And as there's more to talk about on the topic, you know, we'll, we'll do it, but maybe for this week's show notes and even the newsletter, we could pull from the fall, when we went a little deeper on this, and share kind of some background on the AI Act for people who aren't familiar with it.

[00:33:46] Suno AI, music generation based on text prompts

[00:33:46] Mike Kaput: So next up, we now have a ChatGPT for music generation according to Rolling Stone. So the publication just published this in depth profile on a tool called Suno AI, [00:34:00] which is a startup that generates music based on text prompts. A reporter at Rolling Stone used the technology with the prompt "solo acoustic Mississippi Delta blues about a sad AI," and the result, which they also published and everyone should check out, is a really cool approximation of an entirely AI generated song that sounds like a human could have made it. They called it Soul of the Machine, and the writer of this piece said Soul of the Machine feels like something different.

[00:34:36] Mike Kaput: The most powerful and unsettling AI creation I've encountered in any medium. And truly, if you go listen to it, you're like, whoa, this is pretty crazy that AI was able to generate this type of music. Now, I was personally pretty blown away by the example. Again, with the caveat throughout this episode, it's just one example, just one demo of technology. [00:35:00]

[00:35:00] Mike Kaput: But do you expect AI to have the same kind of disruptive effect on music generation as it's starting to have in other mediums, like areas like writing, images, video?

[00:35:13] Paul Roetzer: Yeah, so, you know, I guess people who've been listening for a long time know when we talk about generative AI, there's like five main categories we think of.

[00:35:22] Paul Roetzer: So you have text, words, images, video, audio, and code, and certainly music we would put under the audio umbrella. And I think I've said like that I saw video and audio being this year kind of what we've seen with image and language in the last two years, that we were going to see some pretty rapid advancements in this space.

[00:35:42] Paul Roetzer: And so this definitely, you know, fits the bill and you can go to the site. It's just suno.ai, SUNO.AI. And you can, they have this cool wheel up front where you can just pick something that'll like play it for you, but you can create an account and get access to their V3 alpha. So it's, you know, [00:36:00] obviously they're presenting it as a very early version of the product.

[00:36:05] Paul Roetzer: But you can go in and start making songs, just whatever description you want. So I think it's, it's fascinating. They're going to face the same issues around the training data. So in the Rolling Stone article, they glossed over this, but I actually saw this because there's a guy I follow on Twitter, Ed Newton-Rex, who left Stability AI over a disagreement with how they were stealing people's data to train their models.

[00:36:30] Paul Roetzer: So he was the guy in charge of it. And he, he left because he didn't agree. So he shares all kinds of insights on Twitter around training data. And so he highlighted the section in the Rolling Stone article. It says OpenAI faces multiple lawsuits over ChatGPT's use of books, news articles, and other copyrighted material in its vast corpus of training data.

[00:36:52] Paul Roetzer: Suno's founders declined to reveal details of what data they're shoveling into their own model, other than the fact that its [00:37:00] ability to generate convincing human vocals comes in part because it's learning from recordings of speech. Sounds awfully familiar to Sora. The other couple of things that I found interesting was apparently these guys, it's like four founders, they had no intention of doing this.

[00:37:15] Paul Roetzer: This wasn't like what they set out to build. They were actually working at a company called Kensho. And they were working on like, transcription technology for capturing public companies' earnings calls. And dealing with like, audio quality and jargon and various accents. And they realized like, oh, we should go build something.

[00:37:36] Paul Roetzer: So they're all machine learning experts. And so they ended up landing on, you know, building this, the song machine. Two of the founders are musicians. So it says that they're hyper focused only on reaching music fans who want to create songs for fun. But you can certainly play this out and consider the impact on anyone who currently licenses music for anything from TV show production to [00:38:00] ads to ad agencies.

[00:38:01] Paul Roetzer: So you could see this becoming an alternative as this gets good. Now, again, they're not the only ones working on this. Meta is working on this. I'm sure OpenAI is doing something like everyone's working on variations of this. So I do think, I didn't have this one on my timeline last week, but, you know, one to two years out, being able to create anything you can imagine and any voice you can imagine is certainly very viable.

[00:38:27] Paul Roetzer: And I could see, you know, I think we talked about Grimes, was it? Maybe a few months back, who was like, oh yeah, just steal my voice. Like make, make songs sound like me. And then she backtracked off of that. You could see artists licensing their voice to be able to make songs in their tone and style. So it's going to be a really interesting space.

[00:38:45] Paul Roetzer: There's going to be some disruption, there's going to be some lawsuits, and there's going to be some entrepreneurial musicians who say, all right, let's just do it. Let's just dive into this and find a way to make money off of it. So definitely a space to keep watching. [00:39:00] The innovation's going to happen quickly here too.

[00:39:04] Grok is now open source

[00:39:04] Mike Kaput: So in some other news, Elon Musk's AI model, Grok, is now open source. After Musk teased the fact that Grok would become open source last week, which we mentioned briefly on the podcast, his company, xAI, released the base model weights and network architecture of Grok-1, their main large language model.

[00:39:27] Mike Kaput: Now, Grok-1 is a 314 billion parameter mixture of experts model that is designed to essentially do what ChatGPT does, but in true Elon Musk fashion, being much more quote unquote open and less willing to shy away from controversial prompts. But according to the company, this is the raw base model checkpoint from the Grok-1 pre-training phase, which concluded in October 2023.

[00:39:56] Mike Kaput: This means that the model is not fine tuned for any specific [00:40:00] application, such as dialogue. So, in the move that Musk announced amidst his ongoing lawsuit with OpenAI, he is claiming that the company is in breach of contract for abandoning its founding agreement, committing it to openness. We covered that at length in a previous podcast episode, but this seems to also be a response

[00:40:22] Mike Kaput: to some of those actions. So, Paul, I know you are a person who follows closely what Elon Musk does in terms of how it impacts AI. Like, what were your thoughts on this move overall and the level of openness here?

[00:40:36] Paul Roetzer: So he tweeted he was going to do this last week and we kept waiting and it didn't come out until Sunday.

[00:40:42] Paul Roetzer: They finally dropped the open source link. My hope was they were going to release a new version of Grok with it. And again, like not to be overly critical here, but I just don't understand what Grok is supposed to do. Like, so I went in, I was like, oh, okay, maybe, maybe it's better. [00:41:00] So I'm, I pay my $22 a month or whatever you have to pay to get access to Grok on Twitter/X.

[00:41:06] Paul Roetzer: And so I went in and I was like, well, let me just see if anything's changed. So I said, I think that I would, I would read verbatim for you, but you can't access past conversations in Grok. Unless I'm missing something, I don't know how to see something I've previously talked with Grok about.

[00:41:22] Paul Roetzer: So I'm just going to go off of memory here. I said, who are the most influential people in AI? And of course, Elon Musk was number one. It listed five people and Elon Musk was number one. And so just to have fun with it, I said, well, what about Sam Altman? And it said, I took a screenshot of this one, cause I remembered that it might not give me access.

[00:41:41] Paul Roetzer: And it said, Sam Altman is an influential figure in the field of artificial intelligence and technology. He is the CEO of xAI, a company focused on creating trustworthy advanced AI systems. And so I replied to Grok, like, no, Elon Musk is the CEO of xAI. Sam is the CEO of OpenAI. [00:42:00]

[00:42:00] Paul Roetzer: And it said, oh, I'm sorry. Like, you're, you're right. Elon Musk is the CEO of xAI. And so I replied and I said something like, where are you getting your data? Like, do you have the ability to verify facts? Because Elon Musk, it seems like you should know that Elon Musk is the CEO. And it replied, I'm sorry.

[00:42:19] Paul Roetzer: You're right. Sam Altman is the CEO of xAI. I was like, what is this? And then I think I said something like, where are your sources coming from? And it was just like a completely useless conversation. So again, not to be overcritical of Grok, I don't know what the purpose of it is yet. There was something I saw claiming it was as powerful as some of the other open source models, again, on whatever tests it's being tested on.

[00:42:45] Paul Roetzer: I don't get it. Like, something as simple as "Who's the CEO of the company that made you?" isn't something it can tell you. So again, it might become super powerful. It might be amazing. He did what he said he was going to do. He [00:43:00] open sourced it. I don't know why you would use it yet.

[00:43:02] Paul Roetzer: I don't think it has any business or marketing function at all for people who would listen to the show. It's something we'll keep an eye on. I assume it's what will be embodied into Optimus, their robot. So I would guess they're going to make some pretty big advancements at some point here, but,

[00:43:18] Paul Roetzer: yeah, my take was, I really wanted it to be a new version of the model that was actually useful, and it is not.

[00:43:26] The Top 100 Consumer GenAI Apps from venture capital firm Andreessen Horowitz

[00:43:26] Mike Kaput: So we also, this past week, got an updated version of a list of top generative AI consumer apps from the famous venture capital firm Andreessen Horowitz. And the previous list that they have released, with all these top consumer apps in generative AI, came out in September of 2023. So six months later, they went back in and updated this list and split out the top 50 tools for both web and mobile based on [00:44:00] traffic and usage.

[00:44:01] Mike Kaput: So you've got 50 Gen AI apps for each platform. And based on Andreessen Horowitz's analysis, ChatGPT is still in its leading spot across web and mobile as the number one app people are using most, the most popular one by traffic. But on the web, Google's Gemini takes the number two spot with Character.AI

[00:44:22] Mike Kaput: at number three. On mobile, Microsoft Edge is number two, while PhotoMath, an app that uses AI to solve math problems just by snapping a photo of them, is number three. Now, I'd highly recommend people go take a look at these lists, if anything, just to discover new interesting tools. But what's most surprising in Andreessen Horowitz's analysis of what's changed in the last six months is that over 40 percent of the companies on the list of top web gen AI apps are all new.

[00:44:57] Mike Kaput: They were not on the list at all six [00:45:00] months ago. So it seems like, Paul, there's quite a bit of innovation happening, obviously, in this space, and we are quickly getting apps that can rise very fast to the top spot in terms of popularity. Was anything surprising to you about this list?

[00:45:15] Paul Roetzer: And just how few of them I'd even heard of, I think, like, I don't know about you, but, you know, there's like a hundred total.

[00:45:21] Paul Roetzer: And I mean, maybe 10 percent of them I've heard of, maybe. And we watch this space pretty closely. So like you said, I think just the fact that there's all kinds of ones in here, you could probably go explore and check out. Certainly the web products. I knew more of the web products than I did just the straight up apps.

[00:45:42] Paul Roetzer: So yeah, it just demonstrates how fast the space is moving and how many tools are being built. And it's, this isn't going to slow down anytime soon.

[00:45:50] Mike Kaput: Yeah, and I don't think I've had the mental space to dive into the rise of companion and character apps, like Character.AI, which [00:46:00] perennially kind of surprises me as how popular it is.

[00:46:03] Mike Kaput: But hey, that's the new normal, I believe.

[00:46:05] Paul Roetzer: That one I've at least, like, checked out. Right. Yeah, there's so many, so many applications of AI that you and I just don't dive into each day. Which is also why I feel like there's just so much space here for people who want to, like, specialize in their domain with this stuff.

[00:46:23] Paul Roetzer: I mean, this is, we're talking about consumer here, but on the business side, the same thing's going to be true. Like, you know, we talk about Descript as the tool we use for audio and video. You know, people who specialize in audio video, you may have 10 other ones that you love that you go find, cause that's what you do every day.

[00:46:38] Paul Roetzer: And I just, I feel like maybe that's the takeaway for me here is there's an opportunity to specialize in your domain and become, like, an AI expert specifically for the field you know and the thing you do, because the space is just going to be so vast that it's impossible to just be a generalist that knows all of these tools.

[00:46:58] Apple Inc. in talks to build Google’s Gemini AI into the iPhone

[00:46:58] Mike Kaput: Alright, so we've got some [00:47:00] news that broke very early, or right before, rather, we started recording today, which is that according to Bloomberg, Apple is now in talks to build Google's Gemini AI engine into the iPhone, according to people familiar with this situation, which is a little bit of a big deal given the size of both of these companies involved.

[00:47:22] Mike Kaput: So Paul, I wanted to maybe get your thoughts on what you, your take is reading that. I mean, at the same time, interestingly, we had some news where Apple released a paper about its family, a family of multimodal models that it had developed that was widely seen as very open in terms of sharing all the types of details of some of these models.

[00:47:45] Mike Kaput: So what did you think of the fact that Apple might be in a tie up with Google here?

[00:47:50] Paul Roetzer: Yeah, so this did break like right on 8 a.m. Eastern Time on Monday when we were recording this, and my first reaction was a bit of [00:48:00] shock, like definitely not one that I expected. That being said, Google and Apple have a long history of working together.

[00:48:08] Paul Roetzer: Apple pays billions of dollars to Google to, you know, or vice versa, to have Chrome be like the default browser on, on the iPhones and things. So they, they've worked together, which means this could be a red flag for regulators who are already looking at the power they have on mobile as two companies.

[00:48:27] Paul Roetzer: I don't know what it means, though, technically. Like, you know, we've talked about the June conference coming up for the developers for Apple, where we expect announcements to be made. We know they're going to, you know, infuse more AI into the next version of iOS, which I think is iOS 18. So we know they're redistributing people from Project Titan, the car project, to work on generative AI.

[00:48:50] Paul Roetzer: So you could look at this and say, well, maybe they're not as far along on generative AI as we thought they were and they need Google to execute what they're planning on doing, or [00:49:00] maybe there's specific applications of Google Gemini that they're looking at using that Google's just far and away ahead and Apple wants to integrate that.

[00:49:07] Paul Roetzer: No one seems to know yet and neither company is commenting publicly about it. So it's just really a lot of theory at this point, but I mean, it certainly seems very positive for Google, that their Gemini models are pretty good. You know, certainly advanced enough to be considered to be integrated into the iPhone in this way.

[00:49:26] Paul Roetzer: Their stock jumped, I think it was up 7 percent in pre-market trading, and Apple's was up slightly. So I think it's just to be determined, but definitely something to watch. And I'm really excited, by the day becoming more anxious for what Apple announces at their developer conference in June.

[00:49:44] Paul Roetzer: It's going to be fascinating to see the play they make, and I still think that whatever it is, we will have in our hands the early versions of all of this as consumers by the fall. Like, this isn't a multi year thing. They're going to do this fast. And then it'll [00:50:00] progress over time, but we're going to experience versions of this later this year.

[00:50:06] Midjourney introduces feature to maintain consistency in image creation

[00:50:06] Mike Kaput: So in our last big news item today, the popular AI image generation tool, Midjourney, has released a feature that it seems like users have been clamoring for for quite a while. And that's the ability to recreate characters consistently across your images in each new image you're prompting Midjourney for.

[00:50:28] Mike Kaput: So now what you can do is essentially add a new type of tag to a new prompt in Midjourney that references the URL of a previous image where you've generated a character you like that you want to keep working with in the tool. And Midjourney will try to generate that same character in its new outputs.

[00:50:48] Mike Kaput: So, to date, it's been pretty difficult to get image generation tools to do this, to create that same character and say, show me that character that you created previously in all sorts of different [00:51:00] settings or new environments. That hasn't stopped users, obviously, from making Midjourney incredibly popular and using it constantly, but it has raised some barriers with people like storytellers trying to use these tools to generate characters and scenes consistently in things like film, comics, storyboards, and more.

[00:51:23] Mike Kaput: That does seem to be changing. I mean, some of the early examples are somewhat impressive in terms of being able to sustain a character's image and likeness across many different prompts. So, Paul, depending on how you look at this, this looks like further opportunity or disruption to how we typically do visual storytelling in the sense that it enables us, if we so choose, to use AI in more visual storytelling work.

[00:51:52] Mike Kaput: Is that going to be your impression reading this news?

[00:51:54] Paul Roetzer: Yeah, we've talked about this recently with the video too. It's just this, the consistency, like in [00:52:00] video from frame to frame, if you want to start applying these technologies to doing things like designing storybooks and all this stuff, like you need consistency of character.

[00:52:08] Paul Roetzer: So the ability for the AI to maintain consistency in both images and video is going to be critical. And the other thing I would say is kind of on the theme of today, there was a TechCrunch article about this from Kyle Wiggers. And to quote out of that article, Midjourney is flying high at the moment, having reportedly reached around $200 million in revenue without a dime of outside investment.

[00:52:34] Paul Roetzer: Lawyers are expensive, however, and if it's decided fair use doesn't apply in Midjourney's case, it'd decimate the company overnight. So, I guess just to stick on that theme of whether or not people are allowed to train the models the way they have. So, yeah. Not taking away from the fact that this stuff is incredible and it's able to do all these amazing things, but we can't lose sight of the fact that we still have no idea, in many cases, what was used to train these models and whether it was even legal that they did it.

[00:52:59] Paul Roetzer: [00:53:00] So, a story I am sure we will hear a lot more about throughout 2024.

[00:53:07] Mike Kaput: All right, Paul. Well, that's a wrap for this week's episode. As a quick reminder to our audience, if you go to MarketingAIInstitute.com/newsletter, our newsletter each week contains not only these stories and further analysis of them, but also all the topics we aren't able to cover each week in the podcast, which is many of them,

[00:53:29] Mike Kaput: typically several, if not dozens, of additional things going on in AI that you need to know about. So the newsletter is perfect for a quick brief each week to get caught up on what's going on in AI. Paul, thanks again.

[00:53:44] Paul Roetzer: Thanks, Mike. And I will encourage everybody to keep an eye on NVIDIA's GTC event this week.

[00:53:50] Paul Roetzer: So it'll have already started by the time you're listening to this, but I expect there's going to be lots of stuff to talk about on the podcast next week. So keep an eye on that. And Mike, thanks as always for [00:54:00] pulling everything together. Safe travels back home.

[00:54:03] Mike Kaput: Thank you.

[00:54:05] Thanks for listening to The AI Show. Visit MarketingAIInstitute.com to continue your AI learning journey. And join more than 60,000 professionals and business leaders who have subscribed to the weekly newsletter, downloaded the AI blueprints, attended virtual and in-person events, taken our online AI courses, and engaged in the Slack community.

[00:54:28] Until next time, stay curious and explore AI.

Claire Prudhomme

Claire Prudhomme is the Marketing Manager of Media and Content at the Marketing AI Institute. With a background in content marketing, video production and a deep interest in AI public policy, Claire brings a broad skill set to her role. Claire combines her skills, passion for storytelling, and dedication to lifelong learning to drive the Marketing AI Institute's mission forward.

[The AI Show Episode 134]: DeepSeek Updates, OpenAI’s o3-mini and Deep Research, New AI Copyright Guidelines, OpenAI In Talks to Raise $40 Billion & Your AI Questions Answered

Claire Prudhomme | February 4, 2025

Episode 134 of The AI Show: DeepSeek updates, OpenAI releases o3-mini and research, US Copyright Office shares new AI copyright rules & your AI questions answered.

Podcasts

[The AI Show Episode 84]: OpenAI Releases Sora, Google’s Surprise Launch of Gemini 1.5, and AI Rivals Band Together to Fight Deepfakes

Claire Prudhomme | February 20, 2024

Episode 84 provides insights on OpenAI's Sora for video generation, Google's Gemini 1.5, and tech giants' aim to regulate deepfakes with the C2PA standard.

Podcasts

[The AI Show Episode 136]: Elon Musk Tries to Buy OpenAI, JD Vance’s AI Speech, New GenAI Jobs Study, GPT-4o Update, OpenAI Product Roadmap & Grok 3

Claire Prudhomme | February 18, 2025

Episode 136 of The AI Show: Elon Musk's OpenAI bid, JD Vance’s AI Action Summit speech, the GPT-4o update, xAI drama, and the rise of robotics.

Podcasts
