AI video generation company Runway has found itself in hot water after an internal spreadsheet leak revealed the company is using YouTube videos for AI training.
The document, obtained by 404 Media, shows that Runway trained its new Gen-3 model by scraping thousands of videos, without permission, from sources including popular YouTube creators, brands, and even pirated films.
This revelation has sparked a heated debate about the ethics and legality of AI training data collection.
Runway, a major player in the AI video space with a $1.5 billion valuation and backing from tech giants like Google and NVIDIA, isn't alone in this practice. Recent reports have implicated other tech leaders like Apple, Salesforce, and Anthropic in similar unauthorized use of YouTube videos for AI training.
What does this mean for the responsible use and adoption of AI?
I got the answer from Marketing AI Institute founder and CEO Paul Roetzer on Episode 107 of The Artificial Intelligence Show.
This situation is “the uncomfortable part” of AI development, says Roetzer. While users love tools like Runway, the methods used to create them are raising serious ethical concerns.
"This isn't even the ugly stuff. This is table stakes. Everyone is doing this."
The process, as detailed in the leaked document, involves searching for specific types of content on platforms like YouTube, then using these copyrighted videos to train AI models.
This allows the AI to replicate highly specific styles or techniques, sometimes even mimicking individual content creators.
“This is how this works. These models have to learn from something, so they learn from human creativity,” says Roetzer.
The legal landscape surrounding these practices is murky at best.
Sharon Torek, an IP attorney, weighed in on one of Roetzer’s LinkedIn posts, writing:
“‘Big AI’ is rolling the dice that they will be too big to fail by the time a high court determines that their use of copyrighted works to train their models is unlawful, if one does.
To me the lack of transparency about their training practices is telling. Of course they’re privately owned entities who are protecting their assets and shareholders. And slowing down isn’t a practical option for them.
My view is that they’re moving forward, and IP owners will probably have to make licensing deals with them to protect and monetize their works.”
As the AI industry grapples with these ethical and legal challenges, this uncomfortable state of affairs could lead to some equally uncomfortable solutions.
“We all know it’s unethical,” says Roetzer. “We all know it’s probably illegal. But at the end of the day, it’s going to be a better business decision for these big [media] companies to just accept that it was done and this is what happens and just make a deal and try and make money in the process rather than trying to sue them and prove damages.”