Dear Reader,

On July 13, OpenAI signed a deal with the Associated Press (AP) to license its archive of news articles.

The AP was founded in 1846, and its archives are a huge trove of accurate, licensed content for OpenAI’s ChatGPT to consume.

The accuracy of AI models like ChatGPT depends on the availability of large and diverse datasets. These datasets allow the AI systems to learn patterns, make predictions, and generate valuable insights.

The more robust and high-quality the data, the better AI models become. Therefore, the success of AI applications hinges upon the value of large datasets.

The timing of the deal is no accident either.

OpenAI is facing several lawsuits from authors such as Sarah Silverman, Christopher Golden, and Richard Kadrey. And a class action lawsuit claims ChatGPT was trained on 300 billion words secretly scraped from the internet, including books, websites, and posts.

Generative image AI creators like MidJourney and Stability AI are facing similar lawsuits.

These allegations are serious enough that, as of July 13, the Federal Trade Commission has launched an investigation into OpenAI – the company behind ChatGPT – to determine if it violated consumer protection laws by scraping public data.

What’s becoming clear is that the best AI will come from companies with (legal) access to the biggest and highest-quality datasets.

And as investors, that’s important for us to understand. For a few reasons…

Break Things

Move fast and break things. For decades, that’s been the rallying cry of Silicon Valley.

It speaks to the need to be the first to get a product out… even if that means the code isn’t perfect.

Elon Musk even carried it over into the aerospace industry with SpaceX. The company has blown up five rockets to date. But breaking things worked out.

In 2016, SpaceX perfected the science of landing a reusable rocket on a pad.

Since 2020, SpaceX has brought 34 astronauts to space – safely.

Breaking things may work, but breaking the law is a step too far. That’s the situation many AI companies could find themselves in.

The mounting lawsuits against OpenAI and other developers are making enterprises wary of using publicly available AIs.

That’s why Adobe and Shutterstock are offering to protect enterprises from legal troubles.

Adobe and Shutterstock are both offering indemnity clauses that will pay any copyright claims related to works generated by their AI tools.

They can offer this sort of protection because they have rights to a combined 670 million images to train their AIs on.

Data is the lifeblood of AI. And companies that have access to the most high-quality data will be able to win over enterprise clients.

Finding the Winners

Identifying the companies with access to the best and most robust data will be critical in identifying the winners and losers in the AI arms race. Importantly, these companies need to have legal access to this data. Otherwise, AI companies open themselves up to lawsuits like the ones we’re seeing now.

If you’re a subscriber to my Exponential Tech Investor service, be on the lookout for this month’s issue hitting your inbox today. It includes a company operating with exclusive data access in a $512 billion industry.

Even if you aren’t subscribed to any of my paid newsletters, you can still profit from this opportunity. I’ve previously recommended Adobe (ADBE) to my readers.

Its platforms, like Photoshop and Lightroom, are used by millions of content creators. It recently unveiled Firefly, a generative AI tool that allows users to type out their ideas and let AI do the hard work.

And importantly, the copyright protection from Adobe will go a long way in winning enterprise clients over from start-ups like MidJourney that are facing legal battles.

Regards,

Colin Tedards
Editor, The Bleeding Edge