OpenAI will get access to the AP’s archive of text stories going back to 1985, the news organization said in a statement. In addition to licensing fees, the AP will get access to OpenAI’s technology to use in experiments on how it might improve its journalism.
The news organization has used automation to produce some local sports reporting and financial earnings reports for years. The AP does not use “generative” tech — chatbots like ChatGPT — to write stories, the news organization said.
OpenAI, Google and other AI companies have used billions of sentences pulled off the open internet to build the “large language models” that power their chatbots. News stories, Wikipedia articles, social media comments and blog posts have all gone into the models without getting permission from their owners, with the tech companies generally arguing they’re free to use the public data.
A Washington Post analysis of a database of websites that was used to train one of OpenAI’s older AI models showed that the AP’s main news website was the 68th-most cited website in the database.
Now, a rising group of authors, musicians, news organizations and social media companies has been pushing back, arguing that the use of their content to train AI is a massive shift in the way the internet works, especially since some of the AI tools being trained on human-made content are already being used to replace human workers. A wave of lawsuits has washed over the industry in the past two weeks alleging improper data use, including class-action suits against OpenAI and Google, and lawsuits against OpenAI from the comedian Sarah Silverman and two prominent fiction authors.
Tech companies have paid directly for news content in the past. Google and Facebook both pay news sites for direct access to their content to display on their platforms in some countries. In Australia, the government passed a law requiring the practice, and a similar law is about to go into effect in Canada.