Sarah Silverman and novelists sue OpenAI, maker of ChatGPT, for ingesting their books

Ask ChatGPT about comedian Sarah Silverman’s memoir The Bedwetter and the AI chatbot can provide a detailed synopsis of each part of the book.

Does that mean it actually read and stored a pirated copy? Or has it absorbed enough customer reviews and online chatter about the bestseller, or the musical it inspired, that it passes for an expert?

US courts could now help sort it out after Silverman sued ChatGPT maker OpenAI for copyright infringement this week, joining a growing number of writers who say they unwittingly built the foundation for Silicon Valley’s red-hot artificial intelligence boom.

Silverman’s lawsuit claims she never gave OpenAI permission to ingest the digital version of her 2010 book to train its AI models, and that it was likely stolen from a shadow library of pirated works. The suit says the memoir was copied without consent, without credit and without compensation.

It is one of a growing number of cases that could crack open the secrecy of OpenAI and its rivals about the valuable data used to train increasingly widely used generative artificial intelligence products that create new text, images and music. It also raises questions about the ethical and legal basis of tools that the McKinsey Global Institute projects will add the equivalent of $2.6 trillion to $4.4 trillion to the global economy.

“This is an open, dirty secret of the entire machine learning industry,” said Matthew Butterick, one of the attorneys representing Silverman and other authors in seeking a class-action case. “They love book data and get it from these illicit sites. We’re blowing the whistle on the entire practice.”

OpenAI declined to comment on the allegations. Another lawsuit by Silverman makes similar claims about an AI model created by Facebook and Instagram parent company Meta, which also declined to comment.

It might be tough for writers to win, especially after Google’s success in fending off legal challenges to its online book library. The US Supreme Court in 2016 let stand lower court rulings that rejected the authors’ claim that Google’s digitization of millions of books, and its display of small portions of them to the public, amounted to copyright infringement on an epic scale.

“I think what OpenAI has done with books is awfully close to what Google has been allowed to do with its Google Books project and therefore will be legal,” said Deven Desai, an associate professor of law and ethics at the Georgia Institute of Technology.

While only a handful have sued, including Silverman and bestselling novelists Mona Awad and Paul Tremblay, concerns about the tech industry’s AI-building practices have gained traction in literary and artist communities.

Other prominent authors, including Nora Roberts, Margaret Atwood, Louise Erdrich and Jodi Picoult, signed a letter late last month to the CEOs of OpenAI, Google, Microsoft, Meta and other AI developers accusing them of exploitative practices in building chatbots that mimic and regurgitate their language, style and ideas.

“Millions of copyrighted books, articles, essays and poems provide food for AI systems, endless meals for which there has been no bill,” says the open letter organized by the Authors Guild and signed by more than 4,000 writers. “You are spending billions of dollars developing AI technology. It’s only fair that you compensate us for using our writings, without which AI would be trivial and extremely limited.”

The AI systems powering popular products like ChatGPT, Google’s Bard and Microsoft’s Bing chatbot are known as large language models, which have learned by identifying and analyzing patterns in a huge body of ingested text. They have amazed audiences with their strong command of human language, although they are also known for their tendency to spew falsehoods.

While models have also been trained on news articles and social media feeds, books are especially valuable, as OpenAI acknowledged in a 2018 paper cited in Silverman’s lawsuit.

The first version of OpenAI’s large language model, known as GPT-1, was trained on a dataset compiled by university researchers called the Toronto Book Corpus, which included thousands of unpublished books, some in the adventure, fantasy and romance genres.

“Crucially, it contains long stretches of contiguous text, which allows the generative model to learn to condition on long-range information,” OpenAI researchers said at the time. Other tech companies, including Google and Amazon, have also relied on the same data, which is no longer available in its original form.

But since then, OpenAI and other leading AI developers have grown more secretive about their data sources, even as they have ingested ever greater amounts of written work. Butterick said that circumstantial evidence points to the use of so-called shadow libraries of pirated content that contained the works of Silverman and other plaintiffs.

“It matters to these models because books are the best source of long, well-edited, coherent writing,” he said. “Basically you can’t have a high-quality language model unless you have books in your training data.”

It could be weeks or months before a formal response is due from OpenAI. But once the case moves forward, tech executives may have to testify, under oath, about which book sources they downloaded.

“As far as we know, the other side has not denied it,” said Joseph Saveri, another of Silverman’s attorneys. “They don’t have an alternative explanation for this.”

Saveri said the authors aren’t necessarily asking tech companies to throw away their algorithms and training data and start over, although the US Federal Trade Commission has set a precedent for forcing companies to destroy AI data obtained illegally. But a way to compensate writers is needed, he said.
