In AI copyright case, Zuckerberg turns to YouTube for his defense

Meta CEO Mark Zuckerberg appears to have pointed to YouTube and its fight to remove pirated content to defend his own company’s use of a dataset containing copyrighted e-books to train artificial intelligence models, newly released excerpts of his testimony reveal.

The deposition, which was filed as part of a complaint submitted to the court by the plaintiffs’ lawyers, relates to the AI copyright case Kadrey v. Meta. It is one of many similar cases moving through the US court system that pit AI companies against authors and other intellectual property holders. Most often, the defendants in these cases, the AI companies, claim that training on copyrighted content is “fair use.” Many copyright holders disagree.

“For example, I think YouTube might end up hosting some of the things that people pirate for a while, but YouTube tries to remove those things,” Zuckerberg said during his testimony. Portions of the transcript were made public Wednesday night. “And I assume the vast majority of stuff on YouTube is kind of good, and they have the license to do it.”

Excerpts from the testimony provide some clues about Zuckerberg’s thinking on copyrighted content and fair use. However, it should be noted that the full text of the testimony has not been published. TechCrunch has reached out to Meta for additional context and will update the article if the company responds.

Based on the deposition excerpts, Zuckerberg appears to be defending Meta’s use of LibGen, a dataset of e-books, to train its own family of AI models, known as Llama. Meta’s Llama competes with leading models from AI companies like OpenAI.

LibGen, which describes itself as a “link aggregator,” provides access to copyrighted works from publishers including Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education. LibGen has been sued several times, ordered shut down, and fined tens of millions of dollars for copyright infringement.

According to court filings unsealed this week, Zuckerberg allegedly allowed LibGen to be used to train at least one of Meta’s Llama models despite concerns within the company’s AI executive and research teams about the legal implications.

Attorneys for the plaintiffs, who include best-selling authors Sarah Silverman and Ta-Nehisi Coates, quoted Meta staffers referring to LibGen as “a dataset we know to be pirated” and noting that its use “may undermine… [Meta’s] negotiating position with regulators,” according to the filing.

During his testimony, Zuckerberg claimed he had “never really heard of” LibGen.

“I realize you’re trying to get me to have an opinion on LibGen, which I haven’t really heard of,” Zuckerberg said during the deposition. “It’s just that I don’t have knowledge of this specific thing.”

Under questioning from one of the plaintiffs’ lawyers, David Boies, Zuckerberg explained why it would be unreasonable to ban the use of a dataset like LibGen.

“So would I want to have a policy against people using YouTube because some content might be copyrighted? No,” he said. “[T]here are cases where imposing such a blanket ban may not be the right thing to do.”

Zuckerberg stated that Meta should be “very careful about” training on copyrighted material.

“You know, [if there’s] someone putting out a website and intentionally trying to violate people’s rights… that’s obviously something we want to be cautious about or careful about how we interact with it or maybe even prevent our teams from engaging with it,” Zuckerberg said during his testimony, according to the transcript.

New allegations

Plaintiffs’ attorneys in Kadrey v. Meta have amended the complaint several times since it was filed in the U.S. District Court for the Northern District of California, San Francisco Division, in 2023. The latest amended complaint, filed late Wednesday, contains new allegations against Meta, including that the company cross-referenced some of the pirated books in LibGen with copyrighted books available for licensing. The lawyers claim that Meta used this tactic to determine whether it made sense to pursue a licensing agreement with a publisher.

Meta allegedly used LibGen to train its latest family of Llama models, Llama 3, according to the amended complaint. The plaintiffs also allege that Meta used the dataset to train its next-generation Llama 4 models.

According to the amended complaint, Meta researchers allegedly tried to hide the fact that the Llama models were trained on copyrighted material by inserting “supervised samples” into Llama’s fine-tuning. Meta also downloaded pirated e-books from another source, Z-Library, for Llama training in April 2024, the amended complaint alleges.

Z-Library, or Z-Lib, has been the subject of a number of legal actions brought by publishers, including domain seizures and takedowns. In 2022, Russian citizens who allegedly ran it were charged with copyright infringement, wire fraud and money laundering.