
Meta Allegedly Used Unlicensed Books to Train AI Models, According to Email Evidence

Facepalm: A group of authors has initiated legal action against Meta, alleging that the company utilized unauthorized copies of their books to train its generative AI models. Although Meta has refuted any misconduct, newly revealed messages indicate that executives and engineers were fully aware of their actions and the potential copyright infringements involved.

The lawsuit filed against Meta by Sarah Silverman, Richard Kadrey, and other writers and rights holders may be entering a pivotal stage. The authors have obtained internal company emails in which Meta employees candidly discussed “torrenting” well-known archives of pirated content to enhance AI model training.

Meta previously acknowledged using certain contentious datasets, asserting that such use should be deemed fair use. The company also admitted to downloading an extensive dataset known as “LibGen,” containing millions of pirated books. However, the newly disclosed emails reveal broader concerns within Meta regarding the acquisition and distribution of this data via the BitTorrent network.

The emails show that Meta downloaded and shared at least 81.7 terabytes of data across numerous contested datasets, including 35.7 terabytes from the Z-Library and LibGen archives. The plaintiffs assert that Meta undertook an “astonishing” torrenting strategy, distributing pirated books on an unprecedented scale.

In an April 2023 correspondence, Meta researcher Nikolay Bashlykov wrote, “torrenting from a corporate laptop doesn’t feel right.” The message concluded with a smiling emoji; however, a few months later, his perspective changed considerably.

By September 2023, Bashlykov mentioned consulting Meta’s legal team because using torrents, and thereby “seeding” terabytes of pirated data, was clearly “not OK” from a legal viewpoint.

Meta was seemingly aware of its engineers’ engagement in illegal torrenting to train AI models, with Mark Zuckerberg allegedly being informed about LibGen. The company attempted to conceal its torrenting and seeding activity by using servers outside Facebook’s primary network. In another internal message, Meta employee Frank Zhang described this tactic as operating in “stealth mode.”

Like other major technology companies, Meta is investing heavily in AI development and generative AI services. The company, which aims to populate its aging social networking platforms with AI-generated personas and bots, recently filed a motion to dismiss the lawsuit led by Silverman and other authors. However, the disclosed emails outlining Meta’s involvement in torrenting and distributing pirated books could significantly complicate its legal defense.
