Authors Charge Meta with Utilizing 81.7TB of Stolen Books for AI Training

# Meta’s Supposed Torrenting of Pirated Books Complicates Copyright Litigation

## Introduction

Meta, the parent company of Facebook, faces serious accusations in an ongoing copyright lawsuit. Recently unsealed emails suggest that Meta not only downloaded but also seeded pirated books from shadow libraries such as LibGen and Z-Library. These disclosures could significantly affect the legal dispute between Meta and the authors who contend that their copyrighted works were used without authorization to train Meta’s AI systems.

## The Accusations Against Meta

The dispute began when authors sued Meta for allegedly using pirated books to train its artificial intelligence systems. According to court documents, Meta acknowledged torrenting a substantial dataset from LibGen, a notorious archive of pirated books. However, the full scope of Meta’s involvement remained unclear until recently unsealed communications shed light on it.

The emails indicate that Meta torrented at least **81.7 terabytes of data** from various shadow libraries, including **35.7 terabytes from Z-Library and LibGen**. In addition, Meta had previously torrented **80.6 terabytes** from LibGen alone. These figures were presented in a court filing by the plaintiffs, who argue that Meta’s conduct constitutes large-scale copyright infringement.

## The Legal Consequences of Seeding

A critical element of the case is the allegation that Meta not only downloaded but also **seeded** the pirated books. In the context of torrenting, “seeding” refers to sharing downloaded files with others, effectively disseminating copyrighted material.

Some of Meta’s own employees expressed concerns regarding the legality of their actions. In an April 2023 email, Meta research engineer **Nikolay Bashlykov** conveyed his unease, stating:

> “Torrenting from a corporate laptop doesn’t feel right 😊.”

Later, in September 2023, Bashlykov escalated his concerns to Meta’s legal division, cautioning that:

> “Using torrents would entail ‘seeding’ the files—i.e., sharing the content outside, this could be legally not OK.”

Despite these warnings, Meta reportedly continued torrenting, raising questions about whether the company knowingly engaged in copyright infringement.

## Meta’s Supposed Attempts to Conceal Seeding

Court documents indicate that Meta sought to **conceal its seeding activity** to evade scrutiny. Internal communications from Meta researcher **Frank Zhang** suggest that the company deliberately avoided using Facebook servers so that the activity could not be traced back to it. Zhang described the operation as running in **“stealth mode.”**

Moreover, Meta executive **Michael Clark** acknowledged in a deposition that the company altered its torrent settings to minimize seeding. These actions suggest that Meta was aware of the possible legal ramifications and took steps to obscure its participation.

## The Effect on the Copyright Case

The newly emerged evidence strengthens the authors’ claim that Meta engaged in **direct copyright infringement**. Previously, the case centered on whether Meta’s AI systems unlawfully reproduced copyrighted materials. Now, plaintiffs can also argue that Meta illegally distributed the books via torrenting.

Meta has maintained that its use of LibGen data falls under **“fair use,”** a legal principle that permits limited usage of copyrighted materials without consent. However, the company’s alleged seeding activities could undermine this defense, considering that distribution of copyrighted works is typically not shielded by fair use.

## Meta’s Reaction and Legal Approach

Meta has not yet commented publicly on the newly unsealed emails. However, in a **motion to dismiss** filed last month, the company argued that the plaintiffs had offered no proof that any particular book was downloaded by a third party from Meta’s torrents.

In response, the plaintiffs are now seeking further depositions of Meta employees involved in the decision to use LibGen. They assert that the new evidence contradicts earlier testimony, including statements from **Meta CEO Mark Zuckerberg**, who denied any role in the decision to use pirated books.

## Conclusion

The disclosure that Meta allegedly torrented and seeded pirated books adds another layer of complexity to the copyright lawsuit. If the plaintiffs succeed in showing that Meta actively distributed copyrighted works, the company could face significant legal consequences.

As the case progresses, the tech sector will be watching closely. The outcome could set a precedent for how AI firms handle copyrighted materials in training datasets, potentially reshaping the legal landscape for artificial intelligence and intellectual property rights.