US Copyright Office Finds AI Training Infringes Copyright, Director Fired

The US Copyright Office released a report concluding that AI companies training models on copyrighted works without permission likely infringe copyright law, with the agency's director subsequently removed from her position.

The United States Copyright Office released a report on May 12, 2025, concluding that training large language models and image generators on copyrighted works without authorization likely constitutes copyright infringement. The report, which had been in development for over two years, rejected the broad fair use defenses that AI companies have relied on in ongoing litigation. Hours after the report's publication, Shira Perlmutter, the Register of Copyrights and director of the Copyright Office, was removed from her position by the Librarian of Congress.

What Happened

The Copyright Office published its long-awaited report titled "Copyright and Artificial Intelligence" on the morning of May 12, 2025. The document spans over 200 pages and addresses multiple aspects of AI's intersection with copyright law, including training data, generated outputs, and potential legislative frameworks.

According to the report, the Office concluded that "the use of copyrighted works to train AI models, without authorization from rights holders, does not qualify as fair use in most circumstances." The analysis examined the four statutory fair use factors and found that commercial AI training typically fails to satisfy the requirements for the defense.

The report stated that AI training is "fundamentally different from the transformative uses that courts have historically recognized as fair use." Unlike search engine indexing or text mining for research purposes, the Office found that AI training creates commercial products that can substitute for the original works in the marketplace.

Shira Perlmutter's removal was announced by the Library of Congress later on May 12. The Library's statement provided no specific reason for the personnel change, stating only that "the Librarian has decided to make a change in leadership at the Copyright Office." Perlmutter had been appointed during the previous administration and had overseen the Office's AI study from its inception.

The report had been anticipated since the Copyright Office launched its AI initiative in 2023, a process that drew more than 10,000 public comments. The Office also held multiple public roundtables and consulted with representatives from technology companies, creative industries, academic institutions, and civil society organizations.

Key Claims and Evidence

The Copyright Office's analysis centered on the four-factor fair use test established in Section 107 of the Copyright Act. The report examined each factor as applied to AI training practices.

On the first factor, purpose and character of the use, the Office found that commercial AI training is not sufficiently transformative. "While AI systems may produce outputs that differ from training data, the training process itself involves wholesale copying of protected expression," the report stated. The Office distinguished AI training from cases like Authors Guild v. Google (the Google Books case), where the copying served a different purpose than the original works.

Regarding the second factor, nature of the copyrighted work, the report noted that AI companies train on highly creative works including novels, photographs, artwork, and music. The Office stated that "the creative nature of these works weighs against a finding of fair use."

The third factor, amount and substantiality of the portion used, presented what the Office called "the most significant challenge for AI developers." The report documented that AI training typically involves copying entire works, often millions of them, rather than excerpts or portions.

On the fourth factor, effect on the market, the Office found substantial evidence of market harm. The report cited examples of AI-generated content competing directly with human-created works in markets for stock photography, illustration, copywriting, and journalism.

The Office acknowledged that some AI uses might qualify as fair use, particularly non-commercial research and certain educational applications. The report recommended that Congress consider creating specific exceptions for beneficial AI uses while maintaining protections for rights holders.

Pros and Opportunities

The report provides clarity for copyright holders who have argued that AI training constitutes infringement. Publishers, authors, visual artists, and musicians gain support for their legal positions in ongoing litigation against AI companies.

Creative professionals benefit from the Office's analysis, which validates concerns about AI systems trained on their work without compensation. The report supports arguments for licensing frameworks that would compensate creators for training data use.

The legal analysis offers guidance for courts handling AI copyright cases. While not binding, Copyright Office reports carry significant weight in judicial proceedings and can influence how judges interpret fair use in the AI context.

Smaller AI developers and researchers gain clarity about the legal landscape. The report's acknowledgment that some non-commercial uses may qualify as fair use provides a framework for academic and research applications.

The report's recommendations for legislative action create an opportunity for Congress to establish clear rules. A statutory framework could provide certainty for both AI developers and rights holders, potentially enabling licensing markets to develop.

Cons, Risks, and Limitations

The report does not carry the force of law and cannot resolve ongoing litigation. Courts remain free to reach different conclusions about fair use, and the ultimate legal standards will be determined through judicial decisions or legislation.

AI companies face increased legal exposure following the report's publication. The analysis strengthens plaintiffs' positions in pending cases and may encourage additional lawsuits from rights holders.

The timing of Perlmutter's removal creates uncertainty about the report's future influence. New Copyright Office leadership could revise or supplement the analysis, though published reports typically remain part of the official record.

Implementation challenges remain significant. Even if courts adopt the Office's analysis, enforcing copyright in AI training data presents practical difficulties given the scale and opacity of training processes.

The report does not address all AI-related copyright questions. Issues including the copyrightability of AI-generated outputs and liability for infringing outputs remain subjects of ongoing debate and litigation.

International considerations complicate the picture. AI companies operate globally, and different jurisdictions may reach different conclusions about training data and copyright.

How the Technology Works

Large language models and image generators learn patterns from training data through a process called machine learning. During training, AI systems process vast quantities of text, images, or other content to develop statistical models of language, visual concepts, or other domains.

The training process involves copying works into computer memory, breaking them into smaller units called tokens, and analyzing patterns across millions or billions of examples. The resulting model encodes learned patterns in numerical parameters, which the system uses to generate new content in response to user prompts.
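The pipeline described above (read the works, split them into tokens, record patterns as numbers, then generate from those numbers) can be illustrated with a deliberately tiny sketch. This is a toy bigram counter, not how production language models are built: real systems use subword tokenizers and neural networks with billions of parameters. The two-sentence corpus and every function name here are invented for illustration.

```python
import random
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase word tokens (real systems use subword tokenizers)."""
    return text.lower().split()


def train_bigram_model(documents):
    """Count how often each token follows another across all documents.

    The nested counts stand in for a model's numerical parameters in this toy.
    """
    counts = defaultdict(Counter)
    for doc in documents:
        tokens = tokenize(doc)  # the entire document is read into memory here
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def generate(model, prompt_token, length=5, seed=0):
    """Sample a continuation one token at a time from the learned counts."""
    rng = random.Random(seed)
    output = [prompt_token]
    for _ in range(length):
        followers = model.get(output[-1])
        if not followers:
            break  # no continuation was ever observed for this token
        tokens, weights = zip(*followers.items())
        output.append(rng.choices(tokens, weights=weights, k=1)[0])
    return " ".join(output)


# Hypothetical corpus, invented for this example.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
model = train_bigram_model(corpus)
print(model["sat"])            # Counter({'on': 2})
print(generate(model, "the"))  # a short sequence assembled from learned pairs
```

Even in this toy, training reads every document in full, while the resulting "model" keeps only aggregate counts rather than the documents themselves. That is the distinction the fair use arguments on both sides turn on.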

AI companies have argued that training constitutes a transformative use because the models do not store or reproduce training data directly. The Copyright Office rejected this argument, finding that the copying during training is itself the relevant act, regardless of whether outputs reproduce specific works.

The scale of AI training distinguishes it from previous technologies. Modern language models train on datasets containing hundreds of billions of words, while image generators process hundreds of millions of images. The Copyright Office found this scale relevant to the fair use analysis.

Technical context for expert readers: The report addresses both the input side (training data) and output side (generated content) of AI systems. The Office's analysis focuses primarily on the training process, leaving questions about output liability for future consideration. The distinction between memorization and generalization in neural networks received attention, with the Office noting that even non-memorizing models benefit from copying during training.
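One crude way to make the memorization question concrete is to measure how much of a generated passage appears word for word in the training text. The sketch below is a rough proxy under simplifying assumptions (whitespace tokenization, exact matching of 4-word sequences); it is not a method described in the report, and the sample strings and function names are hypothetical.

```python
def ngrams(tokens, n):
    """Return the set of all consecutive n-token sequences in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_overlap(generated, training_docs, n=4):
    """Fraction of the generated text's n-grams found verbatim in the training docs."""
    generated_ngrams = ngrams(generated.lower().split(), n)
    if not generated_ngrams:
        return 0.0  # text shorter than n tokens has nothing to compare
    training_ngrams = set()
    for doc in training_docs:
        training_ngrams |= ngrams(doc.lower().split(), n)
    return len(generated_ngrams & training_ngrams) / len(generated_ngrams)


# Hypothetical examples, invented for illustration.
training_docs = ["the quick brown fox jumps over the lazy dog"]
copied = verbatim_overlap("the quick brown fox jumps", training_docs)
novel = verbatim_overlap("a slow green turtle walks home", training_docs)
print(copied, novel)
```

A score near 1 suggests the output reproduces training text, and a score near 0 suggests it does not. On the Office's reasoning as summarized above, a low score would not settle the question, because the copying that matters occurs during training rather than in the output.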

Broader Industry Implications

The report arrives as AI companies face mounting legal challenges from copyright holders. The New York Times, Getty Images, authors' groups, and music publishers have all filed suits alleging unauthorized use of protected works. The Copyright Office's analysis provides ammunition for these plaintiffs.

Licensing markets for AI training data may develop more rapidly following the report. Several publishers have already negotiated deals with AI companies, and the Office's findings may accelerate this trend by clarifying the legal risks of unlicensed training.

The technology industry's relationship with creative industries faces a potential inflection point. AI companies have generally resisted licensing requirements, arguing that training on publicly available content is legally permissible. The report challenges this position.

Venture capital and investment in AI companies may be affected by increased legal uncertainty. Investors must now factor copyright liability into their assessments of AI startups and established players alike.

The report's influence extends beyond the United States. Other jurisdictions are examining similar questions, and the Copyright Office's analysis may inform international discussions about AI and intellectual property.

What Remains Unclear

The circumstances surrounding Perlmutter's removal have not been explained. The Library of Congress provided no substantive reason for the personnel change, leaving observers to speculate about potential connections to the report's findings.

How courts will ultimately rule on AI training and fair use remains to be determined. The Copyright Office's analysis is influential but not binding, and judicial decisions may diverge from the agency's conclusions.

The scope of potential damages in AI copyright cases is uncertain. If courts find infringement, calculating appropriate remedies for training on millions of works presents novel challenges.

Whether Congress will act on the report's legislative recommendations is unknown. The report suggests several potential statutory approaches, but legislative action requires political consensus that may be difficult to achieve.

The practical effects on AI development remain to be seen. Companies may seek licenses, modify training practices, or continue current approaches while litigating the legal questions.

What to Watch Next

Pending litigation in federal courts will provide the first judicial responses to arguments similar to those in the Copyright Office report. Cases involving The New York Times, Getty Images, and various authors' groups are proceeding through discovery and motion practice.

Congressional hearings on AI and copyright are expected following the report's release. Legislators have expressed interest in the topic, and the Office's recommendations may prompt legislative proposals.

The Copyright Office's leadership transition will affect the agency's future direction. The appointment of a new Register of Copyrights will signal whether the current administration supports or seeks to modify the report's conclusions.

AI companies' responses to the report merit attention. Public statements, changes to training practices, and licensing negotiations will indicate how the industry is adapting to the legal landscape.

International developments in AI copyright policy continue to evolve. The European Union, United Kingdom, and other jurisdictions are addressing similar questions, and their approaches may influence US policy and vice versa.
