The Copyright Office Digs In On AI Licensing and Liability: An Annotated Guide

The U.S. Copyright Office on Wednesday (August 30) launched the next phase of its Artificial Intelligence Initiative, issuing a formal Notice of Inquiry (NOI) inviting public comments on the legal and policy implications of generative AI technology for copyright law and markets for creative works. The comments will inform the Office’s eventual recommendations to Congress regarding possible legislative responses to AI’s rise and rapid growth.

The Office seeks input on four broad areas of interest:

  • The use of copyrighted works to train generative AI models;
  • The copyrightability of material generated by AI systems;
  • The potential liability for infringement by AI-generated works;
  • The treatment of generative AI outputs that imitate the identity or style of human artists.

While the headline debates over whether the use of copyrighted works in training requires licenses, and over the locus of authorship in AI-generated works, are likely to attract the most attention from commenters, the Office has done a commendable job of framing its questions to encourage commenters to put some real flesh on the bones of their talking points. In the four listening sessions the Office held in advance of the NOI, for instance, it was striking that calls for mandatory licensing and compensation for the use of copyrighted works in AI training were rarely accompanied by any explanation of how such a system should or could work. Where explanations were offered, they often amounted to blithe assurances that existing licensing regimes were up to the task.

The Copyright Office is asking for details.

Here’s our annotated guide to some of the grittier questions raised in the NOI that directly bear on the mechanics of how any eventual licensing arrangements (or liability) for generative AI input and/or output might work.

On training

Origin of training data sets: The NOI seeks input on how the data sets used to train AI models are obtained by developers. The infringement lawsuits filed so far, against OpenAI, Stability AI, Midjourney and others, have (among other things) targeted the developers of AI models for alleged unauthorized reproduction of copyrighted works for use in training. In many cases, however, much of the data used in that training was obtained by developers from third parties, such as Common Crawl or LAION. Many developers also outsource the labeling and preparation of the data to third parties, such as Scale and its subsidiary, Remotasks. Insofar as reproduction is occurring, it likely happens first during the collection and preparation of the data, before it is ever fed into an AI system.

The Copyright Office appears to be trying to distinguish between the potential liability of AI developers and that of AI data providers, which could affect who would be subject to any mandatory licensing regime:

6. What kinds of copyright-protected training materials are used to train AI models, and how are those materials collected and curated?

6.1. How or where do developers of AI models acquire the materials or datasets that their models are trained on? To what extent is training material first collected by third-party entities (such as academic researchers or private companies)?…

11. What legal, technical or practical issues might there be with respect to obtaining appropriate licenses for training? Who, if anyone, should be responsible for securing them (for example when the curator of a training dataset, the developer who trains an AI model, and the company employing that model in an AI system are different entities and may have different commercial or noncommercial roles)?

How does AI training really work? The Office is asking commenters to put their cards on the table regarding their knowledge or understanding of what actually is happening during training:

7. To the extent that it informs your views, please briefly describe your personal knowledge of the process by which AI models are trained. The Office is particularly interested in:

7.1. How are training materials used and/or reproduced when training an AI model? Please include your understanding of the nature and duration of any reproduction of works that occur during the training process, as well as your views on the extent to which these activities implicate the exclusive rights of copyright owners.

7.2. How are inferences gained from the training process stored or represented within an AI model?

7.3. Is it possible for an AI model to ‘‘unlearn’’ inferences it gained from training on a particular piece of training material? If so, is it economically feasible? In addition to retraining a model, are there other ways to ‘‘unlearn’’ inferences from training?

Let’s hear your ideas for a licensing system: The Office seems to recognize that a meaningful licensing system could be harder to stand up than rights owners generally acknowledge. But it wants to hear how they think it could work:

10. If copyright owners’ consent is required to train generative AI models, how can or should licenses be obtained?

10.1. Is direct voluntary licensing feasible in some or all creative sectors?

10.2. Is a voluntary collective licensing scheme a feasible or desirable approach? Are there existing collective management organizations that are well-suited to provide those licenses, and are there legal or other impediments that would prevent those organizations from performing this role? Should Congress consider statutory or other changes, such as an antitrust exception, to facilitate negotiation of collective licenses?

10.3. Should Congress consider establishing a compulsory licensing regime? If so, what should such a regime look like? What activities should the license cover, what works would be subject to the license, and would copyright owners have the ability to opt out? How should royalty rates and terms be set, allocated, reported and distributed?

On AI-generated output

Who wrote that? In February, the Copyright Office partially rescinded a registration it had previously accepted for the graphic novel “Zarya of the Dawn,” after learning that the artwork was produced using the Midjourney image generator. Partly in response to the ensuing controversy, the Office in March issued updated guidance for registering works containing AI-generated material. While the updated guidance was generally welcomed in both copyright and AI circles, most acknowledged that its proposed reliance on a case-by-case analysis of works containing both human- and AI-produced elements would quickly become impractical, given the sheer volume of such hybrid works likely to be submitted. In this week’s NOI the Office notes that it is working on an update to the update, obliquely acknowledging it could have made the wrong call on “Zarya.” But it wants to know where folks think it should set the dial for how much, and what kind of, human input is necessary for a work to be eligible for copyright:

18. Under copyright law, are there circumstances when a human using a generative AI system should be considered the ‘‘author’’ of material produced by the system? If so, what factors are relevant to that determination? For example, is selecting what material an AI model is trained on and/or providing an iterative series of text commands or prompts sufficient to claim authorship of the resulting output?

19. Are any revisions to the Copyright Act necessary to clarify the human authorship requirement or to provide additional standards to determine when content including AI-generated material is subject to copyright protection?

Whodunnit? As with its distinction between AI developers and AI data vendors, the Office wants to know who should be held liable in the event an AI-generated work is found to infringe the copyright of a human author or artist.

25. If AI-generated material is found to infringe a copyrighted work, who should be directly or secondarily liable—the developer of a generative AI model, the developer of the system incorporating that model, end users of the system, or other parties?…

27. Please describe any other issues that you believe policymakers should consider with respect to potential copyright liability based on AI generated output.

Beyond Copyright

Time for a federal publicity right? Finally, and intriguingly, the NOI broaches the increasingly common question of whether federal copyright law (and the Constitution’s copyright clause) can provide creators sufficient protection against all the potential ravages of generative AI, such as the misappropriation of an author’s or performer’s name and likeness, or the copying of an artist’s creative style. And if not, whether new or additional federal protections are called for:

30. What legal rights, if any, currently apply to AI-generated material that features the name or likeness, including vocal likeness, of a particular person?

31. Should Congress establish a new federal right, similar to state law rights of publicity, that would apply to AI generated material? If so, should it preempt state laws or set a ceiling or floor for state law protections? What should be the contours of such a right?

32. Are there or should there be protections against an AI system generating outputs that imitate the artistic style of a human creator (such as an AI system producing visual works ‘‘in the style of’’ a specific artist)? Who should be eligible for such protection? What form should it take?

Comments are due by October 18; written reply comments are due by November 15.
