Copyright and AI: Where’s the Harm?

Berkeley law professor Pamela Samuelson has ruffled more than a few feathers among creators and rights owners over the years. As co-founder and chair of the Authors Alliance, a board member of the Electronic Frontier Foundation and Public Knowledge, and the driving force behind the American Law Institute's controversial Restatement of copyright law, she has been a high-profile and vocal skeptic of expansive views of copyright protections, particularly in the realm of digital platforms and technologies.

In her keynote comments to last week's RightsTech AI Summit in a rain- and wind-battered Los Angeles, however, the Richard M. Sherman Distinguished Professor of Law and Information doffed her advocate's hat and assumed her professorial guise to offer an overview of how courts have historically analyzed copyright law's encounters with disruptive new technologies, and how they are likely to handle the new questions raised by generative AI.

It was a sobering lesson for the plaintiffs and their supporters in the majority of the 16 infringement lawsuits brought so far by artists and rights owners against generative AI developers.

The biggest challenge plaintiffs will face, according to Samuelson, will be establishing actual — as opposed to speculative or potential — harm from the use of their works to train AI models, even if they can establish that their works were copied in the process of that training.

“Just making copies is not enough to say that there is harm,” she told moderator Sophie Goossens, partner in Reed Smith’s Entertainment & Media Group. “There has to be some actual harm, or the likelihood of [actual] harm to an existing or likely to develop market [for the works]. It is not enough to say that you were going to license this stuff. [Courts] need to see some evidence that the market was harmed.”

Samuelson also emphasized the distinction courts have made between the expressive elements of the works being copied and the end product of that copying, as in the two Google Books cases, in which the U.S. Court of Appeals for the 2nd Circuit found Google's wholesale copying of millions of books to create a searchable index to be a fair use under copyright law.

“In the case of Google Books, the court found that the computational use of the data was very different” from the market for the works themselves, Samuelson said. That distinction proved dispositive for a finding of transformative fair use.

“Once courts have found that something is transformative, it has tended to have a kind of spillover effect on the other [fair use] factors,” she added. “The courts tell us we’re supposed to weigh all the factors [spelled out in Section 107 of the Copyright Act] in relation to one another. But in almost all the cases, the market effects, the fourth factor, is the most important and has tended to tip the scales one way or another.”

Samuelson thinks a similar dynamic is likely to play out in cases brought against AI developers.

“In the case of [AI] training data it is important to understand that the training data is a distinct object from the model, or the software used” to create the model, she said. As with Google Books, a generative AI model and its output are the end results of the training process; they do not themselves perform the training.

That doesn’t mean that all claims of fair use by AI companies will succeed, however.

“All fair use cases are decided on their facts,” Samuelson noted. “While there are a lot of commonalities among the cases there are also differences. I would say the NY Times lawsuit against OpenAI, the complaint there, is the most powerful of the complaints in the cases filed so far and does the best job of trying to show that the use of the works as training [data] in fact is not fair use.”

The fact that the Times had an existing licensing business for access to its archive for text-and-data mining purposes, and in fact was in active negotiations with OpenAI over such a license, could argue against a finding of fair use, she suggested.

“Most of the [other] lawsuits are complaining about the use of the works as training data where all of the training data is all of the stuff that’s available on the internet,” Samuelson said. “If the works are available on the internet, that will affect the fair use defense. From the standpoint of the plaintiffs, I get it, they copied my stuff and they didn’t pay me for it or ask me for my consent and that’s not fair. I’m also willing to license it and therefore there’s harm to my market. [But] I think, again, Getty Images and the NY Times, and also Universal Music all have active licensing programs and they will have a better chance than will some of the other plaintiffs to succeed.”

Samuelson expressed skepticism, however, that collective licensing schemes will be able to bootstrap a solution for creators and rights owners in the absence of an existing direct licensing business.

“I know there is interest among many of the people attending this event in collective licensing, but that is something I think Congress would have to develop,” she said. “I understand the impetus behind it and it seems like a compromise to a very difficult situation… but it has to be something actually feasible to do and it is in the feasibility that I have my greatest doubts about collective licensing.

“Past collective licensing systems have had a particular purpose: this is a particular type of exploitation of a work that should be allowed to go forward but with compensation,” Samuelson added. “But that’s only certain types of works and certain uses that exploit the expressive aspects of the work, and training data doesn’t do that… The NY Times and Getty Images have a better case, because both of those companies have active licensing programs… So it’s possible both of those cases will settle. But it’s going to be years before we know what is going to happen with those cases. In the meantime it’s a good idea to explore [different possible solutions], but I just don’t think collective licensing is feasible.”

Courts, too, are generally reluctant to assume feasibility without hard evidence.

“I think that it is good when there are voluntary licenses like OpenAI and some other [AI] companies are able to negotiate, I’m completely supportive of that,” Samuelson said. “But a collective licensing scheme for all of the works? I just think it’s impossible. And if it’s impossible that’s going to end up tipping in favor of fair use.”

I’m grateful to our partners at Reed Smith for helping organize the RightsTech AI Summit, and to UCLA’s Anderson School of Management for hosting the event.
