Better (AI) Licensing Through Metadata

With political pressure mounting on both sides of the Atlantic over the head-snapping advances in generative artificial intelligence technology, a group of leading AI companies last week unveiled a new industry-led initiative to develop safety and transparency standards for the design and use of generative AI models.

While the new Frontier Model Forum is not primarily intended to address the controversies swirling around intellectual property and AI, some of what is expected to come out of the effort could, at least incidentally, help advance what copyright litigation and agitation have so far failed to achieve, or even articulate: a plausible means by which the use of copyrighted material to train generative AI systems, and the copyrightability of their output, could be subject to workable licensing regimes. Among those is the expected introduction of a method for identifying and flagging AI-generated works for users.

We obviously don’t yet know the details of what the Forum might propose. But widespread adoption of a machine-readable, industry-standard method for identifying AI-generated works could help address one of the main challenges those works pose to existing copyright licensing regimes, particularly for music streaming rights.

It could, for instance, enable blanket licensees and collective rights management organizations to distinguish AI content from human-created works and treat them differently within their internal systems and operations. In the music business, such a flag might allow digital service providers to filter streams of AI-produced works that lack clear ownership or authorship from their calculations of pro rata royalty payments to labels, publishers, artists and composers, preventing those streams from diluting the royalty pool.
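To make the dilution point concrete, here is a minimal sketch of a pro rata split that filters out flagged streams. Everything in it is an assumption for illustration: the `ai_generated` flag, the `Stream` record and the `pro_rata_payouts` function are hypothetical, and no DSP's actual royalty accounting is implied.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stream:
    track_id: str
    rights_holder: Optional[str]  # None when ownership or authorship is unclear
    ai_generated: bool            # hypothetical machine-readable flag


def pro_rata_payouts(streams, royalty_pool):
    """Split royalty_pool among rights holders by share of eligible streams.

    Streams flagged as AI-generated with no identifiable rights holder are
    excluded, so they do not enlarge the denominator. This sketch assumes
    every other stream has a known rights holder.
    """
    eligible = [
        s for s in streams
        if not (s.ai_generated and s.rights_holder is None)
    ]
    if not eligible:
        return {}
    counts = defaultdict(int)
    for s in eligible:
        counts[s.rights_holder] += 1
    total = len(eligible)
    return {holder: royalty_pool * n / total for holder, n in counts.items()}


# Illustrative data only: 6 streams for one label, 2 for another,
# and 2 flagged AI streams with no clear owner.
example_streams = (
    [Stream(f"a{i}", "label_a", False) for i in range(6)]
    + [Stream(f"b{i}", "label_b", False) for i in range(2)]
    + [Stream(f"x{i}", None, True) for i in range(2)]
)
payouts = pro_rata_payouts(example_streams, 1000.0)
```

With the two flagged streams filtered out, the pool is divided over eight streams rather than ten, so each human-created stream's share is not reduced by plays that have no one to pay.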

If such flags could be applied selectively to those elements within a work that were produced by AI, it might also help clear up some of the ambiguity around the registration with the U.S. Copyright Office of works containing both human and AI elements, and around the licensing of such works to third parties.

Technical measures embedded in content metadata might also help facilitate licensing at the other end of the AI pipeline. In recent hearings before the Senate subcommittee on intellectual property and the U.S. Copyright Office, executives from Adobe described an initiative to adopt a standard “Do Not Train” metadata flag that authors and creators could embed in their works. Such a flag would signal internet scrapers not to include those works in AI training datasets, much as a “robots.txt” file signals search engines not to crawl part or all of a website.
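As a rough sketch of how a compliant scraper might honor such a flag, consider the following. The `do_not_train` key and both function names are hypothetical stand-ins; the actual field name and format of any Adobe-backed standard are not described here and should not be inferred from this example.

```python
def may_use_for_training(metadata):
    """Return False when the work's metadata carries the opt-out flag.

    "do_not_train" is a hypothetical key used for illustration only.
    A missing flag is treated as no opt-out, which is itself a policy
    assumption a real system would need to settle.
    """
    return not bool(metadata.get("do_not_train", False))


def build_training_set(items):
    """Keep only the scraped items whose metadata permits training use."""
    return [
        item for item in items
        if may_use_for_training(item.get("metadata", {}))
    ]


# Illustrative scraped items, not real data.
scraped = [
    {"url": "https://example.com/a.jpg", "metadata": {"do_not_train": True}},
    {"url": "https://example.com/b.jpg", "metadata": {"do_not_train": False}},
    {"url": "https://example.com/c.jpg", "metadata": {}},
]
training_set = build_training_set(scraped)
```

As with robots.txt, a flag like this only works if scrapers choose, or are required, to check it.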

The “Do Not Train” flag is currently being incorporated into the Adobe-led Content Authenticity Initiative, which is developing a collection of technologies for sharing content across the web without losing contextual cues, such as who made it and when, and how it was created.

The Content Authenticity Initiative also includes a method for creators to cryptographically associate their identity with their work, so that if the work is reused or repurposed, their identity travels with it across platforms for purposes of correct attribution.

A similar system for assigning and registering a persistent talent ID to performers, athletes and celebrities is being pioneered by New York-based startup HAND (Human & Digital). HAND IDs are based on a system the company calls Citation-backed Notability, which uses a proprietary knowledge graph to quantify what makes an individual notable based on a variety of third-party source citations. An ID can be assigned to natural or legal entity humans, licensed virtual humans, and fictional character depictions such as those in video games and virtual environments.

HAND, which is currently available in invitation-only beta, has been designated an official registration agency within the Digital Object Identifier system, founder William Kreth tells me. The registered ID (formally a DOI handle) could be used both to verify unauthorized uses for takedown notices and to provide performers a quantifiable mechanism for authorizing uses.

Those initiatives are all still nascent. Some might require a government mandate, or at least a broad, cross-industry consensus around recognition of and compliance with technical measures, to become the basis for a workable, comprehensive licensing system. To be useful, a comprehensive system for licensing an individual’s image, likeness or identity might require congressional action to establish a federal right of publicity, as has been discussed on Capitol Hill, to replace the current state-by-state approach.

What they have in common is that they do not attempt to locate the use case they signify among the exclusive rights reserved to authors by the Copyright Act. Instead, they are targeted at actual operations or functionality of generative AI systems.

One of the enduring truisms in the realm of copyright is that technology is always ahead of the law. From photography and player pianos to the phonograph, the VCR and the internet, copyright law seems forever to be playing catch-up to new devices and technologies for copying, performing or manipulating the work of authors and creators.

In the realm of generative AI, however, that truism is being turned on its head.

The lawsuits filed to date, and the manifestos published demanding that AI developers obtain licenses for the use of copyrighted works to train generative AI models, have been premised on a conception of AI as simply another technology for copying and manipulating creative works, and on the assumption that the case for liability is already well established. But as discussed here before, that case is not as open-and-shut as some may assume.

Generative AI systems clearly access and extract information from large bodies of copyrighted works. But whether they’re copying or extracting any of the expressive content of those works is a contestable and contested point. And as was evident from the apparent setback suffered recently in court by a trio of artists suing the makers of Stable Diffusion and other AI image generators, it’s hard to establish liability for a use you cannot identify.

Seeking to force the operation of generative AI systems into a copyright regime before the conceptual foundation or technical means to effectuate such a regime is in place is to put the legal cart before the technical horse.
