New ISO Standard Points to AI Opt-Out

Last month, the International Organization for Standardization (ISO) gave final approval to a new, open technical standard for a machine-readable content identifier that could provide creators and rights owners with a powerful new tool for regulating how their works are used in a variety of contexts.

Unlike other product or works identifiers, such as the music industry’s ISRC and ISWC codes, which are typically assigned to a work or file by an outside authority or industry body, the new International Standard Content Code (ISCC) is derived algorithmically from the media file itself and can be used for any type of digital media content, from text to music to images.

“You don’t need to embed anything, or rename the file or do anything because anyone can apply the same algorithm to generate the same code. It’s an open source standard,” the ISCC’s project lead, Sebastian Posth, told RightsTech.

The code is also independent of any particular use case, making it highly flexible. There is no central registry or database of ISCC codes. The open standard can be used by anyone for whatever use case they want to enable.

Industries without standardized work identifiers, such as the publishing business, could leverage ISCCs to establish a works registry, along with associated rights information.

“It’s very important to distinguish the ISCC from any kind of use case,” Posth said. Any other application or operation would be downstream from generating the code itself.

There is a reference implementation, maintained by the new ISCC Foundation, to show companies how to write software to implement the standard. But “now it’s up to, let’s say, software companies, startups, established product management software companies to implement that and generate the codes whenever they see that it makes sense,” Posth said.
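As a rough illustration, here is a minimal sketch of what code generation might look like in Python using the open-source reference implementation (the iscc-core package). The function names and return shapes below follow the iscc-core documentation as I understand it and may differ in the current release; treat this as illustrative, not authoritative.

```python
# Minimal sketch: deriving an ISCC from a file with the open-source
# reference implementation (pip install iscc-core). Function names and
# return shapes are assumptions based on the iscc-core docs and may
# differ between releases.
import iscc_core as ic

def derive_iscc(path: str) -> str:
    with open(path, "rb") as stream:
        data_code = ic.gen_data_code(stream)          # similarity-preserving hash of the raw data
    with open(path, "rb") as stream:
        instance_code = ic.gen_instance_code(stream)  # exact checksum identifying this file instance
    # Combine the unit codes into a single composite ISCC string.
    composite = ic.gen_iscc_code([data_code["iscc"], instance_code["iscc"]])
    return composite["iscc"]

print(derive_iscc("artwork.jpg"))  # prints something like "ISCC:KUA..."
```

Because the only input is the bytes of the file, anyone holding the same file can reproduce the same code, which is what makes an identifier without a central registry possible.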

Once a code is generated, other use-case-specific data can be added without affecting the code itself. For instance, metadata describing the content of the file, rights information, or the content’s C2PA credentials could be bound to the code and cryptographically signed, or could be maintained in a separate rights or metadata registry.

Anyone wanting to use the content could then run the file through the ISCC algorithm to derive its code, and either retrieve the signed metadata and rights information associated with that code or look up the information in a registry by searching for a matching ISCC code.
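To make the registry pattern concrete, here is a toy sketch of a lookup keyed by ISCC codes. None of these structures come from the ISO standard or from any particular registry operator; the record fields and function names are hypothetical.

```python
# Illustrative only: a toy in-memory registry keyed by ISCC codes.
# The record fields and any signing scheme are placeholders -- real
# registries would define their own schemas.
from dataclasses import dataclass, field

@dataclass
class IsccRecord:
    iscc: str                                        # code derived from the file
    rights: dict = field(default_factory=dict)       # e.g. licensing terms
    credentials: dict = field(default_factory=dict)  # e.g. a C2PA manifest reference

registry: dict[str, IsccRecord] = {}

def register(record: IsccRecord) -> None:
    registry[record.iscc] = record

def lookup(iscc: str) -> IsccRecord | None:
    # A user re-derives the ISCC from the file they hold and queries by code.
    return registry.get(iscc)
```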

Critically, the ISCC standard also relies on a perceptual hash of the media file’s content to generate the code, rather than a cryptographic hash of the file itself. That makes the code resistant to common changes to a file, such as resizing or compressing an image, or slowing down or speeding up a music recording.

Anyone running the file through the ISCC algorithm after it has been changed would still generate the same, or a very similar, code.

“It doesn’t need to be an exact match because it cannot be,” Posth explained. “If you publish an image on Twitter (X), the metadata gets stripped, it gets compressed and changed. But it will still be a match with the ISCC because the perceptual content can change up to a certain degree and we can still find it.”
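In practice, that means matching is a tolerance-based comparison of similarity-preserving bit strings rather than an exact equality check. The sketch below shows the underlying idea with a simple Hamming-distance test; extracting the raw hash bits from a real ISCC would be done with the reference implementation, and the threshold shown is an arbitrary illustration, not a value from the standard.

```python
# Sketch of the matching idea: because ISCC content/data codes are
# similarity-preserving hashes, "same work" becomes "small Hamming
# distance" rather than exact equality. The threshold is illustrative.
def hamming_distance(a: bytes, b: bytes) -> int:
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def probably_same_content(hash_a: bytes, hash_b: bytes, max_bits: int = 8) -> bool:
    # A recompressed or resized copy flips only a few bits of the
    # similarity hash, so it still falls under the threshold.
    return hamming_distance(hash_a, hash_b) <= max_bits
```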

Posth himself heads up Leiden, Netherlands-based Liccium, which is leveraging ISCCs to enable creators and rights owners to opt out of having their works used to train AI models.

Under EU rules, the use of content for text and data mining (TDM) applications, including AI training, is permitted by default. To exclude their content, rights owners must make an affirmative assertion of their intention to reserve their rights.

Liccium’s protocol, called TDM-AI, includes a “declaration engine,” which generates a standard opt-out statement and can be associated with a work’s ISCC. Those statements would be maintained in a registry by Liccium, along with their ISCC codes.

AI companies would then be able to run their training data through the ISCC algorithm during the ingestion process and search the Liccium registry for matches via API.
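A hypothetical version of that ingestion-time check might look like the following. The registry endpoint, query parameters, and response fields are placeholders of my own; Liccium’s actual API is not documented here.

```python
# Hypothetical ingestion-time opt-out check. The endpoint, parameters,
# and response shape are placeholders, not Liccium's actual API.
import requests

REGISTRY_URL = "https://registry.example.com/declarations"  # placeholder URL

def is_opted_out(iscc: str) -> bool:
    resp = requests.get(REGISTRY_URL, params={"iscc": iscc}, timeout=10)
    resp.raise_for_status()
    declarations = resp.json()  # assume a list of matching declaration records
    return any(d.get("tdm_opt_out") for d in declarations)

def filter_training_set(files_with_codes: dict[str, str]) -> list[str]:
    # files_with_codes maps file path -> ISCC derived during ingestion.
    return [path for path, code in files_with_codes.items() if not is_opted_out(code)]
```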

Posth first proposed the ISCC standard to the ISO for approval in 2019. What followed was 2-plus years of demonstrations and testing by ISO technical experts. Once they were satisfied the algorithm would work, it took another 2-plus years of drafting and reviewing the technical documentation before the standard could be published.

“It has been a long road, sometimes a difficult one,” Posth told me last week, his voice still hoarse from meetings and conferences to promote the new standard.

RightsTech and the Wharton School are currently seeking sponsors for a conference later this year in New York on content identification, authentication and provenance, where the ISCC Foundation would present possible applications of the new standard. Anyone interested in helping support the conference can message me here, or via my email [email protected].
