Headlines

Ziff Davis study says AI firms rely on publisher data to train models

Leading AI companies rely more on content from premium publishers to train their large language models (LLMs) than they publicly admit, according to new research from executives at Ziff Davis. While AI firms generally do not say exactly what data they use for training, executives from Ziff Davis say their analysis of publicly available datasets makes it clear that AI firms rely disproportionately on commercial publishers of news and media websites to train their LLMs.

Source: Ziff Davis study says AI firms rely on publisher data to train models

‘Millions’ of NYT and NY Daily News stories taken by OpenAI for training data

Millions of stories published by sites including The New York Times and The New York Daily News have been found in three weeks of searching OpenAI’s training dataset. The news publishers are currently trawling through data to find instances of their copyrighted work being used to train OpenAI’s models – but they say the tech company should be forced to provide the information itself.

Source: ‘Millions’ of NYT and NY Daily News stories taken by OpenAI for training data

News organisations are forced to accept Google AI crawlers, says FT policy chief

News sites don’t have a “genuine choice” about whether to block Google AI crawlers from scraping their content, a publisher has warned. Matt Rogerson, director of global public policy and platform strategy at the FT and former Guardian Media Group director of public policy, argued that Google’s “social contract” with publishers – through which it provided value to the industry by sending traffic to their sites – has been broken.

Source: News organisations are forced to accept Google AI crawlers, says FT policy chief

UMG sues Believe and TuneCore for $500 million 

The complaint was filed in the US District Court for the Southern District of New York and focuses in part on the dissemination of so-called ‘manipulated’ audio. It alleges that Believe has built its business through “industrial-scale copyright infringement” of “the world’s most popular copyrighted recordings.”

Source: UMG sues Believe and TuneCore for $500 million, alleging ‘industrial-scale copyright infringement’

GEMA Releases ‘AI Principles’ Amid Continued Regulatory Push

Amid a continued push to develop a licensing framework for generative AI, Germany’s GEMA has unveiled 10 AI principles. Importantly, the charter isn’t an in-depth collection of detail-oriented policy proposals. By GEMA’s own description, the concise resource “shall serve as food for thought and provide guidelines for a responsible use of generative AI.”

Source: GEMA Releases ‘AI Principles’ Amid Continued Regulatory Push

Shamrock Capital raises $1.6bn for two new investment funds

While Shamrock doesn’t confirm that it will be looking to make acquisitions in the music industry specifically via the new funds, it did say that it will focus on “buyout and later-stage growth equity investments in middle market companies” across its target sectors. Shamrock’s target sectors include: media, entertainment, content, communication, sports, marketing, and education.

Source: Shamrock Capital, the firm that bought Taylor Swift’s masters, raises $1.6bn for two new investment funds

Merlin CEO Jeremy Sirota: “Licensing has always been too restrictive”

Startups need to understand (and respect) music and music’s value more, but equally copyright owners need to stop putting up unnecessary obstacles. The startup world is predicated on the most terrifying failure rates. Even if they raise money, they are often handing over too much equity and will hand over even more to raise the next funding round; and if they need to license music, the deals are always in the labels’ and publishers’ favour and can be pulled when the short licensing terms expire.

Source: Merlin CEO Jeremy Sirota: “Licensing has always been too restrictive”

SESAC wins 10.4% increase in royalties collected from US radio

US performing rights organization SESAC has won an increase in the fees it collects from US radio stations, after an arbitration panel ruled on an ongoing dispute between SESAC and station owners. The new rate will apply from January 1, 2023, through December 31, 2026. Because the rates apply retroactively, radio stations can expect to see an adjustment to the rates they have paid going back to the beginning of 2023.

Source: SESAC wins 10.4% increase in royalties collected from US radio

The Elephant in the Room in the Google Search Case: Generative AI 

Large language models (LLMs) like Gemini require access to massive amounts of training data to be effective. Simply put, Google is able to gain an advantage in training its own generative AI models because of the massive amounts of user data it derived from illegally maintaining a monopoly across Search. Real-time data about what, when, and how people search the internet every day is only the beginning.

Source: The Elephant in the Room in the Google Search Case: Generative AI | TechPolicy.Press

Dutch publisher to use AI to translate ‘limited number of books’ into English

Veen Bosch & Keuning, the largest publisher in the Netherlands, has confirmed plans to trial the use of artificial intelligence to assist in translation of commercial fiction. “There will be one editing phase, and authors have been asked to give permission for this,” a VBK spokesperson told the Bookseller. “We are not creating books with AI, it all starts and ends with human action.”

Source: Dutch publisher to use AI to translate ‘limited number of books’ into English
