Data

French press take on digital databases to defend journalist copyright against AI

Two professional organisations representing 800 newspapers and magazines, which together employ over half of France’s journalists, announced Monday that they are taking “coordinated action” against public datasets used to train generative artificial intelligence services such as ChatGPT. The Apig, the general news media alliance, and the Sepm, the magazine publishers’ union, aim to remove their members’ content from Common Crawl, C4 and Oscar.

Source: French press take on digital databases to defend journalist copyright against AI

AI giants race to scoop up elusive real-world data

Over the past two months, OpenAI has tied up with e-commerce majors Shopee and Shopify, while Google and Perplexity have doled out free access to their advanced AI tools to some users in India. Experts believe these moves will help the companies access structured consumer queries, product behaviors, and transactional data — training signals that are often unavailable via public data alone.

Source: AI giants race to scoop up elusive real-world data

Fastly warns AI bots can hit sites 39K times per minute

Cloud services giant Fastly has released a report claiming that AI crawlers are putting a heavy load on the open web, accounting for 80 percent of all AI bot traffic, with AI fetchers making up the remaining 20 percent. Both crawlers and fetchers can hit websites hard, sending a single site thousands of requests per minute.

Source: Fastly warns AI bots can hit sites 39K times per minute

Perplexity's Comet AI Web Browser Had a Major Security Vulnerability

Comet, Perplexity’s new AI-powered web browser, recently suffered from a significant security vulnerability, according to a blog post last week from Brave, a competing web browser company. The vulnerability has since been fixed, but it points to the challenges of incorporating large language models into web browsers.

Source: Perplexity’s Comet AI Web Browser Had a Major Security Vulnerability

More UK news publishers are adopting ‘consent or pay’ advertising model

Sixteen of the 50 biggest news websites in the UK now use a “consent or pay” model, which lets users pay to reject personalised advertising or to avoid ads altogether. UK publishers began implementing the model last year as the Information Commissioner’s Office cracked down on compliance with the requirement for the biggest sites to display a “reject all cookies” button as prominently as the “accept all” option.

Source: More UK news publishers are adopting ‘consent or pay’ advertising model

AI Becomes Hollywood’s New Secret Weapon in Talent Negotiations: ‘The Data Doesn’t Lie’

While many in Hollywood view AI as something of a bogeyman and fear widespread job losses, others see opportunity. Acharia, who worked in tech as an entrepreneur before pivoting to talent management, is one of a small but growing number of talent reps harnessing AI data to advocate and negotiate more effectively on behalf of their clients in future deals.

Source: AI Becomes Hollywood’s New Secret Weapon in Talent Negotiations: ‘The Data Doesn’t Lie’

AI crawler Firecrawl raises $14.5M, is still looking to hire agents as employees

The Firecrawl founders are working on tools to help website owners, publishers, and other content creators “get paid when AI uses their content. We think this is the way it should be,” CEO Caleb Peffer said. While big names such as Adobe and Getty have made many efforts around this idea, Peffer feels that Firecrawl has an edge because it is already working with those who are scraping data.

Source: AI crawler Firecrawl raises $14.5M, is still looking to hire agents as employees | TechCrunch

How web scraping actually works – and why AI changes everything

AI’s appetite for scraped content, without sending readers back, is leaving site owners and content creators fighting for survival. Both search and AI rely on the results of absolutely ginormous scraping and spidering operations, but one benefits the scrapees, while the other profits enormously from the work of others and simultaneously destroys their motivation to keep doing that work.

Source: How web scraping actually works – and why AI changes everything

Publisher traffic sources: Google steady but social and direct referrals are down

New data from Chartbeat suggests that “search” as a share of total traffic to major news publishers has remained stable over the last year. This appears to chime with a Google statement earlier this month downplaying the impact of AI Overviews and AI Mode on publisher referrals. However, the figure includes Google Discover, which has replaced search as the main source of Google traffic. Social media, meanwhile, has sharply declined as a source of publisher traffic in recent years, as has direct traffic.

Source: Publisher traffic sources: Google steady but social and direct referrals are down

Synthetic data is the new AI gold rush, but critics call it ‘data laundering’

The prospect of relying heavily on synthetic data hasn’t gone unnoticed by the creative industries. “I believe the main reason companies like OpenAI are having to rely more on synthetic data now is that they’ve run out of high-quality human created data to mine from the public facing internet,” says Reid Southern, a film concept artist and illustrator, adding, “It further distances them from any copyrighted materials they’ve trained on that could land them in hot water.”

Source: Synthetic data is the new AI gold rush, but critics call it ‘data laundering’
