Data

Tech companies are turning to ‘synthetic data’ to train AI models – but there’s a hidden cost

A primary concern is that AI models can “collapse” when they rely too heavily on synthetic data. This means they start generating so many “hallucinations” – responses that contain false information – and decline so much in quality and performance that they become unusable. For example, AI models already struggle to spell some words correctly. If this mistake-riddled data is used to train other models, they too are bound to replicate the errors.
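The article describes the phenomenon but not the math behind it. A common toy illustration of collapse (not from the article; all numbers here are arbitrary) fits a simple model to its own synthetic output over and over, and watches the learned distribution narrow as errors compound:

```python
import random
import statistics

def collapse_demo(generations=200, sample_size=10, seed=42):
    """Toy illustration of model collapse: each 'generation' fits a
    Gaussian to samples drawn from the previous generation's model,
    then the next generation trains only on that synthetic output.
    Estimation error compounds, and the fitted distribution tends to
    narrow, losing the diversity of the original data."""
    random.seed(seed)
    mu, sigma = 0.0, 1.0  # stand-in for the "real data" distribution
    history = [sigma]
    for _ in range(generations):
        synthetic = [random.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.mean(synthetic)      # refit on synthetic data only
        sigma = statistics.stdev(synthetic)
        history.append(sigma)
    return history
```

Running this typically shows the estimated spread shrinking toward zero across generations, a rough analogue of a model's outputs becoming narrower and less reliable when trained on its own data.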

Source: Tech companies are turning to ‘synthetic data’ to train AI models – but there’s a hidden cost

Google finds new way to train AI models using smaller ‘teacher’ models

A joint team from Google Research and DeepMind has developed a training method called SALT (Small model aided large model training) that cuts training time by up to 28 percent while improving performance. The key innovation? Using smaller language models as assistant teachers. The researchers also created an enhanced version called SALTDS that carefully selects training data, focusing on examples where the smaller model performs well.
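The excerpt does not give SALT's actual loss function, but the "assistant teacher" idea resembles standard knowledge distillation: blend the hard-label loss with a term pulling the student toward the small teacher's soft distribution, and (for the SALTDS-style variant) keep only examples where the teacher is confident. A minimal sketch under those assumptions – the mixing weight `alpha` and confidence `threshold` are hypothetical, not values from the paper:

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, true_label, alpha=0.5):
    """Blend hard-label cross-entropy with KL divergence toward the
    (smaller) teacher's soft distribution. alpha is a hypothetical
    mixing weight, not taken from the SALT paper."""
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    ce = -math.log(p_student[true_label])
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student))
    return (1 - alpha) * ce + alpha * kl

def select_examples(dataset, teacher_predict, threshold=0.8):
    """SALTDS-style filtering sketch: keep examples where the small
    teacher is confident (a stand-in for 'performs well')."""
    return [ex for ex in dataset if max(teacher_predict(ex)) >= threshold]
```

The intuition: a cheap teacher supplies useful soft targets early in training, so the large model spends less compute rediscovering what the small model already knows.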

Source: Google finds new way to train AI models using smaller ‘teacher’ models

How Do AI Detectors Work: Cracking the Code on AI Content

As AI-generated content proliferates, the demand for detectors is on the rise. Search engines are becoming especially wary of results pages flooded with AI-generated content that’s largely unoriginal and low-quality. To remedy this, several businesses are implementing AI content detectors into their content editing and publishing strategy.

Source: How Do AI Detectors Work: Cracking the Code on AI Content

European authorities say AI can use personal data without consent for training

The European Data Protection Board (EDPB) issued a wide-ranging report on Wednesday exploring the many complexities and intricacies of modern AI model development. It said it was open to potentially allowing personal data to be used, without the owner’s consent, to train models, as long as the finished application does not reveal any of that private information.

Source: European authorities say AI can use personal data without consent for training

Italian Privacy Watchdog Fines OpenAI $15.5 Million Over ChatGPT Data Use

The Data Protection Authority said Friday that the Californian darling of the artificial intelligence sector processed users’ personal data to train its ChatGPT chat service without identifying an adequate legal basis and violated the country’s rules on transparency for users. It also said the company did not notify it of a March 2023 data breach.

Source: Italian Privacy Watchdog Fines OpenAI $15.5 Million Over ChatGPT Data Use

Shutterstock launches a ‘research licence’ for GenAI companies

The stock-image house has unveiled a “research licence” for the training of open-source AI models, which it hopes will be a springboard into commercial licences as partners develop their businesses. “By first integrating a research licence, startups and AI companies can build and refine AI tools on premium, licensed data before making a larger commitment in a full commercial licence,” Shutterstock said.

Source: Shutterstock launches a ‘research licence’ for GenAI companies

The Role of ISNI in Book Publishing: Enabling Accuracy, Discoverability, and Collaboration

The International Standard Name Identifier (ISNI) has become an imperative in the evolving landscape of book publishing. As our industry grows ever more digital and interconnected, ISNI’s role in ensuring accurate identification and the attribution of works to contributors—including authors, editors, illustrators, translators, agents, and organizations—has shifted from a technical utility to a strategic necessity.

Source: The Role of ISNI in Book Publishing: Enabling Accuracy, Discoverability, and Collaboration

How the EU AI Act Can Increase Transparency Around AI Training Data 

The EU’s AI Act, which went into force earlier this year, provides the most significant opportunity to advance transparency around training data to date. Amongst many other things, it mandates that developers of so-called “general-purpose AI” (GPAI) models — EU lingo for “foundation models” — publish a “sufficiently detailed summary” of the data used to train their models.

Source: How the EU AI Act Can Increase Transparency Around AI Training Data | TechPolicy.Press

Reddit is taking on Google and ChatGPT with its own AI chatbot

Reddit Answers will respond to users’ questions with summaries of conversations from across the social media platform, and provide links to relevant communities and posts, the company said. The chatbot can also make recommendations and suggest follow-up questions. With the AI chatbot, Reddit users can skip using Google search or OpenAI’s ChatGPT to find information and discussions on Reddit.

Source: Reddit is taking on Google and ChatGPT with its own AI chatbot

Aptos co-founder: AI training consent a ‘perfect use case’ for blockchain 

Giving artificial intelligence models consent to use content for training is a “perfect use case” for blockchain technology, according to Avery Ching, co-founder and chief technology officer of Aptos. He highlighted the potential for blockchain to provide clear consent mechanisms for determining whether specific content can be used for AI training.
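Ching's remarks don't specify a design, but one minimal sketch of the idea is an append-only consent ledger keyed by a content hash: owners record (and can revoke) permission, and trainers check the latest entry before using a work. Everything below is hypothetical illustration; a real blockchain would replicate and cryptographically chain these records rather than hold them in memory.

```python
import hashlib
import time

class ConsentRegistry:
    """Sketch of a consent ledger: an append-only log of
    (content hash, owner, allowed) records. Later entries for the
    same content supersede earlier ones, which models revocation."""

    def __init__(self):
        self._log = []  # append-only, like a simplified ledger

    @staticmethod
    def content_id(content: bytes) -> str:
        """Identify a work by the SHA-256 hash of its bytes."""
        return hashlib.sha256(content).hexdigest()

    def record_consent(self, content: bytes, owner: str, allowed: bool) -> str:
        cid = self.content_id(content)
        self._log.append({"cid": cid, "owner": owner,
                          "allowed": allowed, "ts": time.time()})
        return cid

    def may_train_on(self, content: bytes) -> bool:
        cid = self.content_id(content)
        # Walk the log backwards: the most recent record wins.
        for entry in reversed(self._log):
            if entry["cid"] == cid:
                return entry["allowed"]
        return False  # no recorded consent defaults to "no"
```

The design choice worth noting is the default: content with no ledger entry is treated as off-limits, which is the opt-in posture consent advocates describe.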

Source: AI training consent a ‘perfect use case’ for blockchain — Aptos co-founder
