A primary concern is that AI models can "collapse" when they rely too heavily on synthetic data. This means they start generating so many "hallucinations" (responses that contain false information), and decline so much in quality and performance, that they become unusable. For example, AI models already struggle to spell some words correctly. If this mistake-riddled data is used to train other models, those models are bound to replicate the errors too.
Source: Tech companies are turning to ‘synthetic data’ to train AI models – but there’s a hidden cost
The Data Protection Authority said Friday that OpenAI, the California-based darling of the artificial intelligence sector, processed users' personal data to train its ChatGPT chat service without identifying an adequate legal basis, and that it violated the country's transparency rules for users. The authority also said the company did not notify it of a March 2023 data breach.
Stock-image house Shutterstock has unveiled a "research licence" for training open-source AI models, which it hopes will be a springboard to commercial licences as partners develop their businesses. "By first integrating a research licence, startups and AI companies can build and refine AI tools on premium, licensed data before making a larger commitment in a full commercial licence," the company said.
The International Standard Name Identifier (ISNI) has become essential in the evolving landscape of book publishing. As the industry grows ever more digital and interconnected, ISNI's role in accurately identifying contributors and attributing works to them (authors, editors, illustrators, translators, agents, and organizations) has shifted from a technical utility to a strategic necessity.
Reddit Answers will respond to users’ questions with summaries of conversations from across the social media platform, and provide links to relevant communities and posts, the company said. The chatbot can also make recommendations and suggest follow-up questions. With the AI chatbot, Reddit users can skip using Google search or OpenAI’s ChatGPT to find information and discussions on Reddit.
Recording consent for content to be used in training artificial intelligence models is a "perfect use case" for blockchain technology, according to Avery Ching, co-founder and chief technology officer of Aptos. He highlighted blockchain's potential to provide clear consent mechanisms for determining whether specific content may be used for AI training.