Data

Reddit’s deal with OpenAI will plug its posts into “ChatGPT and new products”

OpenAI has signed a deal for access to real-time content from Reddit’s data API, which means it can surface discussions from the site within ChatGPT and other new products. It’s an agreement similar to the one Reddit signed with Google earlier this year that was reportedly worth $60 million. The deal will also “enable Reddit to bring new AI-powered features to Redditors and mods” and use OpenAI’s large language models to build applications.

Source: Reddit’s deal with OpenAI will plug its posts into “ChatGPT and new products”

The Battle Over Using Journalism to Build AI Models is Just Starting

ChatGPT will tell you that the news is factual, includes language variation and cultural awareness, comprises complex sentence structures, includes quotes that convey real-world conversations, and excels at summarization and condensation. In fact, the news is so valuable to this endeavor that it makes up half of the top 10 sites incorporated into one of Google’s datasets that is being used to train some of the most popular large language models.

Source: The Battle Over Using Journalism to Build AI Models is Just Starting | Nieman Reports

TikTok is adding an “AI-generated” label to watermarked third-party content

TikTok already automatically applies an “AI-generated” tag to content on its platform made using TikTok’s AI tools, and that same label will now apply to content created on other platforms. Now, TikTok will detect when images or videos are uploaded to its platform containing metadata tags indicating the presence of AI-generated content and says it’s the first social media platform to support the new Content Credentials.

Source: TikTok is adding an “AI-generated” label to watermarked third-party content

Can Regulation Deep Six Deepfakes?

The National Institute of Standards and Technology (NIST), a basic science and research arm of the Commerce Department best known, if at all, for tackling knotty challenges like accurately centering quantum dots in photonic chips and developing standard reference materials for measuring the contents of human poop used in medical research and treatments, last week took up the problem of identifying AI-generated and manipulated audio, video, images and text.

Tasked by President Biden’s Executive Order on AI with helping to improve the safety, security and trustworthiness of AI systems, NIST has issued a GenAI Challenge inviting teams of researchers from academia, industry and other research labs to participate in a series of challenges intended to evaluate systems and methods of identifying synthetic content.

This Week in AI: Generative AI and the problem of compensating creators

A recently published research paper co-authored by Boaz Barak, a scientist on OpenAI’s Superalignment team, proposes a framework to compensate copyright owners “proportionally to their contributions to the creation of AI-generated content.” How? Through cooperative game theory.

Source: This Week in AI: Generative AI and the problem of compensating creators | TechCrunch
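
The excerpt says only that the framework relies on cooperative game theory. One standard tool from that field is the Shapley value, which splits a total payoff according to each participant's average marginal contribution. The sketch below illustrates that idea only; the use of the Shapley value here, the contributor names "A" and "B", and the utility figures are all assumptions made for illustration and are not taken from the paper.

```python
from itertools import permutations

def shapley_values(players, value):
    """Average each player's marginal contribution over every join order."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}

# Hypothetical utility of a model trained on each subset of two
# copyright owners' data (made-up numbers, not from the paper).
utility = {
    frozenset(): 0.0,
    frozenset({"A"}): 40.0,
    frozenset({"B"}): 20.0,
    frozenset({"A", "B"}): 100.0,
}

shares = shapley_values(["A", "B"], utility.__getitem__)
print(shares)  # {'A': 60.0, 'B': 40.0}
```

With these made-up numbers, owner A would receive 60 percent of the payout and owner B 40 percent, and the two shares add up to the full value of the combined dataset.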

AI Is Gathering a Growing Amount of Training Data Inside Virtual Worlds

For color images, the widely used RGB (red, green, blue) model can correspond to over 16 million possible colors. So as graphics rendering technology becomes ever more photorealistic, the distinction between pixels captured by real-world cameras and ones rendered in a game engine is falling away.

Source: AI Is Gathering a Growing Amount of Training Data Inside Virtual Worlds
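
The "over 16 million" figure can be checked with a little arithmetic. The sketch below assumes the common 24-bit encoding, with 8 bits for each of the three channels; the excerpt does not state the bit depth, so that value is an assumption.

```python
# Assumption: standard 24-bit color, i.e. 8 bits per channel.
BITS_PER_CHANNEL = 8
CHANNELS = 3  # red, green, blue

levels_per_channel = 2 ** BITS_PER_CHANNEL      # 256 intensity levels
total_colors = levels_per_channel ** CHANNELS   # 256 * 256 * 256

print(total_colors)  # 16777216
```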

SAG-AFTRA Will Use Nielsen Data as Part of Enforcing Studio Pact on Streaming Content

SAG-AFTRA will license Nielsen’s streaming content data, which the union will use to enforce the terms of its 2023 contract with Hollywood studios. Under the deal for Nielsen’s Streaming Content Ratings, SAG-AFTRA will have “an objective source of domestic viewership data for original streaming programming,” the parties announced.

Source: SAG-AFTRA Will Use Nielsen Data as Part of Enforcing Studio Pact on Streaming Content

The TikTok Follies

Congress has passed, and President Biden has now signed, a bill requiring ByteDance to sell TikTok to an American buyer or American-controlled company within 270 days (possibly extendable to a year), or face having the app banned from the U.S.

Things are not likely to work out quite as neatly as that forced choice would have it.

TikTok CEO Shou Zi Chew issued a defiant statement in response to the bill’s passage proclaiming “we aren’t going anywhere,” and vowing to challenge the law in court. “We are confident, and we will keep fighting for your rights in the courts,” he said. “The facts and the Constitution are on our side, and we expect to prevail again.”

Rightsholders Want U.S. KYC Proposal to Include Domain Name Services 

The U.S. Department of Commerce has proposed new customer verification requirements for Infrastructure as a Service providers. The goal of the ‘Know Your Customer’ regime is to prevent fraud and abuse, including piracy. In response to this plan, prominent rightsholders want the department to expand the proposal’s scope to include domain name registrars and registries. Ideally, they argue, domain companies should also be required to take down pirate domains.

Source: Rightsholders Want U.S. “Know Your Customer” Proposal to Include Domain Name Services * TorrentFreak

Why vector databases are having a moment as the AI hype cycle peaks 

The proliferation of large language models and generative AI has created fertile ground for vector database technologies to flourish. Vector databases store and process data in the form of vector embeddings, which convert text, documents, images, and other data into numerical representations that capture the meaning and relationships between the different data points.

Source: Why vector databases are having a moment as the AI hype cycle peaks | TechCrunch
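
To make the idea of embeddings concrete, here is a minimal sketch of the lookup a vector database performs: store items as vectors, then rank them by similarity to a query vector. The three-dimensional vectors, the item names, and the helper functions are hypothetical and hand-made for illustration; real systems use embeddings with hundreds or thousands of dimensions produced by a model, plus approximate indexes rather than the brute-force scan shown here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-made embeddings (not produced by any real model).
store = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

def nearest(query, store, k=2):
    """Return the k stored items most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda name: cosine_similarity(query, store[name]),
        reverse=True,
    )
    return ranked[:k]

result = nearest([0.85, 0.15, 0.05], store)
print(result)  # ['cat', 'kitten']
```

The query vector points in nearly the same direction as "cat" and "kitten" and almost at right angles to "invoice", so the two related items are returned first.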
