Paul Sweeting

AI and the News: Deal, or No Deal?

February 27, 2024 By Paul Sweeting

Reddit, the self-anointed “front page of the internet,” sits atop a huge archive of original content. It contains more than a billion posts created by its 73 million average daily unique users, self-organized into more than 100,000 interest-based communities, or subreddits, ranging from sports to politics, technology, pets, movies, music & TV, health & nutrition, business, philosophy and home & garden. You name it, there’s likely to be a subreddit for it.

The scale and diversity of the Reddit archive, replete with uncounted links to all corners of the World Wide Web and made freely accessible via API, has long been a highly valued resource for researchers, academics and developers building third-party applications for accessing Reddit communities. More recently, it has also been eagerly mined by developers of generative AI tools in need of large troves of natural language texts on which to train their models.

Fighting Deep Fakes: IP, or Antitrust? (Updated)

February 21, 2024 By Paul Sweeting

The Federal Trade Commission last week elbowed its way into the increasingly urgent discussion around how to respond to the flood of AI-generated deep fakes plaguing celebrities, politicians, and ordinary citizens. As noted in our previous post, the agency issued a Supplemental Notice of Proposed Rulemaking (SNPRM) seeking comment on whether its recently published rule prohibiting business or government impersonation should be extended to cover the impersonation of individuals as well.

The impersonation rule bars the unauthorized use of government seals or business logos when communicating with consumers by mail or online. It also bans the spoofing of email addresses, such as .gov addresses, or falsely implying an affiliation with a business or government agency.

Suddenly, Everyone is Adding Watermarks to AI-Generated Media

February 16, 2024 By Paul Sweeting

With election season in full swing in the U.S. and European Union, and concern growing over deep-fake and AI-manipulated images and video targeting politicians as well as celebrities, AI heavyweights are starting to come around to supporting industry initiatives to develop and adopt technical standards for identifying AI-produced content.

At last month’s World Economic Forum in Davos, Meta president of global affairs Nick Clegg called efforts to identify and detect AI content “the most urgent task” facing the industry. Late last year, the Facebook and Instagram parent began requiring political advertisers using its platforms to disclose whether they used AI tools to create their posts. But it has also now gotten behind the technical standard developed by the Coalition for Content Provenance and Authenticity (C2PA) for certifying the source and history of digital content.

Copyright and AI: Where’s the Harm?

February 12, 2024 By Paul Sweeting

Berkeley law professor Pamela Samuelson has ruffled more than a few feathers among creators and rights owners over the years. In her role as co-founder and chair of the Authors Alliance, her seats on the boards of the Electronic Frontier Foundation and Public Knowledge, and in spearheading the American Law Institute’s controversial restatement of copyright law, she has been a high-profile and vocal skeptic of expansive views of copyright protections, particularly in the realm of digital platforms and technologies.

News Value: Is AI On the Money?

January 5, 2024 By Paul Sweeting

Facing a potentially ruinous lawsuit from the New York Times over the unlicensed use of the newspaper’s reporting to train its GPT Large Language Model, OpenAI is putting out the word that it is not opposed to paying publishers for access to their content, as it recently did with Axel Springer.

“We are in the middle of many negotiations and discussions with many publishers. They are active. They are very positive,” Tom Rubin, OpenAI’s chief of intellectual property and content, told Bloomberg News. “You’ve seen deals announced, and there will be more in the future.”

All the News That’s Fit to Scrape

January 2, 2024 By Paul Sweeting

If you’re reading this post, you likely know by now that the New York Times last week filed a massive copyright infringement lawsuit against OpenAI and Microsoft over the unlicensed use of Times content to train the GPT line of generative AI foundation models.

It’s tempting to view this as the Big One, the Battle of the Titans that will make it all the way to the Supreme Court for a definitive resolution of the most contentious question in the realm of AI and copyright. It’s the New York Times, after all, one of the premier names in journalism anywhere in the world, and one of the few publishers with the resources to take on the tech giants and pursue the case to the end.

Revealing Sources: The News on AI

December 19, 2023 By Paul Sweeting

For news publishers, AI can giveth, and AI can taketh away. On the latter side of the ledger, publishers are in a cold sweat over Google’s “Search Generative Experience” (SGE) product, which the search giant has been testing for the past several months. The tool, trained in part on publishers’ content, uses AI to generate fulsome responses to users’ search queries, rather than merely providing links to websites where answers might be found.

Last week, the Arkansas-based publisher Helena World Chronicle filed a prospective class-action lawsuit against Google, accusing the search giant of anti-competitive practices and specifically citing Search Generative Experience.

What’s In a Name? Seeking An Answer to Deep Fakes

December 12, 2023 By Paul Sweeting

When it comes to AI and intellectual property, most of the focus has been on the use of copyrighted works in training generative AI models and the patent and copyright eligibility of inventions or works produced with the technology. Insofar as the political deal European Union officials reached over the weekend on the AI Act addresses IP, it confines itself to requiring foundation-model developers to document and disclose their training data and to requiring the labeling of AI-generated content. Training and IP eligibility have also been the main focus of AI litigation to date in the U.S.

But the rapid spread and growing ease of use of so-called deep fake apps have led to growing calls to provide protection against the unauthorized appropriation of a person’s name, image and likeness (NIL) or celebrity. The calls run like a secondary theme through comments filed with the Copyright Office in its current study of AI and copyright (see here, here and here), and the issue played a starring role in the labor strife that recently rocked Hollywood.

EU AI Act: Down to the Wire (Update)

December 4, 2023 By Paul Sweeting

Negotiations toward a final text of the European Union’s AI Act are going down to the wire this week as the final “trilogue” session among the EU Parliament, Commission and Council is scheduled for Wednesday (Dec. 6). The pressure is on to reach an agreement before the end of the year, as the June 2024 EU Parliamentary elections loom over the talks. If agreement can’t be reached before then, there’s a danger that the process would have to be restarted with a new Parliament and new leadership in the Council, which could potentially scuttle the whole project.

Yet despite the pressure, the parties to the current talks appear to be farther apart than where they started, endangering what had been touted as the world’s first comprehensive regulatory regime for AI. The consensus on the basic structure of the proposed regulations that seemed at hand in the summer was thrown into turmoil last month when France, supported by Germany and Italy, suddenly reversed its position and embraced “mandatory self-regulation” via codes of conduct for the largest foundation models instead of the once-agreed tiered system of binding obligations.

The Future of Generative AI Might Be Smaller Than You Think

November 28, 2023 By Paul Sweeting

The distinguishing characteristic of large language models (LLMs) is, as the name implies, their sheer size. Meta’s LLaMA-2 and OpenAI’s GPT-4 are each comprised of well more than 100 billion parameters, the individual weights and variables they derive from their training data and use to process prompt inputs. Scale is also the defining characteristic of the training process LLMs undergo. The datasets they ingest are almost incomprehensibly large, equivalent to the entire World Wide Web, and require immense amounts of computing capacity and energy to analyze.