OpenAI’s Sam Altman did himself no favors when he tweeted out “her” to mark the official unveiling of ChatGPT-4o, the company’s new talking chatbot. The tweet (or whatever we’re supposed to call them these days) appeared to be a reference to the 2013 Oscar-winning film, Her, for which Scarlett Johansson provided the sultry voice of Samantha, an AI assistant, and which Altman has publicly identified as his favorite movie.
It also appeared to confirm that the voice of Sky, one of the five voices available in the chatbot, and one which bears a striking resemblance to that of Johansson’s character in the film, was specifically and intentionally designed to mimic the actress, whether by cloning her voice from recordings or by hiring another actress to imitate her.
Johansson certainly thought so. The actress, who has previously battled deepfake uses of her likeness, and took on and wrested a settlement out of Disney over the streaming release of Black Widow, immediately put out a statement threatening legal action against OpenAI.
“When I heard the released demo, I was shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference,” the statement read in part. “As a result of their actions, I was forced to hire legal counsel, who wrote two letters to Mr. Altman and OpenAI, setting out what they had done and asking them to detail the exact process by which they created the ‘Sky’ voice.”
It didn’t help that Altman himself had in fact reached out twice to Johansson about providing a voice for the new chatbot in the months leading up to the release, and that she had declined the offer.
In response to the letters from Johansson’s lawyers, OpenAI pulled the voice of Sky from the app. But several legal experts have suggested she could have a clear cause of action under various state publicity laws, including in California and New York. Some experts also point to the successful cases brought by Bette Midler and Tom Waits against the use of other singers to imitate their distinctive voices and vocal stylings in advertisements without permission.
Given the parties’ high profiles and the publicity around the story, the controversy is also likely to resonate in Washington, DC. Lawmakers there are currently considering at least two bills that, for the first time, would confer federally protected intangible property status on individuals’ name, image, likeness and voice (NILV), superseding state laws. Supporters of those bills will no doubt seize on the controversy to bolster the case for passage.
But the case could also serve to highlight the difficulties that any statute creating a new class of intellectual property is likely to face. For instance, what aspects of someone’s voice should be protectable, and what should be the criteria for infringement?
There is at least some evidence that the resemblance of Sky’s voice to Johansson’s is, if not coincidental, then at least not the result of deliberate copying.
In a blog post, OpenAI describes what it claims are the steps it took and the timeline for selecting the voices for ChatGPT-4o.
We believe that AI voices should not deliberately mimic a celebrity’s distinctive voice—Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice. To protect their privacy, we cannot share the names of our voice talents….
On May 10, 2023, the casting agency and our casting directors issued a call for talent. In under a week, they received over 400 submissions from voice and screen actors. To audition, actors were given a script of ChatGPT responses and were asked to record them. These samples ranged from answering questions about mindfulness to brainstorming travel plans, and even engaging in conversations about a user’s day…
An internal team at OpenAI reviewed the voices from a product and research perspective, and after careful consideration, the voices for Breeze, Cove, Ember, Juniper and Sky were finally selected…
This entire process involved extensive coordination with the actors and the casting team, taking place over five months. We are continuing to collaborate with the actors, who have contributed additional work for audio research and new voice capabilities in GPT-4o.
The Washington Post was separately able to confirm many of those details. The paper spoke with the agent of the Sky actress, who said the actress confirmed that neither Johansson nor the movie Her was ever mentioned by OpenAI.
The Post confirms that the actress’ natural speaking voice “sounds identical to the AI-generated Sky voice, based on brief recordings of her initial voice test reviewed by” the paper, and quotes a statement provided by the actress herself saying, in part, the backlash “feels personal being that it’s just my natural voice and I’ve never been compared to [Johansson] by the people who do know me closely.”
Proponents of a new federal standard, including members of Congress, often describe what they envision as a “federal right of publicity,” presumably by analogy to state right of publicity laws. But the text of the NO FAKES Act pending in the Senate, currently the leading vehicle for federal action, explicitly defines the “image, voice, and visual likeness of individuals” as a property right unrelated to whether the individual is a celebrity, performer or otherwise recognizable.
(A) IN GENERAL.—The right described in paragraph (1) shall have the following characteristics:
(i) The right is—
(I) a property right; and
(II) descendible and licensable in whole or in part, by the individual to whom the right applies.
(ii) The right shall not expire upon the death of the individual to whom the right applies, without regard to whether the right is commercially exploited by that individual during the lifetime of the individual (emphasis added).
(iii) The right shall be exclusive to—
(I) the applicable individual, subject to the licensing of those rights, as provided in this paragraph, during the lifetime of that individual; and
(II) the executors, heirs, assigns, or devisees of the applicable individual for a period of 70 years after the death of the individual.
(B) REQUIREMENTS FOR LICENSE.—A license described in subparagraph (A) shall be valid only if—
(i) the applicable individual was represented by counsel in the transaction and the assignment agreement was in writing; or
(ii) the licensing of the right covered by the assignment is governed by a collective bargaining agreement.
The text grounds the act in the Constitution’s commerce clause. But what it describes would function very much like copyright, which derives from the intellectual property clause, right down to the duration of its term. And it’s an odd fit under the rubric of IP.
As musician, attorney and technologist Damien Riehl points out, all other forms of intellectual property — patents, trademarks, copyright and trade secrets — involve a bargain between the rights owner and society: a limited monopoly in exchange for a useful contribution to society.
Copyright and patent protection are intended to “promote the progress of science and useful arts,” by incentivizing new writings and inventions. But as Riehl notes, “Scarlett Johansson does not need an incentive to sound like Scarlett Johansson.” Yet a law along the lines of the NO FAKES Act could effectively grant her a monopoly over “Scarlett Johansson-y voices.”
Moreover, a statute or verdict that would benefit Johansson in a case such as Sky and ChatGPT-4o could represent a kind of taking: from society in general, by potentially limiting speech, and from the unnamed voice actress behind Sky in particular. If she could not use her voice professionally, either because of a law or because of potential employers’ fear of liability, she would be unable to work, with no corresponding benefit to her.
Deepfakes and AI voice clones are serious problems, particularly for artists and performers, but also for political candidates who are vulnerable to being portrayed as saying or doing something they have not. The sense of urgency to “do something” about them is both palpable and understandable.
But creating a new form of intellectual property, with all that entails, is no trivial matter and should be based on careful consideration of the incentives and tradeoffs involved. What a law like the NO FAKES Act would be most likely to incentivize is not more creative contributions but a flood of “you stole my voice” litigation similar to the “you stole my melody” lawsuits currently washing through the music business, even where, as appears to be the case with Sky, there is no intentional deception or AI trickery involved.
That’s not a recipe for promoting the progress of science and useful arts.