
OpenAI Levies Fiery Accusations Of Evidence Manipulation Against NYT

Good morning, and welcome to The Fineprint, the newsletter that brings you 0% jargon and 100% juice.

Coming up in today’s print:

  • Dramatic developments in the legal war between OpenAI and The New York Times.

  • Dwayne Johnson is awarded IP rights over his nickname “The Rock.”

  • Tesla is sued for racism.

Btw, did you know… It’s illegal to walk your cow down the street during the daytime in Scotland.

COPYRIGHT

OpenAI Levies Accusations Of Evidence Manipulation Against NYT

Why we’re paying attention: Hundreds of millions of us now use A.I.-generated content in our work and businesses. But what A.I.-generated content does copyright law actually allow us to use? The lawsuit between The New York Times and OpenAI is currently the biggest battleground for settling that question. The outcome could shape how we interact with A.I. as consumers, which A.I. services businesses are allowed to sell and use internally, and the future of journalism and content creation. Yep, the stakes are higher than Taylor Swift’s 2024 presidential endorsement.

Back in December, the New York Times filed a lawsuit against OpenAI. 

The publication accused OpenAI and Microsoft of attempting to "free-ride on the Times's massive investment in its journalism" and went as far as saying that their A.I. products “imperiled the very enterprise of journalism.” 

Shots fired. 

Well, on Monday, OpenAI returned fire by filing a motion to dismiss the case. The 37-page document launches a barrage of counterarguments against the case put forward by The New York Times.

One of the most dramatic claims is that the NYT paid someone to “hack” OpenAI’s products in order to manufacture evidence for its case. If these accusations prove true, this whole thing could backfire on The New York Times.

A bunch of other compelling points were put forward by OpenAI’s legal team in their fight for the future of A.I. content, but before we unpack them, here’s a quick refresher on the original complaint put forward by The New York Times. 

NYT’S SIDE OF THE STORY 

Some of the biggest gripes voiced by NYT in their lawsuit:

  1. Keep My Publication’s Stories Out Your F***ing Model: The Times claims that OpenAI unlawfully used copyrighted NYT articles to train their A.I. models without any permission or payment.

    This is no bueno in NYT’s eyes. Because why should OpenAI swoop in and make billions of dollars off of a library of stories that took the Times a century of blood, sweat, and tears to publish?

  2. Verbatim Regurgitation — The centerpiece of NYT’s case is “Exhibit J”: 100 screenshots of OpenAI-powered products ripping off the Times’s content in various ways. One of the biggest offenses is that ChatGPT can reproduce New York Times articles verbatim when prompted…


    NYT argues that this is a flagrant ripoff of their copyrighted content. They say it disincentivizes readers from visiting their website, leading to lost revenue, made worse by the fact that ChatGPT can even reproduce paywalled content.

  3. Bing’s Theft-By-Summary — Another violation put forward by NYT is the fact that Bing (owned by Microsoft) uses OpenAI models to generate summaries of NYT articles.


    These summaries often quote the article in question exactly. Why would readers bother clicking through to the full article, the NYT posits? This, in turn, leads to a loss in affiliate revenue.

  4. Fictional Frustration — Finally, the NYT is mad that ChatGPT and other A.I. services can “hallucinate” NYT articles that don’t exist. By telling readers about fictional articles, the Times argues, OpenAI is damaging its reputation.

OPENAI’S RETALIATION

For every complaint lodged by the Times, OpenAI’s legal team had a compelling response:

  1. Too Little, Too Late — OpenAI readily admits that NYT articles were included in the massive dataset used to train their models. The problem is that the Times has been aware of that since 2020. This is evidenced by the fact that the paper reported on GPT-3, saying the technology could be “enormously useful” and “open the door to a wide range of new possibilities.” 

    Copyright claims carry a three-year statute of limitations, so why did the Times leave it so late? (Hint: perhaps it has something to do with the fact that OpenAI is now doing over two billion dollars in annual revenue.) So their point is that the NYT had its window to complain and missed it.

    BUT, even if that weren’t true, OpenAI argues that copyright law permits training their models on NYT content anyway. Why? Because progress. Under the fair use doctrine, it’s perfectly legal to use copyrighted content in transformative ways that drive technological innovation, and courts have used it to protect everything from home video recording to internet search in the past.

  2. Bad Faith — OpenAI doesn’t think the claim about ChatGPT quoting NYT articles verbatim holds much water either. According to copyright law, OpenAI would have had to be:

    A) aware of the specific breaches
    B) complicit in the breaches

    OpenAI attacks this one on multiple fronts.

    Firstly, the offending conversations put forward by the NYT breach OpenAI’s terms of service, which undermines the notion that OpenAI facilitated them.

    Secondly, OpenAI says it was certainly not made aware of these specific breaches, because it asked The New York Times to share them so it could help, but the Times refused: “rather, the Times kept these results to itself, apparently to set up this lawsuit.”

    Thirdly, and perhaps the most damning, OpenAI argues that getting ChatGPT to quote articles verbatim is an extremely rare use case and difficult to pull off. In fact, OpenAI has reason to believe that the Times paid someone to prompt GPT tens of thousands of times in order to achieve these results. They even accuse the Times of showing the conversations out of order to make things look worse and manipulate the evidence.

    OpenAI did not share their evidence, likely as a strategic move to deter the Times from moving forward, given that the evidence would be revealed during discovery.

    (Also, most of NYT’s examples involve a user copying and pasting part of an article into ChatGPT in order to get a response, which means the user presumably would have had to visit NYT’s website to get the article in the first place, further weakening the argument that ChatGPT is stealing traffic and revenue.)

  3. Well Within Our Copyrights — As for ChatGPT and Bing being able to summarize and quote NYT content, OpenAI says this is permitted under copyright law, pointing to the fact that the Times frequently quotes other sources in its own work:

    But the law does not prohibit reusing facts or styles. If it did, the Times would owe countless billions to other journalists who “invest enormous amount[s] of time, money, expertise, and talent” in reporting stories, only to have the Times summarize them in its pages.

    The Times tried arguing that this is still “unfair competition” under New York law, but OpenAI says that federal copyright law must be applied instead.

  4. Illusions Of Fairness — Finally, as for the hallucination issue, OpenAI readily admits that this is a bug that A.I. researchers are working hard to solve. But they also point out that, should anyone click on a hallucinated link, it would become immediately obvious that the article doesn’t exist. And OpenAI warns its users about hallucinations.

WHAT’S NEXT

Both sides are now waiting for a judge to rule on OpenAI’s motion to dismiss.

How this all ultimately shakes out remains hard to predict. All we know for certain is that the outcome will have ramifications for how all of us interact with and generate A.I. content.

But in the meantime, place your bets! Reply to us and let us know who you’ve got your money on.  

SNIPPETS

Musk piles on: Hot on the heels of NYT’s lawsuit, Elon Musk adds fuel to OpenAI’s legal fire, suing over its pivot from a public mission to a profit-driven one. The suit demands that OpenAI’s research be made public, raising eyebrows for some about Musk’s motives, especially since he launched rival A.I. firm xAI last year. Either way, OpenAI’s legal team is making serious bank.

Rock Solid Rights: Dwayne “The Rock” Johnson secured exclusive rights to 25 of his iconic catchphrases and nicknames, including “The Rock,” “Rock Bottom,” and “candy ass,” turning his WWE banter into a safeguarded branding goldmine. Hitting “Rock Bottom” now has a whole new meaning: only “The Rock” has the right to claim it, ensuring every mention pays.

Canceled Order: The EU Parliament officially banned Amazon’s lobbyists from setting foot inside its hallowed halls. The reason? Amazon repeatedly skipped meetings over the labor conditions of warehouse workers. The takeaway? Don’t miss important meetings.

Battery of Complaints: Nearly 6,000 Black factory workers are revving up to take Tesla to court over alleged racial discrimination and harassment at its Fremont plant in California. While Tesla denies tolerating such behavior, it might find itself in the wrong lane with a potential multimillion-dollar judgment. 

Illuminating Infringement: UK supermarket chain M&S embraced the Christmas spirit with a gin bottle that lights up. But then rival store Aldi stole their shine by knocking off the design. This week, a court ruled in M&S's favor, ordering Aldi to halt sales of the copycat gin. Lesson learned: innovation over imitation.

MEME

Anyone…?

Was this email forwarded to you by someone awesome? You can sign up here.

How well did The Fineprint do today?


Remember: always read the Fineprint!
