NVIDIA Got Caught Sliding Into a Pirate Library’s DMs (And It’s Worse Than You Think)
One-Line Flow: The world’s most valuable company asked pirates for 500TB of stolen books, got warned it was illegal, and said “lol proceed anyway.”
Why This Matters (Even If You Don’t Care About AI):
The trillion-dollar company that powers your gaming GPU decided buying books was for peasants. They went straight to the internet’s largest book piracy operation, got explicitly told “hey this is illegal,” asked their bosses, and within a week got the green light. If you ever wondered whether laws apply to companies worth more than most countries — congratulations, you now have your answer. Spoiler: they don’t.
🎬 The Setup: How We Got Here
So there’s this lawsuit. Authors vs. NVIDIA. Started in early 2024 when writers discovered their books were being fed to AI models without, you know, anyone asking them or paying them.
NVIDIA’s initial defense was chef’s kiss:
“Books are nothing more than statistical correlations to our AI models.”
Translation: “We didn’t steal your book, we just… mathematically absorbed its entire essence and now profit from it. Totally different.”
But then discovery happened. Internal emails got unsealed. And oh boy.
📧 The Emails That Make This Hilarious (And Horrifying)
Here’s the timeline, straight from court documents filed January 16, 2026:
Step 1: NVIDIA’s data team slides into Anna’s Archive DMs
They literally emailed the world’s biggest pirate book library asking about “including Anna’s Archive in pre-training data for our LLMs.”
For context: Anna’s Archive is like if the Library of Alexandria came back as a torrent site. 61+ million books. 95+ million papers. Zero permissions from anyone.
Step 2: Anna’s Archive responds like a responsible criminal
The pirates — THE PIRATES — warned NVIDIA:
“Hey, just so you know, all this stuff was illegally acquired. Do you have internal buy-in for this? We’ve wasted too much time on companies who couldn’t get approval.”
Imagine being a trillion-dollar corporation and getting a compliance check from pirates.
Step 3: NVIDIA asks the boss
Someone at NVIDIA actually had to walk into an executive’s office and say: “Hey, the pirates want to know if we have permission to commit piracy.”
Step 4: The green light (within days)
“Within a week of contacting Anna’s Archive, and days after being warned by Anna’s Archive of the illegal nature of their collections, NVIDIA management gave ‘the green light’ to proceed.”
That’s faster than most companies approve expense reports.
📊 The Scale: 500 Terabytes of 'Oops'
Anna’s Archive offered NVIDIA access to roughly 500 terabytes of data.
To put that in perspective:
- The entire printed collection of the Library of Congress is ~15 terabytes
- This is 33x that
- It’s basically every book ever written, plus sequels that don’t exist yet
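For anyone who wants to check the 33x claim, it is plain division. A quick sketch, taking the ~500 TB offer and the ~15 TB Library of Congress estimate above at face value (both are rough figures, and the variable names are made up for illustration):

```python
# Rough figures quoted above; both are estimates, not measurements.
OFFERED_TB = 500              # data Anna's Archive reportedly offered
LIBRARY_OF_CONGRESS_TB = 15   # estimate used above for the printed collection

ratio = OFFERED_TB / LIBRARY_OF_CONGRESS_TB
print(f"{ratio:.1f}x the Library of Congress")  # 33.3x the Library of Congress
```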
And NVIDIA wanted the fast lane. Anna’s Archive charges “tens of thousands of dollars” for high-speed access because regular piracy is apparently too slow for companies with unlimited budgets.
The irony: They could afford to pay pirates for faster piracy speeds but couldn’t afford to pay authors.
🏃 Why The Rush? (Competitive Pressure Made Them Do Crimes)
The lawsuit explains the motive beautifully:
“Competitive pressures drove NVIDIA to piracy.”
Here’s what happened:
- It’s fall 2023
- ChatGPT has been viral since its launch in late 2022
- NVIDIA has its developer conference coming up
- They need 8 trillion tokens for their “NextLargeLLM” project
- Publishers are responding with “we’re not ready to engage”
- Deadline approaching
- Books are “the most valuable” training data
So naturally, when legal acquisition fails, you just… don’t acquire legally.
The internal project names were literally “NextLargeLLM” and “NextLLMLarge.” Not even creative enough to hide what they were doing.
🎭 Plot Twist: Everyone's Doing It
Here’s the part that makes this both better and worse.
From Anna’s Archive’s own blog (quoted in the lawsuit):
“Virtually all major companies building LLMs contacted us to train on our data… We have given high-speed access to about 30 companies.”
The lawsuit names names:
- Meta — Downloaded 81+ terabytes through Anna’s Archive torrents
- OpenAI — Used similar shadow library sources
- Anthropic — Same deal
Anna’s Archive basically became the unofficial AI training data dealer. The pirates saved piracy by selling to tech billionaires.
“Not too long ago, shadow-libraries were dying. Sci-Hub had stopped taking in new works due to lawsuits. Z-Library’s founders were arrested… Then came AI.”
AI literally resurrected internet piracy. The circle of life.
⚖️ The Legal Arguments (And Why They're Hilarious)
NVIDIA’s defense: “It’s fair use because we’re not reproducing the books, we’re just extracting statistical patterns.”
The counter-argument from Hacker News (where lawyers and tech bros argue):
“It’s legal for you to possess a single joint. It’s not legal to possess 400 tons of weed in a warehouse.”
Scale matters in law. Reading one book? Fine. Mathematically consuming every book ever written to build a profit machine? Perhaps different.
The procurement problem:
Even if training is fair use (still being decided), you still have to get the books legally first. NVIDIA didn’t buy 500TB of books. They didn’t check them out from libraries. They downloaded them from pirates after being explicitly warned.
It’s like saying “I didn’t steal the car, I just drove it to work.” The driving might be legal, but…
🔥 The Bonus Allegations (It Gets Worse)
The lawsuit doesn’t stop at “NVIDIA downloaded pirated books.” It goes further:
They allegedly helped customers pirate too:
NVIDIA “distributed scripts and tools that enabled corporate customers to automatically download ‘The Pile’” (a dataset containing 196,000+ pirated books)
This adds vicarious and contributory infringement claims — meaning NVIDIA allegedly helped other companies commit the same crimes.
The full shadow library shopping list:
- Anna’s Archive ✓
- Library Genesis (LibGen) ✓
- Sci-Hub ✓
- Z-Library ✓
- Books3/Bibliotik ✓
They really went for the completionist achievement.
🎯 Who's Suing (And What They Want)
Plaintiffs include:
- Abdi Nazemian (Like a Love Story)
- Brian Keene (Ghost Walk)
- Stewart O’Nan (Last Night at the Lobster)
- Andre Dubus III (The Garden of Last Days, Townie)
- Susan Orlean (The Orchid Thief, The Library Book)
The irony of Susan Orlean, author of The Library Book, suing a company for allegedly pulling her work out of a pirate library cannot be overstated.
What they want:
- Statutory damages under copyright law
- Destruction of all illegally obtained copies
- Attorney’s fees
- Class action status (potentially hundreds of additional authors)
The case is proceeding in U.S. District Court, Northern District of California. Case No. 4:24-cv-01454-JST.
🤡 The Internet's Reaction (Hacker News Edition)
Selected highlights from people who are definitely not mad:
“Did you pirate this movie? No, it’s fair use because this movie is nothing more than a statistical correlation to my dopamine production.”
“I saw the movie, but I did not watch it.”
“The trillion-dollar company refuses to pay for digital media? Just to clarify?”
“Laws are for the poor anyways, you ought to think it would be common knowledge by now.”
“A great retaliation to Trump tariffs would be just cancelling copyright for American works in your country.”
And my personal favorite:
“It’s generous of them to ask for permission.”
Response: “They weren’t asking permission. They wanted access to a faster pipe. Permission wasn’t the question.”
NVIDIA didn’t email pirates because they felt bad. They emailed because downloading 500TB at normal speeds takes forever.
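How long is “forever”? A back-of-the-envelope sketch. The link speeds below are assumptions picked for illustration; nothing in the filing says what speeds were actually on the table, and `transfer_days` is a hypothetical helper, not anyone's real tooling.

```python
def transfer_days(terabytes: float, megabits_per_second: float) -> float:
    """Days needed to move `terabytes` of data at a sustained link speed."""
    bits = terabytes * 1e12 * 8                    # decimal terabytes -> bits
    seconds = bits / (megabits_per_second * 1e6)   # Mbit/s -> bit/s
    return seconds / 86_400                        # 86,400 seconds per day

# Hypothetical sustained speeds, for illustration only.
for label, mbps in [("100 Mbit/s", 100), ("1 Gbit/s", 1_000), ("10 Gbit/s", 10_000)]:
    print(f"{label}: {transfer_days(500, mbps):.0f} days")
```

Even at a sustained 1 Gbit/s, 500TB works out to roughly a month and a half of nonstop downloading, which is why a faster pipe was worth asking about.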
📚 The Sources (For When Someone Says 'Source?')
Primary Court Document:
First Consolidated Amended Complaint — Case No. 4:24-cv-01454-JST, filed January 16, 2026
Breaking Coverage:
TorrentFreak: NVIDIA Contacted Anna’s Archive to Secure Access to Millions of Pirated Books
Discussion:
Hacker News Thread — 151+ comments of lawyers and developers arguing
The Takeaway
A trillion-dollar company:
- Couldn’t negotiate with publishers fast enough
- Contacted the world’s largest pirate library
- Got warned by the pirates that this was illegal
- Asked their executives for approval
- Got the green light within a week
- Accessed 500TB of stolen books
- Built AI models with it
- Now argues it’s “just statistics”
And when caught, their defense is essentially: “We didn’t copy the books, we just… learned from them. Like a student. A student who downloaded the entire Library of Congress from criminals and turned it into a profit machine.”
The funniest part? The pirates had better compliance processes than NVIDIA. Anna’s Archive literally asked “do you have internal buy-in for this?” before proceeding.
When the pirate library has more ethical guardrails than your Fortune 500 company, maybe it’s time to re-evaluate some choices.