The Data Is The New Oil, And You’re The Damn Well They’re Drilling
Big tech didn’t win AI because they’re geniuses. They won because they hoarded your shit.
What You’re Walking Away With
Everything you need to understand why your 2AM doomscroll is worth more than your paycheck — and why trillion-dollar companies are fighting over the digital exhaust you leave behind like seagulls over a french fry.
Why This Actually Matters
- Zero skills needed → Still affects your money, privacy, and leverage in 2026
- They’re literally selling you → And you’re not getting a cut
- The game is rigged → But at least now you’ll know HOW it’s rigged
What You Get From This
- Finally understand why “free” apps cost you everything
- Learn what your data is actually worth on the black market
- See exactly how companies turn your clicks into cash
- Discover the secret data wars between nations
- Find out why AI companies are running out of training data (yes, really)
- Get actual tools to fight back (or at least be less clueless)
The punchline nobody told you:
Google, Facebook, Microsoft — they didn’t win AI because their code is magic. They won because they have mountains of your data to train those algorithms on. The code is free. The data is the monopoly.
🏛️ The Real Power Move: Why Code Is Free But Data Isn't
Here’s the thing that’ll piss you off:
Google open-sourced TensorFlow. Free for everyone.
Meta open-sourced PyTorch. Free for everyone.
Microsoft open-sourced a pile of its own AI tooling. Still free.
“How generous!” you might think.
Lol. No.
They gave away the recipe because they knew you don’t have the ingredients. The actual power isn’t in the algorithm — it’s in the billions of data points they’ve been collecting while you searched for “why does my cat stare at walls” at 3am.
The reality check:
- ✓ Google/Facebook open-sourced their AI code (free for everyone)
- ✓ But they keep the training data locked up (the actual valuable part)
- ✓ Data value increases every day AI advances
- ✓ 80% of the world’s data is privately held, not on the public internet
- ✓ Companies releasing “free AI tools” = recruiting tactic, not charity
- ✓ Future competition = who has the best data, not the best code
Translation: They gave you the gun. They kept the bullets.
🛢️ The Oil Metaphor — Let's Push It Until It Breaks
Everyone keeps saying “data is the new oil.” Fine. Let’s actually think about what that means.
Who Are The Digital Rockefellers?
Just as Standard Oil controlled about 90% of U.S. oil refining by the 1880s, a handful of companies now control most of the world’s data:
- Google → Knows what you search, where you go, what you watch
- Meta → Knows who you know, what you like, what makes you angry
- Amazon → Knows what you buy, what you want, what you can afford
- Apple → Knows your health, your conversations, your face
These aren’t tech companies. They’re data extraction operations that happen to sell phones and ads.
Can Data “Spill” Like Oil?
Oh, absolutely. It’s called a data breach and it’s arguably worse.
Oil spills kill ecosystems. Data spills kill your identity, credit score, and peace of mind — forever. Unlike oil, you can’t clean up leaked data. Once your SSN hits the dark web, it’s there until the heat death of the universe.
The fun part? Companies treat data breaches like oil companies treated spills in the 1970s: deny, delay, pay a fine that equals 0.01% of profits, repeat.
Will We Hit “Peak Data”?
Here’s where it gets weird. AI companies are actually running out of high-quality human-written text to train on. The estimate? Sometime between 2026 and 2028, training runs could exhaust the useful human-generated content on the internet.
After that? AI starts training on AI-generated content. Which leads to “model collapse” — basically AI inbreeding where each generation gets slightly dumber and weirder.
The irony: Tech companies spent years scraping the entire internet without asking. Now they’re running dry and acting surprised.
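If you want to see how an estimate like that falls out, here’s a back-of-the-envelope sketch. It assumes a fixed stock of usable text and training sets that grow by a constant factor every year. The function name and every number in it are made up for illustration; none of them come from a published forecast.

```python
import math

def years_until_exhaustion(stock, current_use, yearly_growth):
    """Years until a training set growing by `yearly_growth`x per year
    reaches the total stock of usable text (both in the same units)."""
    if current_use >= stock:
        return 0.0
    if yearly_growth <= 1:
        raise ValueError("training sets must grow for the stock to run out")
    return math.log(stock / current_use) / math.log(yearly_growth)

# Placeholder inputs: the stock is 10x today's largest training set,
# and training sets double every year.
print(round(years_until_exhaustion(stock=100, current_use=10, yearly_growth=2), 1))
```

With those placeholder inputs the answer is a little over three years. Exponential growth eats a fixed stock fast, which is why the real estimates land so close to the present.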
What’s The “Refined Gasoline” Version of Data?
Raw data = crude oil. Useless until processed.
Refined data products:
- Your browsing history → Targeted ad profiles
- Your location data → Foot traffic analytics for retail
- Your health app data → Insurance risk assessments
- Your typing patterns → Behavioral authentication
- Your face → Facial recognition databases
Your phone isn’t a communication device. It’s a pocket-sized oil refinery extracting value from everything you do.
💰 Data Economics: What Your Shit Is Actually Worth
Street Value of Your Daily Doomscroll
Let’s get specific. Here’s what your data sells for on the black market (and the legal “gray” market):
Dark Web Prices (2025):
| Data Type | Price |
| --- | --- |
| Full identity package (SSN, DOB, address) | $15-$30 |
| Credit card with CVV | $5-$25 |
| Bank login credentials | $40-$200 |
| Medical records | $250-$1,000 |
| Driver’s license scan | $20-$100 |
| Selfie with ID (for verification bypass) | $40-$60 |
Legal Data Broker Prices:
| Data Type | Price Per Record |
| --- | --- |
| Name + Email | $0.007 |
| Name + Email + Demographics | $0.20 |
| Mobile advertising ID | $0.01-$0.04 |
| Precise location history | $0.50-$2.00 |
You’re generating $10-50 of data value PER DAY. Getting paid $0 for it.
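To make the broker math concrete, here’s a minimal sketch that adds up the legal broker prices from the table above for a single sale of one person’s records. The dictionary keys and function names are hypothetical labels of mine. The sketch prices one sale only; it says nothing about how often the same record gets resold or what the ad revenue on top looks like.

```python
# Legal data broker prices from the table above, as (low, high) USD per record.
BROKER_PRICES = {
    "name_email": (0.007, 0.007),
    "name_email_demographics": (0.20, 0.20),
    "mobile_ad_id": (0.01, 0.04),
    "location_history": (0.50, 2.00),
}

def profile_value(record_types, prices=BROKER_PRICES):
    """Return the (low, high) price in USD for one sale of the given record types."""
    low = sum(prices[r][0] for r in record_types)
    high = sum(prices[r][1] for r in record_types)
    return round(low, 3), round(high, 3)

def bulk_value(record_types, people, prices=BROKER_PRICES):
    """Scale a single-person bundle up to a list of `people` records."""
    low, high = profile_value(record_types, prices)
    return round(low * people, 2), round(high * people, 2)

bundle = ["name_email_demographics", "mobile_ad_id", "location_history"]
print(profile_value(bundle))          # one person, one sale
print(bulk_value(bundle, 1_000_000))  # the same bundle for a million people
```

One person’s bundle comes out at well under three dollars per sale. The money is in volume: the same bundle across a million people is a six- or seven-figure list.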
The Black Market Is Wild
The data broker industry is essentially legalized identity trafficking. Companies like Radaris compile profiles on millions of people — your address, relatives, criminal records, property ownership — and sell it to anyone with a credit card.
In 2024, the National Public Data breach exposed literally everyone. 2.9 billion records. Social Security numbers for basically every American adult. The company filed for bankruptcy. Nobody went to jail.
Is There A Secret Data OPEC?
Kind of. Call it the Big Tech antitrust paradox: these companies don’t compete on data, and the effect is the same as if they had colluded.
Google doesn’t sell data to Facebook. Facebook doesn’t sell data to Google. They each maintain their own data moats. It’s not explicit coordination — it’s structural monopoly power.
Data Laundering Is A Real Thing
Remember when AI companies needed training data but couldn’t legally scrape copyrighted content?
They laundered it through academic nonprofits. Universities compiled massive datasets. AI companies used those datasets. Technically legal. Ethically… well, you get it.
Stability AI is the best-known example: Stable Diffusion was trained on image datasets compiled by the nonprofit LAION, which include millions of copyrighted artworks, and the artists weren’t paid a dime.
👁️ Privacy Paranoia: Yes, It's That Bad
Your Smart Fridge Is Snitching
This isn’t paranoia. Consumer Reports found that most smart devices share way more data than they need to function.
Devices that are definitely spying on you:
- Smart TVs (Roku, Samsung, LG — all of them)
- Robot vacuums (Ecovacs got hacked in 2024 — people heard voices through their vacuums)
- Voice assistants (obviously)
- Smart doorbells (Ring has handed footage to police without a warrant)
- Fitness trackers (health insurance companies love this data)
Who Owns Your Sleep Data?
You’d think YOU own the data your body generates while unconscious. Nope.
When you use a sleep tracker, that data belongs to the company. They can sell it. Amazon’s sleep tracking ambitions are particularly creepy — they want data from inside your bedroom.
The Target Pregnancy Story (With A Plot Twist)
You’ve probably heard the famous story: Target figured out a teen was pregnant before her father did, based on her shopping habits.
Plot twist: It might be bullshit. The original story has holes. But here’s the thing — it’s plausible enough that nobody questioned it. That’s how normalized this surveillance has become.
They Can Read Your Emotions From How You Type
Not a joke. Academic research suggests that keystroke dynamics (how fast you type, how long you hold keys, your rhythm) can reveal emotional state.
Banks use this for fraud detection. Dating apps could use it to know when you’re desperate. Insurance companies could use it to detect depression.
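For the curious, the raw features are boring to compute. Here’s a rough sketch, assuming you already have a log of key events as `(key, press_ms, release_ms)` tuples. That event format and the function name are my own invention. This only measures timing; turning timing into a guess about mood or identity takes a trained model, which isn’t shown here.

```python
from statistics import mean

def keystroke_features(events):
    """Compute basic timing features from key events.

    events: list of (key, press_ms, release_ms) tuples in typing order.
    Needs at least two events so there is a gap to measure.
    """
    # Dwell time: how long each key is held down.
    dwell = [release - press for _, press, release in events]
    # Flight time: gap between releasing one key and pressing the next.
    flight = [nxt[1] - cur[2] for cur, nxt in zip(events, events[1:])]
    # Total span from first press to last release, in seconds.
    duration_s = (events[-1][2] - events[0][1]) / 1000
    return {
        "mean_dwell_ms": mean(dwell),
        "mean_flight_ms": mean(flight),
        "keys_per_second": len(events) / duration_s,
    }

sample = [("h", 0, 100), ("i", 250, 350), ("!", 500, 600)]
print(keystroke_features(sample))
```

Nothing in that log is a message you typed. It’s just timestamps, which is exactly why this kind of collection is easy to miss.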
Patents That Should Keep You Awake
Tech companies have filed patents for:
These aren’t science fiction. They’re filed patents with assigned numbers.
What Happens When Your Data Company Goes Bankrupt?
Your data becomes an asset that gets sold to whoever bids highest.
When 23andMe started circling the drain, privacy groups raised alarms — your genetic data could end up owned by anyone. The FTC has tried to intervene in these cases before (Toysmart, RadioShack, Borders), but enforcement is weak.
Your DNA is the new Bitcoin — except you can’t delete it, and you gave it away for free to find out you’re 3% Irish.
🤖 AI's Appetite: The Machines Are Hungry
How Many Cat Photos Does AI Need?
Surprisingly few, actually. Modern techniques can train accurate classifiers with ~1000 images. But that’s for simple stuff.
For something like GPT-4? We’re talking trillions of tokens of text. Billions of images. Basically the entire indexed internet, multiple times over.
Are Memes Junk Food For AI?
Academic researchers actually study this. Memes are hard for AI because they require cultural context, irony detection, and understanding of constantly evolving references.
Training AI on memes is like feeding a robot a diet of pure absurdism. Some researchers do it anyway.
Model Collapse: AI Eating Its Own Vomit
This is getting real. When AI trains on AI-generated content, each generation gets worse. Research shows that models degrade when synthetic data crowds out human data in their training sets.
The problem: The internet is filling up with AI slop. Soon, AI companies won’t be able to find clean human-generated content. They’ll have to train on each other’s outputs. Quality degrades. Recursion goes brrrr.
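You can watch a toy version of this happen in a few lines. The sketch below is not how any real model is trained: it stands in for a “model” with a plain bell curve, fits it to a small sample, then draws the next generation’s training data from that fit, over and over. The sample size, generation count, and seed are arbitrary choices of mine.

```python
import random
import statistics

def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Returns the fitted standard deviation at each generation,
    starting with the original "human" distribution (std = 1.0).
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    history = [sigma]
    for _ in range(generations):
        # Each generation only ever sees output from the one before it.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history

history = simulate_collapse()
print(f"generation 0 spread: {history[0]:.3f}")
print(f"generation {len(history) - 1} spread: {history[-1]:.3g}")
```

The spread shrinks because every small sample underestimates the variety in its source a little, and those losses compound. The rare stuff disappears first, then almost everything else does.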
AI Winters — Will It Happen Again?
AI has crashed before:
- Mid-1970s: First AI winter
- Late 1980s-early 1990s: Second AI winter
Both times: hype exceeded reality, funding dried up, researchers scattered.
LessWrong debates rage about whether it’ll happen again. The current argument against: cloud computing and big data changed the equation. The argument for: energy costs and model collapse might force a reckoning.
Carbon Footprint Nobody Talks About
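The famous stat that training one AI model emits as much carbon as five cars over their entire lifetimes comes from a 2019 study of a Transformer trained with neural architecture search. A later estimate put GPT-3’s training run at roughly 550 tonnes of CO2-equivalent.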
GPT-4 was bigger. GPT-5 will be bigger still. Each model requires more compute, more energy, more cooling.
Track it yourself: ML CO2 Impact Calculator
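The arithmetic behind estimates like these is simple enough to sketch: energy used, times data center overhead, times how dirty the local grid is. The function below is a rough approximation in that spirit, not the calculator’s actual method, and the example inputs are placeholders rather than measurements of any real training run.

```python
def training_emissions_kg(gpu_count, gpu_power_watts, hours, pue, grid_kg_co2_per_kwh):
    """Estimate training emissions in kg of CO2-equivalent.

    pue: power usage effectiveness, the data center overhead multiplier (>= 1).
    grid_kg_co2_per_kwh: carbon intensity of the electricity supply.
    """
    # Energy drawn by the accelerators alone, in kilowatt-hours.
    gpu_energy_kwh = gpu_count * gpu_power_watts * hours / 1000
    # Cooling and other overhead scale that up by the PUE factor.
    total_energy_kwh = gpu_energy_kwh * pue
    return total_energy_kwh * grid_kg_co2_per_kwh

# Placeholder run: 8 GPUs at 300 W for 100 hours, PUE 1.5, 0.4 kg CO2e per kWh.
print(training_emissions_kg(8, 300, 100, 1.5, 0.4))
```

That placeholder run works out to about 144 kg. Swap in thousands of GPUs and months of runtime and you get the headline numbers. The grid term matters as much as the hardware: the same job on a cleaner grid emits a fraction of the carbon.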
⚔️ Geopolitics: The Data Wars Are Real
The Global Data Arms Race
This isn’t hyperbole. The Atlantic Council mapped it: submarine cables that carry 99% of intercontinental data are strategic assets. Countries are racing to control them.
The Five Eyes alliance (US, UK, Canada, Australia, New Zealand) runs the largest data collection operation in history. Snowden exposed it. Nothing changed.
Data-Rich vs Data-Poor Nations
There’s a new form of colonialism happening. Researchers call it digital colonialism.
Developing nations generate data. That data flows to servers in the US and China. Those countries train AI. That AI gets sold back to developing nations. The data extraction follows old colonial patterns.
Countries Building Their Own Internets
- Russia’s RuNet: A sovereign internet that can disconnect from the global web
- China’s Great Firewall: The OG model
- EU’s GDPR: Soft sovereignty through regulation
- India’s data localization: rules such as the RBI’s payment-data mandate require some data about Indians to be stored in India
The global internet is fragmenting. Stanford calls it “The Splinternet”.
Submarine Cables: The Actual Suez Canal of Data
99% of intercontinental internet traffic goes through undersea cables. There are only ~500 of them. Cut a few strategic ones and entire continents go dark.
The Arctic is becoming a battleground for cable routes as ice melts. Russia has been suspiciously active near cables in the Atlantic.
TikTok: The Case Study
The TikTok ban drama is really about this: CFIUS (the Committee on Foreign Investment in the United States) decided that Chinese access to American user data is a national security threat.
Whether you agree or not, it shows how seriously governments take data as a strategic asset.
🔮 Future Shock: What's Coming
Running Out of Training Data
PBS reported it: high-quality human-written text could be exhausted by 2026. Nature confirmed it.
Solutions being explored:
- Synthetic data: AI-generated training data (risks model collapse, see above)
- Licensed content: Pay publishers for training data (expensive)
- Private data: Corporate emails, internal documents (privacy nightmare)
Synthetic Data: Lab-Grown Meat For AI
The idea: instead of scraping the internet, generate fake data that looks real.
It works… sometimes. Researchers are building whole toolkits for it. Microsoft has an entire project.
The catch: synthetic data carries the biases of the model that generated it. It’s not a clean solution.
Data Unions: Collective Bargaining For Your Bits
What if everyone who used Facebook formed a union and collectively bargained for their data’s value?
It’s being explored. The idea is labor unions, but for data. You can’t delete Facebook alone — but millions acting together have leverage.
Data Dividends: Getting Paid
California floated a “data dividend” proposal — tech companies would pay residents a share of the profits from their data.
It hasn’t happened yet. Critics say it would just raise prices and reduce service quality.
Data Poisoning: Digital Sabotage
Artists are fighting back against AI by “poisoning” their images — subtle changes that break AI training.
Tools like Glaze and Nightshade let creators protect their work. GitHub has a whole collection of poisoning techniques.
Privacy activists are exploring similar approaches: generating fake data to corrupt surveillance profiles.
Will Data Become Worthless?
Possibly. If everyone generates data constantly, and AI can create infinite synthetic data, supply could overwhelm demand.
Some researchers argue we’re moving from data scarcity to data abundance. The value might shift from having data to having trusted data.
Data Archaeologists
When companies die, their databases survive. There’s now a field — data archaeology — dedicated to excavating value from abandoned digital systems.
The MySpace exodus, Vine archives, dead social networks — all contain cultural artifacts that researchers are now mining.
The Bottom Line
If you’re building something: Code is free. Data is the moat. Figure out how to collect or own your data.
If you’re investing: Companies with proprietary data > companies with just good code.
If you’re a user: Your data made these companies rich. They got the oil. You got free services and targeted anxiety.
The future: Whoever controls data controls AI. Whoever controls AI controls… a lot.
Data = new oil. Big tech pumped it. Now they own the wells. 
Source for original quote: Fortune Brainstorm Tech 2016 — Shivon Zilis, Bloomberg Beta