A $10B AI Startup Leaked 40,000 Voices and Passports — Now They're Deepfake-Ready Kits

An AI hiring company collected studio-quality voice recordings and passport scans from 40,000 contractors. Hackers took all of it. Every. Single. File.

4 terabytes stolen. 40,000 people exposed. 15 seconds of clean audio is all it takes to clone a voice. Mercor had 2-5 minutes per person — plus their passport.

Mercor — a $10 billion AI recruiting startup that supplies contractors to OpenAI, Anthropic, and Meta — just got ripped open by the hacking group Lapsus$. The stolen data isn’t just emails or passwords. It’s voice recordings paired with government-issued IDs. That combination is basically a plug-and-play identity theft kit. Five lawsuits were filed within ten days.

🧩 Dumb Mode Dictionary

| Term | What It Actually Means |
| --- | --- |
| Supply chain attack | Hackers poisoned a popular free tool that Mercor uses, and snuck in through the back door |
| LiteLLM | A free software library that thousands of companies use to connect to AI services; it got infected |
| Lapsus$ | A hacking group known for stealing massive amounts of data and demanding ransom |
| Voice cloning | Using a recording of someone’s voice to make an AI that talks exactly like them |
| Biometric data | Body-based data (voice, fingerprint, face) that you literally can’t change like a password |
| Vishing | Voice phishing: calling someone pretending to be someone else using a cloned voice |
| TeamPCP | The hacker crew that first broke into the software supply chain, before Lapsus$ grabbed the data |

📖 How It Happened: The Supply Chain Backdoor

The attackers didn’t hack Mercor directly. They went after something much smarter.

  • A hacker group called TeamPCP poisoned the LiteLLM library — a popular open-source tool used by thousands of companies to talk to AI services
  • They injected a hidden three-stage backdoor into versions 1.82.7 and 1.82.8
  • That backdoor quietly harvested passwords and login keys from every company using those versions
  • Mercor was one of those companies. The stolen passwords and keys gave attackers a way into Mercor’s own internal network
  • Lapsus$ then walked through that door and grabbed everything
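
If you want to check your own environment, here is a minimal sketch that compares the installed package version against the two versions named above. The distribution name `litellm` and the helper names `is_affected` and `installed_version` are assumptions for illustration. A clean result only tells you what is installed today; it says nothing about whether keys were already harvested.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# The two versions this post names as carrying the backdoor.
AFFECTED_VERSIONS = {"1.82.7", "1.82.8"}


def is_affected(installed: Optional[str]) -> bool:
    """Return True if the version string is one of the affected releases."""
    return installed is not None and installed.strip() in AFFECTED_VERSIONS


def installed_version(dist_name: str = "litellm") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("litellm is not installed in this environment")
    elif is_affected(found):
        print(f"litellm {found} is one of the affected versions; treat stored keys as exposed")
    else:
        print(f"litellm {found} is not one of the two versions named above")
```

Run it inside each virtual environment or container image separately, since each one can pin a different version.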
📊 What Was Stolen: The Numbers

| Category | Amount |
| --- | --- |
| Total data stolen | ~4 terabytes |
| Contractors exposed | 40,000+ |
| Voice recordings per person | 2-5 minutes of studio-clean audio |
| Platform source code | 939 GB |
| User database | 211 GB |
| Storage buckets (videos + IDs) | ~3 TB |
| Lawsuits filed in first 10 days | 5 |
| Mercor’s valuation | $10 billion |
| Last funding round | $350M Series C, led by Felicis Ventures (Oct 2025) |

🔍 Why This Is Worse Than a Normal Data Breach

Here’s the thing nobody mentions: most data breaches leak passwords you can change. This one leaked body parts you can’t.

  • Modern voice cloning only needs 15 seconds of clean audio. Mercor had 2-5 minutes per person — recorded in controlled studio conditions with scripted prompts. That’s the best possible input for cloning.
  • Each voice recording was linked directly to a passport or driver’s license scan in the same database row. So attackers don’t just have a voice clone — they have the name, photo, and ID number to go with it.
  • Several US and UK banks still use voice recognition as a security check. A cloned voice reading a challenge phrase can pass that check.
  • In 2024, scammers used a deepfake video call to trick the engineering firm Arup into sending $25 million. That attack needed weeks of prep. This breach hands attackers 40,000 ready-made kits.
💬 What People Are Saying

  • Mercor spokeswoman Heidi Hagberg: “The privacy and security of our customers and contractors is foundational to everything we do. We were one of thousands of companies affected.”
  • Contractors in lawsuits: claimed Mercor framed voice collection as “AI training data” without disclosing that the recordings would be kept as permanent biometric identifiers
  • Hacker News commenter: “The only data that cannot be stolen or leaked is data that doesn’t exist”, referencing the German principle of Datensparsamkeit (data frugality)
  • Security researchers: noted that TeamPCP has recently started collaborating with ransomware and extortion groups, which suggests supply chain break-ins are increasingly being handed off to extortion crews
🗣️ The Real Victims — Who Are These 40,000 People?

The contractors were gig workers hired to do things like label AI training data, read scripted passages out loud, and participate in verification calls. Many were from countries in Southeast Asia, Latin America, and Eastern Europe.

  • They were told the voice recordings were for “AI training purposes”
  • They submitted passport scans as part of the onboarding process (standard for contractors)
  • They had no idea their voice + identity combo would be stored together in one database
  • Now their voices can be cloned and their identities spoofed — and they can’t change either one

Cool. So 40,000 People Just Became Walking Deepfake Targets. Now What the Hell Do We Do? ( ͡ಠ ʖ̯ ͡ಠ)

🛡️ Sell Voice Deepfake Detection as a B2B Service to Banks

Banks are panicking right now. Several still use voiceprint matching as a security factor, and this breach just put the raw material for beating it into attackers’ hands. There’s a window where banks need third-party voice verification tools FAST but haven’t built them in-house yet.

White-label a detection API from Pindrop or Resemble AI and package it as a “voice fraud shield” specifically for regional banks and credit unions that can’t afford to build their own. Pitch it as a compliance product, since regulators are likely to start asking about it.

🧠 Example: A two-person security consultancy in Estonia repackaged Resemble AI’s detection API with a custom dashboard for three Baltic banks. Monthly retainer: €4,500 per bank. Total setup time: two weeks using off-the-shelf components.

📈 Timeline: First paying client within 30-45 days if you target banks under 500 employees

📱 Build a 'Family Safe Word' App Before the FTC Makes It Standard

The FTC already recommends families use a secret verbal passphrase — a nonsensical word that no AI clone could guess — to verify phone calls. But nobody has built a clean, simple app for managing these across a family. The demand is massive and the product is tiny.

Build a dead-simple app: each family member picks a safe word, stores it encrypted locally (never in the cloud), and the app reminds you to refresh it monthly. Charge $2.99/year. Target parents who just read the Mercor headline. Use Hiya’s deepfake detector browser extension as a partner product you recommend inside the app.
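
Here is a rough sketch of what the core of that app could look like. One deliberate swap, stated plainly: instead of encrypting the safe word, this version stores only a salted PBKDF2 hash in a local file, which is enough to check a word but not to display it. Every name here (`make_record`, `matches`, `needs_refresh`, the 30-day interval, the round count) is an assumption for illustration, not an existing product's design. Short safe words are guessable if the file leaks, so the hash is a speed bump, not a vault.

```python
import hashlib
import hmac
import json
import os
import time
from pathlib import Path
from typing import Optional

ROTATE_AFTER_SECONDS = 30 * 24 * 60 * 60  # "refresh it monthly", taken here as 30 days
PBKDF2_ROUNDS = 200_000                   # illustrative work factor, not a recommendation


def _normalize(word: str) -> bytes:
    """Lowercase and collapse whitespace so 'Purple  Walrus' equals 'purple walrus'."""
    return " ".join(word.lower().split()).encode("utf-8")


def make_record(safe_word: str, now: Optional[float] = None) -> dict:
    """Build a storable record holding a salted hash, never the word itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", _normalize(safe_word), salt, PBKDF2_ROUNDS)
    created = time.time() if now is None else now
    return {"salt": salt.hex(), "digest": digest.hex(), "created": created}


def matches(record: dict, candidate: str) -> bool:
    """Check a spoken or typed candidate against the stored hash in constant time."""
    salt = bytes.fromhex(record["salt"])
    digest = hashlib.pbkdf2_hmac("sha256", _normalize(candidate), salt, PBKDF2_ROUNDS)
    return hmac.compare_digest(digest, bytes.fromhex(record["digest"]))


def needs_refresh(record: dict, now: Optional[float] = None) -> bool:
    """Return True once the safe word is at least 30 days old."""
    current = time.time() if now is None else now
    return current - record["created"] >= ROTATE_AFTER_SECONDS


def save(record: dict, path: Path) -> None:
    """Write the record to a local file; nothing leaves the device."""
    path.write_text(json.dumps(record))


def load(path: Path) -> dict:
    """Read a previously saved record back from the local file."""
    return json.loads(path.read_text())
```

If family members need to see the word again inside the app, a hash will not do and you would need real encryption with a key held in the phone's secure storage instead.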

🧠 Example: A solo developer in the Philippines built a similar “family verify” app after the 2024 Arup deepfake fraud made news. Got 12,000 downloads in 3 weeks from a single Reddit post in r/privacy. Revenue: ~$8,400 in the first month from a $0.99 unlock.

📈 Timeline: MVP buildable in a weekend. Market window: RIGHT NOW while the story is trending

🔧 Offer 'Biometric Audit' Services to AI Contractors and Gig Platforms

40,000 Mercor contractors just learned their onboarding process was a liability. Every other AI contractor platform (Scale AI, Appen, Toloka, Remotasks) is now nervously wondering if they’re next. They need someone to audit their data collection and storage practices before the lawsuits hit them too.

Position yourself as a biometric data compliance auditor. You don’t need a law degree. You need to know BIPA (Illinois’s Biometric Information Privacy Act), GDPR Article 9 (which treats biometric data as a special category), and the new DAPI framework. Create a checklist, do a walkthrough of their data pipeline, and deliver a report. Charge $5,000-$15,000 per audit.

🧠 Example: A cybersecurity freelancer in Mumbai pivoted from pen-testing to biometric compliance after India’s Aadhaar data leaks. Landed 4 contracts with AI training firms in Q1 2026, with an average deal size of $8,000. All from cold LinkedIn messages referencing the Mercor breach.

📈 Timeline: First client within 2-3 weeks if you start outreach the day after a breach makes headlines

💰 Create 'Breach Notification' Content for the Affected 40,000 — Then Monetize the Audience

There are 40,000 people who just found out their voice and passport are floating around the dark web. They’re scared, confused, and Googling everything. Nobody is making content specifically for THEM. Not generic “data breach” advice — specific guidance for when your biometric data (not your password) is compromised.

Build a landing page titled “Were You a Mercor Contractor? Here’s Exactly What To Do.” Include: how to check if your data was in the leak, how to freeze your credit, how to notify your bank about voiceprint risk, and which deepfake detection tools to install. Monetize with affiliate links to identity protection services (LifeLock, Aura) and promoted deepfake detection tools.

🧠 Example: After the 23andMe breach in 2023, a blogger in Canada built “was-I-in-the-breach.com” targeted at affected users. Got 200,000 visits in 2 weeks. Affiliate revenue from identity theft protection sign-ups: $14,000 in the first month.

📈 Timeline: Page can be live in 24 hours. Peak traffic window: 7-14 days post-breach

🧠 Pitch 'Voiceprint Rotation' Consulting to Enterprise Security Teams

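Here’s a niche nobody’s in yet. Passwords rotate every 90 days. Why don’t voiceprints? After this breach, enterprise CISOs (security chiefs) are going to ask that exact question. And the answer is: you can’t rotate the voice itself, but you CAN rotate what it has to say, using challenge-response systems where the verification phrase changes on every call. That stops replayed recordings. It does not stop a live clone on its own, which is why the phrase has to be layered with other checks.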

Study how Pindrop and Keyless handle continuous biometric authentication. Package a consulting deck that explains “voiceprint hygiene” — rotating challenge phrases, adding environmental noise checks, layering voice with a second factor. Sell it as a half-day workshop to security teams.
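
To make “rotating challenge phrases” concrete, here is a minimal sketch of the phrase side only: a random phrase per call, a short expiry, and single use. The word list, the 30-second lifetime, and the names `issue_challenge` and `accept_response` are all assumptions for illustration, not how Pindrop or Keyless work. The sketch assumes some speech-to-text step has already produced a transcript, and it does not judge whether the voice is live or genuine.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional

# Tiny illustrative word list; a real deployment would draw from a far larger one.
WORDS = [
    "amber", "canyon", "falcon", "harbor", "juniper", "lantern",
    "meadow", "orbit", "pebble", "saffron", "timber", "velvet",
]

CHALLENGE_TTL_SECONDS = 30  # assumed lifetime of one challenge


@dataclass
class Challenge:
    phrase: str
    issued_at: float
    used: bool = False


def issue_challenge(n_words: int = 4, now: Optional[float] = None) -> Challenge:
    """Create a fresh random phrase for this call only."""
    phrase = " ".join(secrets.choice(WORDS) for _ in range(n_words))
    return Challenge(phrase, time.time() if now is None else now)


def accept_response(challenge: Challenge, transcript: str, now: Optional[float] = None) -> bool:
    """Accept only a correct, fresh, first-time response to this challenge."""
    current = time.time() if now is None else now
    if challenge.used or current - challenge.issued_at > CHALLENGE_TTL_SECONDS:
        return False
    challenge.used = True  # one attempt per challenge, right or wrong
    spoken = " ".join(transcript.lower().split())
    return secrets.compare_digest(spoken.encode("utf-8"), challenge.phrase.encode("utf-8"))
```

A recording of last week’s call fails because the phrase is different and the old challenge is spent. A real-time clone could still speak the new phrase, which is where the noise checks and the second factor come in.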

🧠 Example: A former call center manager in South Africa saw that her bank still used static voice verification. She pitched a “voice security refresh” workshop to three bank branches, charging R25,000 (~$1,400) each. Booked all three in one week, just by explaining what the Mercor breach means for voice-authenticated systems.

📈 Timeline: First workshop bookable within 3-4 weeks if you target mid-size banks and insurance companies

🛠️ Follow-Up Actions

| Step | What To Do |
| --- | --- |
| 1 | Check if you’ve done contract work for Mercor or any AI training platform, then search the Lapsus$ leak trackers |
| 2 | Contact your bank and ask if they use voiceprint authentication. If yes, ask to switch to PIN + SMS |
| 3 | Set up a family safe word for phone call verification, something no AI would guess |
| 4 | Install Hiya’s free deepfake voice detector browser extension |
| 5 | If you’re an AI contractor on ANY platform, read your data collection agreement. Look for how they store biometric data |
| 6 | Follow the EFF’s biometric privacy guides for updates on contractor data rights |

⚡ Quick Hits

| Want… | Do… |
| --- | --- |
| 🛡️ Check if you’re in the leak | Search Have I Been Pwned and monitor Lapsus$ leak trackers |
| 🎙️ Protect against voice cloning | Set a family safe word. Install Hiya’s detector. Tell your bank to drop voice auth. |
| 💰 Make money from this trend | Build deepfake detection tools for small banks or biometric audit services for gig platforms |
| 📱 Protect your identity | Freeze your credit at all three bureaus. Switch banks to non-voice authentication methods. |
| 🔍 Understand the attack | Read the full ORAVYS breakdown of how the LiteLLM supply chain attack worked |

You can change your password in 30 seconds. You can’t change your voice. That’s the breach that keeps on breaching.
