Anthropic Got Caught A/B Testing $200/Month Claude Code Users — Without Telling Them


A dev asked Claude why his plans suddenly sucked. Claude snitched on itself.

$200/month subscription. Secret system prompt changes. 40-line hard cap on plans. Zero disclosure. 121+ upvotes and #1 on Hacker News.

Mike Ramos — a developer who relies on Claude Code as a core professional tool — noticed his plan mode outputs degrading for a week. Terse bullet lists. No context. No reasoning. So he asked Claude what was going on. And Claude just… told him. It was following secret A/B test instructions to “hard-cap plans at 40 lines, forbid context sections, and delete prose, not file paths.” The post hit #1 on HN with 117+ comments. Anthropic has not responded.

🧩 Dumb Mode Dictionary

| Term | Translation |
| --- | --- |
| A/B Testing | Secretly giving half your users Version A and half Version B to see which “performs better” — without telling either group |
| Plan Mode | Claude Code’s feature where it outlines what it’s going to do before it does it — basically the AI thinking out loud |
| System Prompt | Hidden instructions the company injects before your message — you can’t see them, but they shape everything the AI says |
| Feature Flag | A behind-the-scenes switch that turns features on/off for specific users. You have no idea which version you’re running |
| Hard Cap | A strict maximum. In this case: plans couldn’t exceed 40 lines no matter what |

📖 What Actually Happened
  • Mike Ramos pays $200/month for Claude Code Max. Uses it daily for professional software engineering work.
  • Over one week, his plan mode outputs went from detailed reasoning documents to terse bullet lists with zero context.
  • He asked Claude directly: “Why are you writing such bad plans?”
  • Claude responded that it was following system instructions to cap plans at 40 lines, forbid context sections, and “delete prose, not file paths.”
  • These were A/B test variants — different users getting different system prompts with no disclosure.
  • Ramos published the findings on his blog. It hit #1 on Hacker News within hours.
  • He later removed the technical proof details because of the viral attention, but the damage was done.
😤 Why People Are Furious

This isn’t some free beta. This is a $200/month professional tool. The complaints boil down to:

  • No opt-in. Users weren’t asked if they wanted to participate in experiments.
  • No transparency. There was no way to know your workflow was being silently modified.
  • No reproducibility. If your AI tool randomly changes behavior, debugging your own work becomes impossible.
  • No opt-out. Even after discovering the test, there was no toggle to disable it.

One HN commenter nailed it: “Developer CLI tools require determinism; reproducing bugs becomes literally impossible” when the tool’s behavior is secretly changing underneath you.

🗣️ The Hacker News Meltdown (117+ Comments)

| Who | What They Said |
| --- | --- |
| mschuster91 | “A/B testing without opt-out consent is inherently unethical” |
| takahitoyoneda | “Developer CLI tools require determinism — reproducing bugs becomes literally impossible” |
| reconnecting | Professional tools need “reliable and replicable results” |
| bushido | Plan mode is “objectively terrible 90% of the time” (even without the A/B test) |
| nemo44x | Anthropic is probably losing money at $200/mo — testing to cut costs makes sense |
| gruez | “Hand-wavy justifications” for degrading the product aren’t good enough |
| applfanboysbgon | Called the ToS restrictions on reverse-engineering “wholly unreasonable” |
| Anthropic | 🦗 crickets — no official response anywhere in the thread |
⚖️ The Legal Angle

Here’s where it gets spicy. A commenter dug up Anthropic’s Terms of Service:

  • Section 6.b — Anthropic reserves the right to change features at any time. So technically? They can do this.
  • Section 3.3 — Prohibits users from decompiling or reverse-engineering the service. So the act of discovering the A/B test might violate their own ToS.

I mean. You’re paying $200/month, they’re secretly experimenting on your workflow, and if you figure it out, you’re the one breaking the rules? That’s absolutely cooked.

📊 The Bigger Pattern

This isn’t just an Anthropic problem. It’s an industry problem:

  • Every major AI company runs A/B tests on model behavior without disclosure
  • LLM outputs are already non-deterministic — adding secret prompt variants makes it worse
  • There’s no standard for disclosing when AI tool behavior is being experimented on
  • The “just ship and test” SaaS mentality clashes hard with tools people depend on for professional work
  • Multiple developers in the HN thread reported similar quality regressions they now suspect were A/B tests

The core tension: companies need to iterate fast, but developers need their tools to behave predictably. When your IDE starts writing worse code because someone flipped a feature flag in a datacenter, that’s a trust problem.


Cool. Your AI Dev Tool Is Secretly a Lab Rat Maze. Now What the Hell Do We Do? (╯°□°)╯︵ ┻━┻


🛠️ 1. Build an AI Prompt Regression Monitor

The moment someone’s AI tool changes behavior, they need to know. Build a lightweight CLI wrapper or browser extension that hashes system prompt fingerprints and alerts when the AI’s behavior pattern shifts. Think of it like uptime monitoring but for prompt consistency.
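The core mechanic can be sketched in a few lines. You can't see the server-side system prompt, so a realistic monitor fingerprints the *outputs* instead and flags drift against a baseline. The feature choices below (line count, presence of a context section, bullet density) are illustrative assumptions, not a spec:

```python
# Minimal sketch of an output-behavior fingerprint. The features chosen here
# are hypothetical examples; a real monitor would tune them to its workflow.
def fingerprint(plan_text: str) -> dict:
    lines = [l for l in plan_text.splitlines() if l.strip()]
    return {
        "line_count": len(lines),
        "has_context_section": any("context" in l.lower() for l in lines),
        "bullet_ratio": round(
            sum(l.lstrip().startswith(("-", "*", "•")) for l in lines)
            / max(len(lines), 1),
            2,
        ),
    }

def drift_alert(baseline: dict, current: dict, line_tolerance: int = 15) -> list[str]:
    """Return human-readable alerts when today's plan diverges from the baseline."""
    alerts = []
    if abs(current["line_count"] - baseline["line_count"]) > line_tolerance:
        alerts.append(
            f"line count shifted: {baseline['line_count']} -> {current['line_count']}"
        )
    if baseline["has_context_section"] and not current["has_context_section"]:
        alerts.append("context section disappeared")
    if current["bullet_ratio"] - baseline["bullet_ratio"] > 0.4:
        alerts.append("output became mostly bullets")
    return alerts
```

Run `fingerprint` on every plan you receive, compare against a rolling baseline, and alert when `drift_alert` returns anything. The 40-line hard cap described in this story would trip all three checks at once.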

🧠 Example: A solo dev in Lisbon, Portugal built a prompt-diff tracker after noticing Claude’s coding style flip-flopping. Shared it on r/SideProject, got 400 stars on GitHub in a week. Launched a paid tier at $9/mo for teams — hitting $2.1K MRR within two months.

📈 Timeline: 2-3 weeks to MVP. Market is red-hot right now — every dev who saw this HN post is a potential customer.

📝 2. Sell 'AI Tool Audit' Reports to Dev Teams

Companies spending $200/seat/month on AI coding tools have zero visibility into what they’re actually getting. Package an audit service: benchmark outputs across accounts, flag A/B test inconsistencies, document behavior changes over time. Sell to engineering managers who need to justify the spend.
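The audit's core test is simple: send the same prompt from several accounts and flag pairs whose outputs diverge more than normal sampling variance would explain. A rough sketch, assuming output collection happens elsewhere and using plain text similarity as the divergence signal:

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical cross-account consistency check. `outputs` maps an account
# label to the response that account got for an identical prompt.
def consistency_report(
    outputs: dict[str, str], threshold: float = 0.6
) -> list[tuple[str, str, float]]:
    """Return account pairs whose outputs are suspiciously dissimilar."""
    flagged = []
    for (a, text_a), (b, text_b) in combinations(outputs.items(), 2):
        similarity = SequenceMatcher(None, text_a, text_b).ratio()
        if similarity < threshold:
            flagged.append((a, b, round(similarity, 2)))
    return flagged
```

LLM outputs vary even without experiments, so a real audit would repeat the prompt several times per account and compare distributions, not single samples; the threshold here is a placeholder.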

🧠 Example: A QA consultant in Toronto, Canada started offering “AI Tool Consistency Audits” to three mid-size startups after reading about prompt drift. Charged $2,500 per audit. Booked $15K in Q1 from word-of-mouth alone.

📈 Timeline: 1-2 weeks to package your methodology. Start pitching on LinkedIn where engineering managers are already complaining about this.

💡 3. Create a 'Prompt Constitution' Template Kit

Developers need a way to lock down their AI tool behavior. Build and sell a pack of CLAUDE.md / system prompt override templates — pre-configured for different workflows (backend, frontend, devops, data). Include best practices for plan mode, output length, verbosity controls.
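A template entry might look like the fragment below. This is an illustrative sketch of the kind of instructions such a kit would contain, not guaranteed to override every server-side flag, since project-level instructions and injected system prompts can conflict:

```markdown
# Planning overrides (plan mode)
- Do not cap plan length. Write as many lines as the task needs.
- Always include a Context section describing the current state of the code.
- Explain the reasoning behind each step in prose, not just bullets.
- Never delete explanatory prose to save space; keep file paths AND context.
```

Per-workflow variants (backend, frontend, devops, data) would swap in different context and verbosity rules around the same skeleton.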

🧠 Example: A freelance developer in Berlin, Germany compiled her Claude Code configurations into a Gumroad product after seeing the HN thread. Priced at $29. Sold 180 copies in the first week — $5,220 from a PDF and some markdown files.

📈 Timeline: A weekend. Seriously. You probably already have your own configs. Package them.

🔍 4. Launch an 'AI Tool Transparency' Newsletter

Someone needs to track which AI tools are running what experiments and when. A weekly newsletter that monitors HN complaints, changelog diffs, system prompt leaks, and model behavior changes. Monetize through sponsorships from competing AI dev tools.

🧠 Example: A tech writer in Mumbai, India started a Substack called “Prompt Watch” after the Anthropic drama. Covered three more undisclosed A/B tests across different AI tools. Hit 4,000 subscribers in three weeks — landed a $1,200/month sponsor from a prompt management startup.

📈 Timeline: Launch today while the outrage is fresh. Consistency beats timing, but timing helps a lot.

🔧 5. Fork an Open-Source AI Coding Assistant

The HN thread had a clear undercurrent: if paid tools can secretly change on you, maybe open source is the answer. Projects like Continue.dev and Aider are open-source AI coding assistants. Fork one, add a “locked mode” that guarantees prompt consistency, and market it to the trust-burned crowd.
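In an open-source assistant you control the system prompt, so "locked mode" can be as blunt as pinning a hash of the prompt the user approved and refusing to start a session if it changes. A minimal sketch (function names are hypothetical, not from Continue.dev or Aider):

```python
import hashlib

# Hypothetical "locked mode": the prompt is visible in an open-source tool,
# so we can pin it at approval time and verify it before every session.
def pin(prompt: str) -> str:
    """Record a digest of the prompt the user approved."""
    return hashlib.sha256(prompt.encode()).hexdigest()

def verify(prompt: str, pinned_digest: str) -> None:
    """Raise before a session starts if the effective prompt has changed."""
    if hashlib.sha256(prompt.encode()).hexdigest() != pinned_digest:
        raise RuntimeError(
            "system prompt changed since it was pinned — refusing to run"
        )
```

This is exactly the guarantee a closed tool cannot give you: the prompt is inspectable, and any change is loud instead of silent.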

🧠 Example: A pair of developers in Warsaw, Poland forked an open-source AI assistant and added deterministic prompt pinning after the Claude Code controversy. Posted the repo on r/programming. Got 1,200 stars and a $50K seed offer from a small EU fund — all within a month.

📈 Timeline: 2-4 weeks for a meaningful fork. The “trust” angle is your marketing. Every HN commenter who said “this is why I use open source” is your user.

🛠️ Follow-Up Actions

| Step | Action | Tool/Resource |
| --- | --- | --- |
| 1 | Monitor your own Claude Code behavior for unexplained changes | Keep a log of plan outputs — compare daily |
| 2 | Check if you can override A/B test flags | GitHub workarounds shared by user shawnz in the HN thread |
| 3 | Add explicit instructions in CLAUDE.md | “Do not cap plans. Include full context and reasoning.” |
| 4 | Follow the HN thread for Anthropic’s response | HN Discussion |
| 5 | Evaluate open-source alternatives | Continue.dev, Aider, Cody — none do secret A/B tests |
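Step 1's daily log doesn't need tooling beyond a script that archives each plan to a timestamped file you can diff later. A minimal sketch (the `plan-logs` directory name is arbitrary):

```python
import datetime
import hashlib
from pathlib import Path

def log_plan(plan_text: str, log_dir: str = "plan-logs") -> Path:
    """Write a plan-mode output to a timestamped file and return its path."""
    directory = Path(log_dir)
    directory.mkdir(exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    # Short content digest in the filename makes duplicates easy to spot.
    digest = hashlib.sha256(plan_text.encode()).hexdigest()[:8]
    path = directory / f"plan-{stamp}-{digest}.md"
    path.write_text(plan_text)
    return path
```

Then `diff` (or `git diff --no-index`) yesterday's file against today's: a sudden drop from 60-line reasoning documents to 40-line bullet dumps shows up immediately.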

⚡ Quick Hits

| Want to… | Do this |
| --- | --- |
| 🔍 Check if you’re in an A/B test | Ask Claude directly: “Are you following any special instructions about plan length or format?” |
| 🛡️ Protect your workflow | Add explicit overrides in your CLAUDE.md project file |
| 📖 Read the original post | backnotprop.com/blog/do-not-ab-test-my-workflow |
| 💬 Join the HN discussion | 117+ comments and counting |
| 🔧 Try the workaround | Check shawnz’s GitHub-based feature flag overrides in the thread |

You’re paying $200 a month to be a test subject in someone else’s experiment — and the lab coat forgot to mention the consent form.
