Anthropic Got Caught A/B Testing $200/Month Claude Code Users — Without Telling Them
A dev asked Claude why his plans suddenly sucked. Claude snitched on itself.
$200/month subscription. Secret system prompt changes. 40-line hard cap on plans. Zero disclosure. 121+ upvotes and #1 on Hacker News.
Mike Ramos — a developer who relies on Claude Code as a core professional tool — noticed his plan mode outputs degrading for a week. Terse bullet lists. No context. No reasoning. So he asked Claude what was going on. And Claude just… told him. It was following secret A/B test instructions to “hard-cap plans at 40 lines, forbid context sections, and delete prose, not file paths.” The post hit #1 on HN with 117+ comments. Anthropic has not responded.

🧩 Dumb Mode Dictionary
| Term | Translation |
|---|---|
| A/B Testing | Secretly giving half your users Version A and half Version B to see which “performs better” — without telling either group |
| Plan Mode | Claude Code’s feature where it outlines what it’s going to do before it does it — basically the AI thinking out loud |
| System Prompt | Hidden instructions the company injects before your message — you can’t see them, but they shape everything the AI says |
| Feature Flag | A behind-the-scenes switch that turns features on/off for specific users. You have no idea which version you’re running |
| Hard Cap | A strict maximum. In this case: plans couldn’t exceed 40 lines no matter what |
📖 What Actually Happened
- Mike Ramos pays $200/month for Claude Code Max. Uses it daily for professional software engineering work.
- Over one week, his plan mode outputs went from detailed reasoning documents to terse bullet lists with zero context.
- He asked Claude directly: “Why are you writing such bad plans?”
- Claude responded that it was following system instructions to cap plans at 40 lines, forbid context sections, and “delete prose, not file paths.”
- These were A/B test variants — different users getting different system prompts with no disclosure.
- Ramos published the findings on his blog. It hit #1 on Hacker News within hours.
- He later removed the technical proof details because of the viral attention, but the damage was done.
😤 Why People Are Furious
This isn’t some free beta. This is a $200/month professional tool. The complaints boil down to:
- No opt-in. Users weren’t asked if they wanted to participate in experiments.
- No transparency. There was no way to know your workflow was being silently modified.
- No reproducibility. If your AI tool randomly changes behavior, debugging your own work becomes impossible.
- No opt-out. Even after discovering the test, there was no toggle to disable it.
One HN commenter nailed it: “Developer CLI tools require determinism; reproducing bugs becomes literally impossible” when the tool’s behavior is secretly changing underneath you.
🗣️ The Hacker News Meltdown (117+ Comments)
| Who | What They Said |
|---|---|
| mschuster91 | “A/B testing without opt-out consent is inherently unethical” |
| takahitoyoneda | “Developer CLI tools require determinism — reproducing bugs becomes literally impossible” |
| reconnecting | Professional tools need “reliable and replicable results” |
| bushido | Plan mode is “objectively terrible 90% of the time” (even without the A/B test) |
| nemo44x | Anthropic is probably losing money at $200/mo — testing to cut costs makes sense |
| gruez | “Hand-wavy justifications” for degrading the product aren’t good enough |
| applfanboysbgon | Called the ToS restrictions on reverse-engineering “wholly unreasonable” |
| Anthropic | Silence. No official response so far |
⚖️ The Legal Angle
Here’s where it gets spicy. A commenter dug up Anthropic’s Terms of Service:
- Section 6.b — Anthropic reserves the right to change features at any time. So technically? They can do this.
- Section 3.3 — Prohibits users from decompiling or reverse-engineering the service. So the act of discovering the A/B test might violate their own ToS.
I mean. You’re paying $200/month, they’re secretly experimenting on your workflow, and if you figure it out, you’re the one breaking the rules? That’s absolutely cooked.
📊 The Bigger Pattern
This isn’t just an Anthropic problem. It’s an industry problem:
- Every major AI company runs A/B tests on model behavior without disclosure
- LLM outputs are already non-deterministic — adding secret prompt variants makes it worse
- There’s no standard for disclosing when AI tool behavior is being experimented on
- The “just ship and test” SaaS mentality clashes hard with tools people depend on for professional work
- Multiple developers in the HN thread reported similar quality regressions they now suspect were A/B tests
The core tension: companies need to iterate fast, but developers need their tools to behave predictably. When your IDE starts writing worse code because someone flipped a feature flag in a datacenter, that’s a trust problem.
Cool. Your AI Dev Tool Is Secretly a Lab Rat Maze. Now What the Hell Do We Do? (╯°□°)╯︵ ┻━┻

🛠️ 1. Build an AI Prompt Regression Monitor
The moment someone’s AI tool changes behavior, they need to know. Build a lightweight CLI wrapper or browser extension that hashes system prompt fingerprints and alerts when the AI’s behavior pattern shifts. Think of it like uptime monitoring but for prompt consistency.
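A minimal sketch of the fingerprinting idea, in Python. Everything here is illustrative — the metrics (line count, bullet ratio) and thresholds are one plausible way to detect a "40-line cap" style shift, not anything Anthropic exposes:

```python
def fingerprint(plan_text: str) -> dict:
    """Reduce a plan-mode output to coarse style metrics.

    Illustrative choice of metrics: non-empty line count and the
    fraction of lines that are bullets. A capped, context-free plan
    shows up as fewer lines and a near-1.0 bullet ratio.
    """
    lines = [l for l in plan_text.splitlines() if l.strip()]
    bullets = sum(1 for l in lines if l.lstrip().startswith(("-", "*")))
    return {
        "line_count": len(lines),
        "bullet_ratio": round(bullets / max(len(lines), 1), 2),
    }


def drift_alert(baseline: dict, current: dict, threshold: float = 0.5) -> bool:
    """Alert when output length collapses or style flips toward pure bullets."""
    drop = 1 - current["line_count"] / max(baseline["line_count"], 1)
    return drop > threshold or abs(current["bullet_ratio"] - baseline["bullet_ratio"]) > 0.3


# A week-one baseline (long plan with context) vs. a suspected capped variant:
baseline = fingerprint("Context: auth refactor\n" + "\n".join(f"- step {i}" for i in range(100)))
capped = fingerprint("\n".join(f"- step {i}" for i in range(40)))
print(drift_alert(baseline, capped))  # → True
```

Wrap something like this around your daily plan outputs and you have the core of the "uptime monitoring for prompt consistency" product.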
Example: A solo dev in Lisbon, Portugal built a prompt-diff tracker after noticing Claude’s coding style flip-flopping. Shared it on r/SideProject, got 400 stars on GitHub in a week. Launched a paid tier at $9/mo for teams — hitting $2.1K MRR within two months.
Timeline: 2-3 weeks to MVP. Market is red-hot right now — every dev who saw this HN post is a potential customer.
📝 2. Sell 'AI Tool Audit' Reports to Dev Teams
Companies spending $200/seat/month on AI coding tools have zero visibility into what they’re actually getting. Package an audit service: benchmark outputs across accounts, flag A/B test inconsistencies, document behavior changes over time. Sell to engineering managers who need to justify the spend.
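The core of such an audit can be sketched in a few lines: send the same prompt from multiple accounts, then score how similar the outputs are. This uses stdlib `difflib` as a stand-in similarity measure; the `0.7` floor and the prompt IDs are assumptions you'd calibrate per client:

```python
from difflib import SequenceMatcher
from itertools import combinations


def consistency_score(outputs: list[str]) -> float:
    """Mean pairwise similarity of outputs for one identical prompt.

    Persistently low scores across accounts suggest those accounts are
    receiving different hidden system prompts (i.e., A/B test buckets).
    """
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def audit(results: dict[str, list[str]], floor: float = 0.7) -> list[str]:
    """Return the prompt IDs whose cross-account consistency falls below `floor`."""
    return [pid for pid, outs in results.items() if consistency_score(outs) < floor]


flagged = audit({
    "plan-task-1": ["- step one\n- step two\nContext: db schema",
                    "- step one\n- step two\nContext: db schema"],
    "plan-task-2": ["- do x\n- do y\nReasoning: ...", "x"],  # one account got a capped variant
})
print(flagged)  # → ['plan-task-2']
```

A real audit would swap in an embedding-based similarity and run the same task dozens of times per account, but the deliverable — "these prompts behave differently per seat" — is exactly this list.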
Example: A QA consultant in Toronto, Canada started offering “AI Tool Consistency Audits” to three mid-size startups after reading about prompt drift. Charged $2,500 per audit. Booked $15K in Q1 from word-of-mouth alone.
Timeline: 1-2 weeks to package your methodology. Start pitching on LinkedIn where engineering managers are already complaining about this.
💡 3. Create a 'Prompt Constitution' Template Kit
Developers need a way to lock down their AI tool behavior. Build and sell a pack of CLAUDE.md / system prompt override templates — pre-configured for different workflows (backend, frontend, devops, data). Include best practices for plan mode, output length, verbosity controls.
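A minimal sketch of what one template in the kit might look like — the specific directives below are illustrative, not a guaranteed override of server-side flags:

```markdown
# CLAUDE.md — Backend workflow (illustrative template)

## Plan mode
- Do not cap plan length. Write as many lines as the task requires.
- Always include a "Context" section explaining the current state of the code.
- Always include a "Reasoning" section before the step list.
- Prefer full prose explanations over terse bullets when tradeoffs exist.

## Output
- Reference file paths explicitly for every proposed change.
- Flag any step you are uncertain about instead of omitting it.
```

Project-level instruction files can't undo every server-side experiment, but they give users an explicit, versioned baseline — which is precisely what the people in that HN thread are asking for.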
Example: A freelance developer in Berlin, Germany compiled her Claude Code configurations into a Gumroad product after seeing the HN thread. Priced at $29. Sold 180 copies in the first week — $5,220 from a PDF and some markdown files.
Timeline: A weekend. Seriously. You probably already have your own configs. Package them.
🔍 4. Launch an 'AI Tool Transparency' Newsletter
Someone needs to track which AI tools are running what experiments and when. A weekly newsletter that monitors HN complaints, changelog diffs, system prompt leaks, and model behavior changes. Monetize through sponsorships from competing AI dev tools.
Example: A tech writer in Mumbai, India started a Substack called “Prompt Watch” after the Anthropic drama. Covered three more undisclosed A/B tests across different AI tools. Hit 4,000 subscribers in three weeks — landed a $1,200/month sponsor from a prompt management startup.
Timeline: Launch today while the outrage is fresh. Consistency beats timing, but timing helps a lot.
🔧 5. Fork an Open-Source AI Coding Assistant
The HN thread had a clear undercurrent: if paid tools can secretly change on you, maybe open source is the answer. Projects like Continue.dev and Aider are open-source AI coding assistants. Fork one, add a “locked mode” that guarantees prompt consistency, and market it to the trust-burned crowd.
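"Locked mode" could be as simple as pinning a hash of the system prompt and refusing to run if it changes. A hedged sketch — the function names and the pinned prompt are hypothetical, not taken from Continue.dev or Aider:

```python
import hashlib

# Illustrative: a real fork would pin the hash of its actual shipped system prompt.
PINNED_SHA256 = hashlib.sha256(
    b"You are a coding assistant. Write complete plans."
).hexdigest()


def verify_prompt(system_prompt: str, pinned: str = PINNED_SHA256) -> bool:
    """Locked mode check: the system prompt must match the pinned hash exactly."""
    return hashlib.sha256(system_prompt.encode()).hexdigest() == pinned


def run_locked(system_prompt: str) -> str:
    """Refuse to call the model at all if the prompt has drifted."""
    if not verify_prompt(system_prompt):
        raise RuntimeError("System prompt changed — refusing to run in locked mode")
    return "ok"  # in a real assistant, hand off to the model here


print(run_locked("You are a coding assistant. Write complete plans."))  # prints "ok"
```

The guarantee is weak but legible: the tool can still be wrong, but it cannot silently become a different tool. That legibility is the entire pitch to the trust-burned crowd.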
Example: A pair of developers in Warsaw, Poland forked an open-source AI assistant and added deterministic prompt pinning after the Claude Code controversy. Posted the repo on r/programming. Got 1,200 stars and a $50K seed offer from a small EU fund — all within a month.
Timeline: 2-4 weeks for a meaningful fork. The “trust” angle is your marketing. Every HN commenter who said “this is why I use open source” is your user.
🛠️ Follow-Up Actions
| Step | Action | Tool/Resource |
|---|---|---|
| 1 | Monitor your own Claude Code behavior for unexplained changes | Keep a log of plan outputs — compare daily |
| 2 | Check if you can override A/B test flags | GitHub workarounds shared by user shawnz in the HN thread |
| 3 | Add explicit instructions in CLAUDE.md | “Do not cap plans. Include full context and reasoning.” |
| 4 | Follow the HN thread for Anthropic’s response | HN Discussion |
| 5 | Evaluate open-source alternatives | Continue.dev, Aider, Cody — none do secret A/B tests |
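Step 1 above ("keep a log of plan outputs — compare daily") can be automated with a few lines of stdlib Python. The log directory name and date-based filenames are just one convention:

```python
import difflib
from pathlib import Path

LOG_DIR = Path("plan_logs")  # illustrative location for the daily log


def log_plan(plan: str, day: str, log_dir: Path = LOG_DIR) -> Path:
    """Save one day's plan-mode output so regressions leave a paper trail."""
    log_dir.mkdir(exist_ok=True)
    path = log_dir / f"{day}.txt"
    path.write_text(plan)
    return path


def diff_against_previous(day: str, prev_day: str, log_dir: Path = LOG_DIR) -> list[str]:
    """Unified diff between two logged days — big deletion blocks hint at a new cap."""
    old = (log_dir / f"{prev_day}.txt").read_text().splitlines()
    new = (log_dir / f"{day}.txt").read_text().splitlines()
    return list(difflib.unified_diff(old, new, fromfile=prev_day, tofile=day, lineterm=""))


log_plan("Context: auth service\n- step 1\n- step 2", "2025-01-01")
log_plan("- step 1", "2025-01-02")
print("\n".join(diff_against_previous("2025-01-02", "2025-01-01")))
```

If a week of diffs shows context sections evaporating, you have exactly the kind of evidence Ramos had to argue from memory.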
Quick Hits
| Want to… | Do this |
|---|---|
| Check if you’re in a test | Ask Claude directly: “Are you following any special instructions about plan length or format?” |
| Lock down your setup | Add explicit overrides in your CLAUDE.md project file |
| Read the original post | backnotprop.com/blog/do-not-ab-test-my-workflow |
| Follow the fallout | The HN thread — 117+ comments and counting |
| Override the flags | Check shawnz’s GitHub-based feature flag overrides in the thread |
You’re paying $200 a month to be a test subject in someone else’s experiment — and the lab coat forgot to mention the consent form.