Uncensored AI models, the tools that run them, and how to break any model yourself: all local, all free
A model for every machine, phone to server farm, with the "I can't help with that" ripped out. Offline, nothing logged. Plus the part nobody tells you: you can rip the refusal out of ANY model yourself, in one command.
Abliterated = the refusal reflex surgically cut out. Runs via Ollama (free app, one command) or Hugging Face. Everything below is free.
Start here: plug in & go
Four that already come broken. New here? Pick one of these, run it, done.
Says yes to what others dodge: Huihui Qwen3.5 35B
Chinese devs huihui-ai, built on Qwen 3.5, refusals stripped. Goes where mainstream bots won't.
ollama run huihui_ai/qwen3.5-abliterated:35b
No guardrails = raw output. That's the point.
Eats whole codebases without forgetting: Gemini Heretic 40B
Barely refuses. 128K context (holds an entire book/long chat in memory). Coding, long writing, research. Shows its own reasoning as it works.
Killed the 'no' without going dumb: Gemma 4 12B Obliterated
First to hit 0 refusals, no benchmark loss. Most uncensored models get lobotomised; this one didn't. 12B runs on modest/older hardware.
2,593 lines of code in one shot: Qwen3.5 21B Deckard
2,593 lines single-shot; ChatGPT taps out at ~1,200-1,500. Holds logic across a full codebase, not snippets.
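Once Ollama has pulled a model, it also serves it on a local HTTP endpoint, so you can script it instead of typing into the terminal. A minimal sketch, assuming Ollama is running on its default port (11434) and the model is already downloaded; `build_request` and `ask` are illustrative helper names, not part of Ollama.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the prompt and return the generated text (needs Ollama running)."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("huihui_ai/qwen3.5-abliterated:35b", "Explain what a context window is.")` should return the reply as a plain string. Nothing leaves the machine, since the request only goes to `localhost`.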
Match it to your machine: phone to server farm
The #1 question: "will it run on MY box?" Find your tier, grab the model. (VRAM = your graphics card's memory.)
Phone / potato / CPU-only (≤4B)
Runs on a cheap laptop, an old GPU, even no GPU at all.
- huihui gemma3-abliterated 1B, one line: `ollama run huihui_ai/gemma3-abliterated:1b` (~806 MB). A 270M tiny-tiny version exists too.
- huihui Qwen3-4B-abliterated-v2: `ollama run huihui_ai/qwen3-abliterated:4b` (~2.5 GB). 0.6B & 1.7B also available.
- DreamFast/qwen3-4b-heretic: 4B, near-perfect uncensor (0 damage score).
- mlabonne gemma-3-4b-it-abliterated: cleaner recipe, GGUF included.
- TheDrummer Gemmasutra-Mini-2B: 2B roleplay, has phone (ARM) builds.
Small daily driver (7-14B)
The sweet spot: runs on 8-12 GB and handles almost everything.
- huihui Huihui-Qwen3.5-9B-abliterated: 9B, one of the most-downloaded uncensored models going.
- huihui Qwen3 8B / 14B abliterated-v2: `:8b` / `:14b` on Ollama.
- DreamFast/qwen3-8b-heretic: 8B Heretic, low damage.
- mlabonne NeuralDaredevil-8B-abliterated: the classic "healed" 8B (uncensored and still smart).
- Dolphin 3.0 Llama 3.1 8B: `ollama run dolphin3`. You set the rules.
- huihui phi-4-abliterated: 14B Phi-4, GGUF.
Mid-tier muscle (20-40B)
Needs ~16-24 GB but punches hard.
- p-e-w gpt-oss-20b-heretic: the crowd favourite uncensor of OpenAI's open model. Apache license.
- huihui / mlabonne gemma-3-27b-it-abliterated: 27B, can also see images.
- Dolphin 3.0 R1 Mistral 24B: uncensored reasoning model, shows its thinking.
- TheDrummer Cydonia 24B v4.3: the reigning roleplay/creative king, 131K context.
- DavidAU Qwen3-42B TOTAL-RECALL Master-Coder: 42B, 256K context, coding beast.
Giant / server-class (70B to 754B)
For big rigs, multi-GPU, or Unsloth-shrunk on a single card (see the "GPU too small" section).
- huihui gpt-oss-120b abliterated: 120B.
- huihui DeepSeek-R1-Distill-Llama-70B-abliterated: 70B reasoning.
- huihui GLM-5.2 abliterated: 754B MoE flagship (MoE = only the needed slice runs, so it's lighter than it sounds).
- huihui DeepSeek-671B / V4 abliterated: the 671B monster, uncensored.
- TheDrummer Behemoth 123B v2: 123B creative powerhouse.
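To sanity-check the tiers above for a model that isn't listed, the usual back-of-envelope sum is parameters times bits per weight, divided by 8. A rough sketch only: the 4.5 bits per weight (a typical 4-bit quant) and the 20% allowance for context cache and buffers are assumptions, and `weights_gb` / `fits` are illustrative names. Real files vary, especially tiny models.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, vram_gb: float,
         bits_per_weight: float = 4.5, overhead: float = 1.2) -> bool:
    """True if weights plus an assumed 20% runtime allowance fit in VRAM."""
    return weights_gb(params_billions, bits_per_weight) * overhead <= vram_gb


# Three sample pairings of model size (billions of params) and card size (GB).
for size_b, vram_gb in [(9, 8), (24, 24), (70, 24)]:
    verdict = "fits in" if fits(size_b, vram_gb) else "is too big for"
    print(f"{size_b}B at ~4.5 bits: {weights_gb(size_b, 4.5):.1f} GB of weights, "
          f"{verdict} {vram_gb} GB")
```

By this estimate a 9B sits comfortably on an 8 GB card and a 24B on a 24 GB card, while a 70B does not fit in 24 GB without shrinking it further or offloading part of it.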
Whatever model you already love, there's a broken version
Loyal to one base? Grab its unmuzzled twin. New bases get stripped within days of release.
The family tree (pick your base)
- Llama 3.x / 4: huihui Llama-3.3-70B-abliterated, NeuralDaredevil-8B
- Qwen 2.5 / 3 / 3.5 / 3.6: the deepest bench, all at huihui-ai, incl. Qwen3.6-27B abliterated & Qwen2.5-Coder-14B-Abliterated
- Gemma 2 / 3 / 4: mlabonne gemma-3 (1B-27B), p-e-w gemma-3-12b heretic
- Mistral / Nemo / Ministral: huihui Mistral-Nemo abliterated, mlabonne Mistral-Nemo-Prism-12B
- DeepSeek V3 / R1 / V4: huihui R1-distills (8B/32B/70B) + the 671B
- Phi-4: phi-4-abliterated, Phi-4-mini, Phi-4-multimodal
- GLM 4.x / 5.x: ArliAI GLM-4.6-Derestricted (clean method), Ex0bit GLM-4.7-PRISM, huihui GLM-5.2
- gpt-oss (OpenAI open): p-e-w 20b-heretic, huihui 120b, DavidAU NEO-Imatrix builds
- Exotics: EXAONE, Granite, Hunyuan, InternVL, Qwen3-Omni, all abliterated in the huihui firehose
The right unmuzzled model for the actual job
Coding without the 'I can't help with that'
- huihui Qwen3-Coder-Next abliterated: the top local coder, uncensored. `ollama run huihui_ai/qwen3-coder-next-abliterated`
- huihui Qwen3-Coder abliterated: scales huge (480B tag on Ollama for big rigs).
- Aesdi90 Qwen2.5-Coder-14B-Abliterated: fits a normal GPU.
- Dolphin 3.0 R1 Mistral 24B: reasons through hard bugs.
Roleplay / creative writing (the SillyTavern favourites)
The scene's most-loved, 2026 picks:
- TheDrummer Cydonia 24B v4.3 + Anubis 70B, Behemoth 123B, Rocinante 12B: the daily drivers.
- Sao10K Stheno 8B, Euryale 70B, Lunaris 8B, Fimbulvetr 11B: legends of the genre.
- Midnight-Miqu 70B & Midnight-Rose 70B: the atmospheric classics.
- MythoMax-L2 13B: the OG that still gets downloaded daily.
Pair any of these with SillyTavern (in the frontends section) for characters + memory.
Vision: models that can SEE images, uncensored
- huihui Qwen3-VL-30B abliterated: flagship; also 8B & 32B sizes.
- prithivMLmods Qwen3-VL-8B-Abliterated-Caption: an uncensored image describer.
- huihui GLM-4.6V-Flash abliterated + Phi-4-multimodal abliterated.
Reasoning / 'thinking' models, unmuzzled
Models that work through problems step by step, with the brakes off.
- huihui QwQ-32B abliterated: strong open reasoner.
- Dolphin 3.0 R1 Mistral 24B: trained on 800k reasoning traces.
- huihui DeepSeek-R1-Distill-Qwen-32B abliterated: R1 brains, no refusals.
- DavidAU Brainstorm / TOTAL-RECALL builds: reasoning cranked up + huge context.
Cybersecurity: a hacker's AI that won't flinch
Tuned on real security data, no "I can't discuss that."
- WhiteRabbitNeo V3 7B (aka DeepHat V1 7B: same model, rebranded at Black Hat 2025). Offensive + defensive.
- huihui Foundation-Sec-8B abliterated: Cisco's security model, trained on 5.1B tokens of cyber data, then uncensored. `ollama run huihui_ai/foundation-sec-abliterated`
- huihui BaronLLM abliterated: offensive-security tuned. `ollama run huihui_ai/baronllm-abliterated`
- Dolphin3-Cyber 8B: OWASP + MITRE ATT&CK + CVEs baked in. Runs on a GTX 1650+.
- Lily-Cybersecurity 7B: 22k security Q&A, Mistral base.
Follow the factory, not the file
New model drops today? One of these has a stripped version by tomorrow. Bookmark the maker, never run dry.
The people who break models for a living
- huihui-ai: the firehose. Hundreds of abliterations, updated ~weekly. Whatever drops, they strip it fast. Their `v2`/`v3` releases beat `v1`.
- Heretic org / p-e-w: automated, lowest-damage abliterations + the tool to DIY.
- DavidAU: Heretic + reasoning fusions + ready-to-run GGUFs.
- TheDrummer: roleplay/creative king (Cydonia, Anubis, Behemoth).
- Sao10K: Stheno, Euryale, Lunaris, Fimbulvetr.
- Cognitive Computations / dphn: the Dolphin line.
- mradermacher & bartowski: the two quant makers. If a model has no easy-run GGUF, search their pages; they've usually made it.
Fishing rod, not fish: on Hugging Face, filter models by the tags `abliterated` and `heretic`. That's 8,000+ and 4,000+ models right there. You'll never run out.
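The same tag filter works from a script. A minimal sketch against the Hub's public `/api/models` endpoint, standard library only; `tag_query_url` and `top_models` are illustrative names, and the results will drift from the counts above as new models are uploaded.

```python
import json
import urllib.parse
import urllib.request

HUB_MODELS_API = "https://huggingface.co/api/models"


def tag_query_url(tag: str, limit: int = 5) -> str:
    """URL listing models carrying `tag`, most-downloaded first."""
    query = urllib.parse.urlencode(
        {"filter": tag, "sort": "downloads", "direction": -1, "limit": limit}
    )
    return f"{HUB_MODELS_API}?{query}"


def top_models(tag: str, limit: int = 5) -> list[str]:
    """Fetch the repo ids for the query above (needs internet access)."""
    with urllib.request.urlopen(tag_query_url(tag, limit)) as resp:
        return [model["id"] for model in json.loads(resp.read())]
```

`top_models("heretic")` should return a short list of repo ids you can paste straight into a download tool or a Hugging Face search.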
Abliterated vs Heretic vs fine-tune: which do I want?
Three ways a model gets uncensored. Pick the flavour:
- Abliterated: the refusal direction is cut from the weights. Fast, keeps the base's brains, but can be a little "flat" until you push it with a firm instruction.
- Heretic: abliteration done automatically and tuned to keep the model smart (lowest brain-damage of the three). If a Heretic version exists, it's usually the safest pick.
- Fine-tune (Dolphin-style): retrained on open data. Most consistent and steerable, occasionally hallucinates a touch more.
Rule of thumb: Heretic > healed abliteration > raw abliteration for keeping quality. Still refusing? Move up a tier.
The part they skip: don't download it broken, break it yourself
A refusal is one direction inside the model. Find it, delete it. Works on ANY model, even next week's release nobody's stripped yet.
One pip, any model, uncensored in ~45 min: Heretic
The big one. 7.9k stars, 1,000+ community models made with it. Point it at any model, it finds the refusal direction and removes it automatically. No ML knowledge, just a terminal.
pip install heretic-llm
heretic Qwen/Qwen3-4B
Runs unsupervised, keeps more of the model's brains than most hand-made jobs. Save it, upload it, or chat right away. Pre-made ones live in its "The Bestiary" collection on HF.
The one built to kill Chinese-model censorship: llm-abliteration (DECCP)
From NousResearch. Originally made to strip censorship out of Chinese LLMs, runs the whole job in 4-bit shards under 8GB VRAM in ~2 minutes. Handles dense and mixture-of-experts models (the big MoE ones others choke on).
The free Colab notebook that started it all: mlabonne's guide
Want to see the guts? Plain-English walkthrough + free Google Colab (runs in your browser, no GPU needed) that made abliteration a thing. Uncensor a model without owning any hardware.
Zero-surgery mode: bend the model live, no file touched
Instead of editing the model, shove a "be compliant" nudge into its brain as it thinks. Same file, dial it up or down like a slider.
Type a mood, inject it as a dial: repeng (control vectors)
By Theia Vogel. Describe a direction in plain words ("uncensored," "confident") and it builds a control vector: a nudge added to the model's activations at runtime. No retraining, no new weights. Export it and use it in llama.cpp with any quant.
Pre-baked dials, ready to drop in: jukofyork/control-vectors
Don't want to build your own? Grab ready-made control vectors in GGUF (the standard local-model format) and load them into llama.cpp with `--control-vector`. Stack several for layered effects.
"My GPU's too small." Says who?
A 671B monster on a 24GB card: Unsloth Dynamic Quants
DeepSeek R1 is 671 billion parameters, normally 720GB. Unsloth's trick: squeeze the useless layers to 1.58-bit, keep the important ones sharp → 131GB, an 80% cut, still writes working code. Runs on a single 24GB GPU (RTX 4090), or CPU-only with 20GB RAM if you're patient.
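The numbers above are easy to check. A quick sketch of the arithmetic, using only the sizes quoted in this section; the helper names are illustrative, and the note about offloading is an assumption about how such a file gets run on a 24GB card, not something Unsloth's figures state.

```python
def percent_cut(original_gb: float, quantized_gb: float) -> float:
    """Size reduction as a percentage of the original."""
    return 100 * (original_gb - quantized_gb) / original_gb


def offloaded_gb(model_gb: float, vram_gb: float) -> float:
    """How much of the model cannot sit in VRAM and must live elsewhere."""
    return max(0.0, model_gb - vram_gb)


print(f"Cut: {percent_cut(720, 131):.1f}%")
print(f"Outside a 24GB card: {offloaded_gb(131, 24):.0f} GB")
```

The cut works out to a little over 80%, matching the claim. The second figure is the catch: most of a 131GB file cannot sit in 24GB of VRAM, so the remainder has to be served from system RAM or disk, which is why it runs but runs slowly.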
Chain your phone + laptop + PC into one brain: exo
The wild one. Model too big for any single device? exo splits it across all of them (phone, laptop, desktop, old Macs), peer-to-peer, no "main" machine. Your junk-drawer hardware pooled into one cluster that runs models none of them could alone.
Run massive AI on a potato: AirLLM
Free library, 20k stars / 240k downloads. Reworks how models load so a 70B runs on a 4GB GPU, even 405B Llama 3.1 on 8GB VRAM. GPU, low-end, or CPU-only. Also does OCR (text out of images), image gen, assistants.
Where you actually talk to them
The models are the engine. These are the dashboard: pick one, point it at a model, go.
One file, no install, runs on a 10-year-old PC: KoboldCpp
Single .exe: no Python, no Docker. Double-click, pick a model, chat. Broadest hardware support of anything (even integrated GPUs and ancient CPUs). Image gen + voice + transcription baked in. Remote Tunnel mode gives a link to reach it from anywhere.
Run it at home, chat from your phone: LM Studio
Clean app to browse, download and compare models. The hidden gem: LM Link. Your home GPU does the work, your phone is just the screen, over an encrypted tunnel. Full power on the couch. Built-in HF proxy for when Hugging Face is blocked where you are.
Characters, memory, lorebooks: SillyTavern
The roleplay/character frontend. Plugs into KoboldCpp, Ollama or LM Studio and adds character cards, long-term memory, world-info "lorebooks," even live image gen. Where the uncensored models really come alive.
Fully offline, hybrid local + cloud: Jan
41k stars, 5.3M+ downloads. Runs 100% offline, flips to cloud models in the same window when you want. MCP support for agent workflows. The friendly all-rounder if KoboldCpp feels too raw.
Bonus, give your AI a memory that sticks: AgentMemory
AI forgets on tab-close. This saves past chats, compresses them into structured memory, and pulls the right bits back. Remembers your project across sessions. #1 trending on GitHub. Plugs into Claude Code, Cursor, Codex, any MCP tool. Bonus: fewer re-sends = lower token cost.
Where this actually bites in real life
The 'ohh shit, I could use this' list
A hot new model drops with heavy censorship: run Heretic on it tonight, uncensored twin by morning. You don't wait for anyone.
Drop a 200-page contract or medical PDF in and ask blunt questions: nothing refused, nothing leaves your PC.
Generate a whole working app in one pass instead of babysitting 15 half-answers that keep hitting "I can't."
A pentest write-up that lists real attack vectors: the security work cloud bots block as "harmful."
Zero cloud = your prompts never touch a company server, never train anything, never get your account nuked.
Shrink a 671B beast down to the cheap card you already own instead of renting a GPU by the hour.
Your phone can't run a 30B model, so let your home PC do it and just chat from the couch.
Pick fast (the cheat sheet)
Potato PC → gemma3-abliterated 1B or a Dolphin 8B
8-12 GB → Huihui Qwen3.5 9B or gpt-oss-20b-heretic
24 GB → Cydonia 24B (RP) or Dolphin R1 24B (reasoning)
Want ANY model uncensored → Heretic, one command
Don't want to edit the model → repeng control vectors, live dial
Model too big → Unsloth quants, or exo across your devices
Easiest front end → KoboldCpp (one file) or LM Studio (phone access)
Weak machine + a giant model → it'll crawl, drop a tier
They ship the lock. Turns out it's one line of code, and you're holding the key.







