📦 Unfiltered AI Models for Local Use: The Complete Resource List

:brain: 4 uncensored AI models + 2 tools ➜ one codes 2,593 lines ➜ another runs 70B on a 4GB GPU ➜ all local, free

Local AI (runs on your computer, not someone’s server) with the “I can’t help with that” stripped out. Free, offline, yours.



These are abliterated models: that just means the part that makes an AI refuse stuff has been surgically removed. You run them locally through Ollama (a free app that runs AI models on your own PC: one install, copy-paste a command, done) or download them from Hugging Face (the site where people share AI models for free).

:light_bulb: No account · No cloud · Nothing logged. Here’s the drop. :backhand_index_pointing_down:


:brain: The Models



🧠 Huihui Qwen3.5 35B: the no-refusal workhorse

From Chinese devs huihui-ai. Built on Qwen 3.5, most refusals stripped. Handles controversial/sensitive topics other chatbots dodge.

Runs in one Ollama command:

ollama run huihui_ai/qwen3.5-abliterated:35b
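
If you’d rather script it than chat in the terminal, Ollama also exposes a local HTTP API once it’s running. Here’s a minimal sketch, assuming the default address (`localhost:11434`) and that the model above is already pulled. The helper names `build_request` and `ask` are made up for this example.

```python
import json
import urllib.request

# Default local Ollama endpoint (assumption: adjust if your install listens elsewhere).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the reply text (needs Ollama running locally)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running, `ask("huihui_ai/qwen3.5-abliterated:35b", "Summarize this in one line: ...")` should return the reply as a plain string. Nothing leaves your machine, since the request only goes to localhost.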

:warning: Built for experienced users: with the guardrails gone, outputs can get highly explicit, provocative, or unpredictable.

:link: → Model on Hugging Face



🔥 Gemini Heretic 40B: for coding + long writing

Minimal refusals and a 128K-token context window, so it can keep a large document, a long conversation, or a complex project in view without losing track.
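
To get a feel for what 128K actually buys you, here’s a back-of-envelope check. It assumes “128K” means 128,000 tokens and uses the common rule of thumb of about 4 characters per token for English text. Both numbers are assumptions, and the real count depends on the model’s tokenizer.

```python
CONTEXT_TOKENS = 128_000  # assumption: "128K" read as 128,000 tokens
CHARS_PER_TOKEN = 4       # assumption: rough average for English text


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, reserve_for_reply: int = 4_000) -> bool:
    """True if the text plus room for a reply fits in the context window."""
    return estimate_tokens(text) + reserve_for_reply <= CONTEXT_TOKENS
```

By this estimate, a document of around 400,000 characters still fits with room for a reply, while 600,000 characters does not.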

Shows its own reasoning. Built for coding, long-form writing, brainstorming, research. Few-clicks local setup.

:link: → Gemini Heretic on Hugging Face



⚡ Gemma 4 12B Obliterated: zero refusal, zero quality drop

Billed as the first one to hit 0 refusals with no benchmark loss, meaning they killed the “no” without making it dumber.

Lightweight 12B, runs on modest hardware.

:link: → Gemma 4 12B Obliterated



๐Ÿ† Qwen3.5 21B Deckard โ€” the coding monster

Arguably the strongest here. Cranked out 2,593 lines of code in one go; ChatGPT usually chokes around 1,200–1,500.

Holds structure and logic across a big codebase, not just snippets.

:link: → Qwen3.5 Deckard Heretic


:hammer_and_wrench: The Tools That Run Them




💾 AirLLM: run giant models on a potato PC

The catch with big AI is it needs a monster GPU. AirLLM (a free code library, 20k stars / 240k downloads) reworks how the model loads, pulling it through the GPU one layer at a time instead of all at once, so a 70B model runs on a 4GB GPU. They even run 405B Llama 3.1 on 8GB VRAM.

Works on basically any setup, from a low-end GPU down to CPU-only. Hooks straight into Hugging Face models. Beyond chat it handles OCR (reads text from images), image generators, assistants, and more.
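
Rough math on why a 70B model can squeeze through 4GB, assuming layer-by-layer loading, fp16 weights (2 bytes per parameter), and an 80-layer model. The layer count is an assumption for illustration, so check your model’s config. This ignores embeddings, activations, and the KV cache, so treat it as a ballpark.

```python
def weights_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Total weight size in GB (1 GB = 1e9 bytes); 2 bytes/param means fp16."""
    return params_billion * bytes_per_param


def per_layer_gb(params_billion: float, n_layers: int,
                 bytes_per_param: float = 2.0) -> float:
    """Approximate size of one layer if weights are spread evenly across layers."""
    return weights_gb(params_billion, bytes_per_param) / n_layers


full = weights_gb(70)            # whole model at fp16
one_layer = per_layer_gb(70, 80)  # assumed 80 layers
print(f"full model: {full:.0f} GB, one layer: {one_layer:.2f} GB")
```

So the full model is way too big for any consumer GPU, but a single layer is small enough for a 4GB card. The trade-off is speed: every layer gets read from disk for every token, so expect it to be slow.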

:link: → github.com/lyogavin/airllm



🧩 AgentMemory: give your AI a permanent memory

AI forgets everything between chats. AgentMemory is a memory layer: it stores past interactions, compresses them into structured memories, and pulls the relevant bits back when needed, so your AI remembers your project across sessions with no re-explaining.
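
To make the store / compress / recall idea concrete, here’s a toy sketch. This is not AgentMemory’s API or its actual method: `MemoryStore`, `remember`, and `recall` are hypothetical names, “compression” is just truncation, and retrieval is plain word overlap where a real tool would likely use summaries and embeddings.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy memory layer: keeps short notes and recalls them by word overlap."""
    notes: list[str] = field(default_factory=list)

    def remember(self, text: str, max_words: int = 30) -> None:
        """Store a note, 'compressed' by truncating to max_words words."""
        self.notes.append(" ".join(text.split()[:max_words]))

    def recall(self, query: str, top_k: int = 2) -> list[str]:
        """Return up to top_k notes sharing the most words with the query."""
        q = set(query.lower().split())
        scored = [(len(q & set(n.lower().split())), n) for n in self.notes]
        scored = [s for s in scored if s[0] > 0]      # drop notes with no overlap
        scored.sort(key=lambda s: s[0], reverse=True)  # best match first
        return [n for _, n in scored[:top_k]]
```

The recalled notes would then be pasted into the prompt ahead of the new question, which is what saves you from re-sending the whole history.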

#1 trending repo on GitHub. Plugs into Claude Code, Cursor, Codex, and any MCP tool.

:light_bulb: Bonus: way less re-sending context = way lower token cost on long projects.

:link: → github.com/rohitg00/agentmemory


:high_voltage: Quick Picks

:white_check_mark: Do

  • :feather: Start with Gemma 4 12B if your PC’s modest: it’s the lightest one here
  • :floppy_disk: Use AirLLM if a model’s too big for your GPU
  • :rocket: Run everything through Ollama for the easiest setup

:prohibited: Don’t

  • :snail: Don’t grab the 40B models on a weak machine: they’ll crawl
  • :stop_sign: Don’t expect a guardrail to catch you: there isn’t one, that’s the point
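
If you want a quick sanity check before downloading, here’s a rough sizing sketch for the four models above. It assumes roughly 4 bits per weight (a common quantization level) plus 20% overhead for context and runtime. Both figures are assumptions, and real usage varies with the quantization and context length you pick.

```python
# Model sizes in billions of parameters, as listed in this post.
MODELS = {
    "Gemma 4 12B Obliterated": 12,
    "Qwen3.5 21B Deckard": 21,
    "Huihui Qwen3.5 35B": 35,
    "Gemini Heretic 40B": 40,
}


def est_memory_gb(params_billion: float, bits_per_weight: float = 4.0,
                  overhead: float = 1.2) -> float:
    """Estimated memory in GB: quantized weights plus an assumed 20% overhead."""
    weights = params_billion * bits_per_weight / 8
    return weights * overhead


def models_that_fit(memory_gb: float, bits_per_weight: float = 4.0) -> list[str]:
    """Names of the listed models whose estimate fits in memory_gb."""
    return [name for name, size in MODELS.items()
            if est_memory_gb(size, bits_per_weight) <= memory_gb]
```

By this estimate, `models_that_fit(8)` leaves only the 12B, and 16GB adds the 21B. The 35B and 40B land above 20GB, which is why they crawl on a weak machine.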

Got a rig running one of these? Drop your specs + which model below, it helps everyone pick. :backhand_index_pointing_down:
