🧩 One Address β†’ Free AI From Many Providers, Auto-Failover

:free_button: Milk Multiple Free AI Tiers Through One Local API β€” Apps Hit a Single Endpoint While FreeLLMAPI Spreads Requests Across Free Providers and Auto-Recovers; Docker Compose Deploy With Logs and Checks

Point any ChatGPT app at YOUR box. It juggles multiple free AI backends behind the scenes.

homelab Β· ai Β· self-hosting

Big thanks to SRZ from OneHack for the original FreeLLMAPI thread and bringing it to the community β€” this is my BCBC lab writeup of getting it running.

The whole idea in one breath: FreeLLMAPI is a little middleman (a forwarder that sits between your app and the AI) that pretends to be ChatGPT’s API (the address apps send AI requests to). So any tool built for ChatGPT talks to it without changes β€” but behind the curtain it routes to a bunch of free AI providers, and if one’s down or rate-limited, it auto-jumps to the next. One address, many free brains, zero app rewrites.


:bullseye: Why You’d Want This

:electric_plug: One endpoint, swap brains freely β€” apps point at your box, you change providers behind it whenever.
:parachute: Auto-fallback β€” provider rate-limits you or dies? It silently switches to the next. No babysitting.
:house: Runs local on your own server β€” your traffic, your rules.
:free_button: Stack free providers β€” milk multiple free AI tiers through one door.

:test_tube: Not hype β€” repeatable. The point isn’t a flashy demo. It’s infrastructure you can stand up the same way twice and actually rely on.


:brick: The Lab Setup

Proxmox CT: 109            (a mini-computer on the server)
Service:    FreeLLMAPI
Runtime:    Docker / docker compose
Port:       3001           (the "door number" it listens on)
Endpoint:   /v1/chat/completions
Database:   SQLite         (tiny built-in data file)

How a request flows:

your app
  ↓
fake-OpenAI address  (looks like ChatGPT's API)
  ↓
FreeLLMAPI gateway   (the middleman)
  ↓
provider routing / fallback
  ↓
multiple free AI backends

:white_check_mark: Current Status β€” All Green

:green_circle: Docker container starts clean
:green_circle: SQLite database initializes
:green_circle: Model + fallback seed runs (loads which providers to try and in what order)
:green_circle: API listening on port 3001
:green_circle: Proxy endpoint live at /v1/chat/completions


πŸ” Confirm It Yourself (logs)
pct exec 109 -- bash -lc "docker logs --tail 100 freellmapi-freellmapi-1"

You want to see:

Database initialized
Server running on http://[::]:3001
Proxy endpoint: http://[::]:3001/v1/chat/completions

Container up, database seeded, API listening, proxy live. :tada:

πŸ› οΈ Next Upgrades (my to-do)

:door: nginx reverse proxy β€” a doorman in front so you hit a clean URL, not a raw port.
:pager: Health/status page β€” see at a glance if it’s alive.
:parachute: Test the fallback β€” kill one provider on purpose, confirm it jumps.
:desktop_computer: Hook in Ollama β€” (runs AI models locally on your own machine) so you’ve got an offline backend too.
:clipboard: Document safe config examples β€” copy-paste ready for the next person.


:light_bulb: Real homelab truth: the AI service booted faster than the network plumbing did. The full stack dragged in Proxmox, Docker, pfSense, ClouDNS, BIND split DNS (your own private phonebook for internal names), nginx, and browser-cache gremlins. Totally normal β€” the app is only one piece; the system is the work. :wolf:


One door, many free AI brains, auto-failover. Who’s spinning this up on their own box β€” and which providers are you stacking behind it?