Turn Any Document Into a Podcast — 100% Free, Runs on Your Computer
Google’s NotebookLM went viral. Now you can do the same thing — without Google, without paying, without limits.
What You’re Walking Away With
A complete list of free tools that turn your PDFs, articles, notes, or websites into realistic two-person podcast conversations. No subscriptions. No “50 free minutes then pay us.” Just yours.
Why This Actually Matters
You don’t need to mass-produce podcasts to care about this.
→ Learning hack: Paste your textbook chapter, get a 10-minute audio explanation while you commute
→ Content cheat code: Turn your blog post into a podcast episode without recording anything
→ Accessibility win: Make any document listenable for people who can’t or don’t want to read
What These Tools Give You
Two AI voices having a real conversation about your content
Works with PDFs, websites, YouTube videos, plain text, images
Multiple languages supported
Runs on your computer, and with fully local models nothing leaves your machine
Actually free — not “free trial” free
Voices that laugh, sigh, pause, and sound human
Short clips (2-5 min) or full episodes (30+ min)
The Best Options (Ranked by “Just Works” Energy)
Podcastfy — The Easiest Full Solution
What is it?
A Python tool that takes URLs, PDFs, YouTube links, or images and spits out a podcast MP3. One command. Done.
Why it's the move
- Edge TTS mode = completely free (Microsoft’s online text-to-speech service, no API key needed)
- Works with local AI models via Ollama (also free)
- Supports 100+ different AI models if you want to experiment
- Apache 2.0 license — use it however you want
- 5,700+ stars on GitHub, actively maintained
The magic command
pip install podcastfy
python -m podcastfy.client --url "https://your-article.com" --local --tts-model edge
That’s it. PDF in, podcast out.
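Got a whole reading list to convert? The command above is easy to script. Here is a minimal sketch that only builds the command lines, one per URL, using the same flags shown above. The helper name `build_podcastfy_command` and the example URLs are made up for illustration, and nothing actually runs until you hand a command to `subprocess.run`.

```python
import shlex
import sys


def build_podcastfy_command(url, tts_model="edge", local=True):
    """Return the argument list for one Podcastfy run (hypothetical helper)."""
    cmd = [sys.executable, "-m", "podcastfy.client", "--url", url]
    if local:
        cmd.append("--local")  # same flag as in the command above
    cmd += ["--tts-model", tts_model]
    return cmd


if __name__ == "__main__":
    # Placeholder URLs, swap in your own articles.
    for url in ["https://example.com/post-1", "https://example.com/post-2"]:
        cmd = build_podcastfy_command(url)
        print(shlex.join(cmd))
        # To actually generate audio: subprocess.run(cmd, check=True)
```

Printing first and running second makes it easy to eyeball each command before you commit to a long batch.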
Links:
- GitHub: https://github.com/souzatharsis/podcastfy
- Docs: https://github.com/souzatharsis/podcastfy/blob/main/usage/config.md
- Try in browser: https://colab.research.google.com/github/souzatharsis/podcastfy/blob/main/podcastfy.ipynb
Dia TTS — Best Voice Quality (Sounds Scary Real)
What is it?
A 1.6 billion parameter AI model specifically built to generate realistic conversations. You give it a script with [S1] and [S2] tags, it gives you audio that sounds like two humans talking.
Why it's insane
- One-pass generation — whole conversation rendered at once, not stitched together
- Built-in reactions: (laughs), (sighs), (coughs), (clears throat)
- Clone any voice from a few seconds of audio
- Apache 2.0 license
- 19,000+ stars — people are obsessed
Example script
[S1] Welcome to the show!
[S2] Thanks for having me. (laughs)
[S1] So let's talk about why AI is eating the world.
[S2] (sighs) Where do we even start...
Feed that in → get a podcast clip out.
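If your dialogue already lives in a list (say, output from a local LLM), tagging it by hand gets old fast. A minimal sketch of a formatter is below. The function name `to_dia_script` is hypothetical, and it only produces the [S1]/[S2] text format shown above; it does not call Dia itself.

```python
def to_dia_script(turns):
    """Format (speaker_number, text) pairs as [S1]/[S2] lines."""
    lines = []
    for speaker, text in turns:
        if speaker not in (1, 2):
            raise ValueError("this sketch only handles speakers 1 and 2")
        lines.append(f"[S{speaker}] {text.strip()}")
    return "\n".join(lines)


turns = [
    (1, "Welcome to the show!"),
    (2, "Thanks for having me. (laughs)"),
]
print(to_dia_script(turns))
# [S1] Welcome to the show!
# [S2] Thanks for having me. (laughs)
```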
Hardware needs
- ~10GB GPU memory (think RTX 3080 or better; the RTX 3070 only has 8GB)
- Runs at 40 tokens/second on an A4000
- No GPU? Use the free HuggingFace demo
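What does 40 tokens/second mean in wall-clock time? That depends on how many tokens make up one second of audio, which this article does not state. The sketch below assumes 86 tokens per second of audio; treat that number as an assumption and check the Dia repo before relying on it.

```python
GENERATED_TOKENS_PER_SECOND = 40  # the A4000 figure quoted above
TOKENS_PER_AUDIO_SECOND = 86      # assumption, not stated in this article


def generation_seconds(audio_seconds,
                       gen_rate=GENERATED_TOKENS_PER_SECOND,
                       tokens_per_audio_second=TOKENS_PER_AUDIO_SECOND):
    """Estimate wall-clock seconds needed to render `audio_seconds` of speech."""
    return audio_seconds * tokens_per_audio_second / gen_rate


print(f"{generation_seconds(60):.0f} s to render one minute of audio")
# 129 s to render one minute of audio
```

Under that assumption an A4000 renders at a bit under half real-time speed, so plan for roughly two minutes of waiting per minute of audio.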
Links:
- GitHub: https://github.com/nari-labs/dia
- Dia2 (streaming version): https://github.com/nari-labs/dia2
- Try free (no install): https://huggingface.co/spaces/nari-labs/Dia-1.6B
- Ready-to-use server: https://github.com/devnen/Dia-TTS-Server
Chatterbox — Beats ElevenLabs in Blind Tests
What is it?
ResembleAI’s open-source voice model. In blind listening tests, 63.8% of people preferred it over ElevenLabs (the $22/month service everyone uses).
Why it matters
- MIT license — use commercially, modify, whatever
- Emotion control slider (monotone ↔ dramatic)
- Tags for reactions: [laugh], [cough], [chuckle]
- 23 languages supported
- Clone voices from ~10 seconds of audio
- Turbo version: under 200ms latency
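A mistyped reaction tag may just get read aloud as a word, so it can help to lint your script before generating anything. A minimal sketch is below. It treats only the three tags listed above as known (the model may well support more), and the function name `unknown_tags` is made up for illustration.

```python
import re

KNOWN_TAGS = {"laugh", "cough", "chuckle"}  # only the tags listed above


def unknown_tags(text, known=KNOWN_TAGS):
    """Return bracketed tags in `text` that are not in `known`, in order."""
    found = re.findall(r"\[([a-z ]+)\]", text.lower())
    return [tag for tag in found if tag not in known]


line = "That demo was wild [laugh] and then it just [sneeze] stopped."
print(unknown_tags(line))
# ['sneeze']
```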
Variants
- Chatterbox Original: Best quality, emotion control
- Chatterbox Turbo: Fastest, paralinguistic tags
- Chatterbox Multilingual: 23 languages
Links:
- GitHub: https://github.com/resemble-ai/chatterbox
- Try free: https://huggingface.co/spaces/ResembleAI/Chatterbox
- Full server with Web UI: https://github.com/devnen/Chatterbox-TTS-Server
SurfSense — Full Research-to-Podcast Pipeline
What is it?
NotebookLM + Perplexity + podcast generator combined. Connects to your Notion, Slack, GitHub, Google Drive, emails — then lets you chat with all of it AND turn conversations into podcasts.
Why it's different
- 3-minute podcast generated in ~20 seconds
- Uses Kokoro for local text-to-speech (free)
- Works with Ollama (free local AI)
- Connects to 20+ services
- One Docker command to run everything
- Your data never leaves your computer
Links:
- GitHub: https://github.com/MODSetter/SurfSense
- Website: https://www.surfsense.com/
More Tools Worth Knowing
F5-TTS — Fast Voice Cloning
- Clone any voice from 6 seconds of audio
- Works in 17 languages
- Apple Silicon version runs in ~4 seconds on M3 Max
- https://github.com/SWivid/F5-TTS
- MLX (Mac) version: https://github.com/lucasnewman/f5-tts-mlx
TTS-WebUI — Test 30+ Voice Models in One Interface
- Gradio interface with every major TTS model
- Bark, XTTS, Kokoro, F5-TTS, Chatterbox, Dia, Piper, and more
- Perfect for comparing which voice sounds best
- https://github.com/rsxdalv/tts-generation-webui
podcast_tts — Simple Multi-Speaker Generator
- Uses ChatTTS (free)
- Left/right audio channel separation for speakers
- Built-in background music mixing
- https://github.com/puntorigen/podcast_tts
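Curious what left/right channel separation means in practice? Here is a minimal standard-library sketch that puts one speaker in the left channel and the other in the right. It is not how podcast_tts does it internally; the sine tones are stand-ins for real speech, and the function names and output filename are made up for illustration.

```python
import math
import struct
import wave

RATE = 16000  # samples per second


def tone(freq, seconds):
    """Stand-in for a speaker's mono audio: 16-bit sine-wave samples."""
    n = int(RATE * seconds)
    return [int(12000 * math.sin(2 * math.pi * freq * i / RATE)) for i in range(n)]


def place_speakers(left_track, right_track):
    """Play the tracks back to back: first one left-only, second one right-only."""
    return [(s, 0) for s in left_track] + [(0, s) for s in right_track]


def write_stereo_wav(path, frames):
    """Write (left, right) sample pairs as a 16-bit stereo WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(RATE)
        wav.writeframes(b"".join(struct.pack("<hh", l, r) for l, r in frames))


if __name__ == "__main__":
    frames = place_speakers(tone(220, 0.5), tone(330, 0.5))
    write_stereo_wav("two_speakers.wav", frames)
```

Hard left/right panning like this is the simplest version. A real mix would usually keep a little of each voice in both channels so it is comfortable on headphones.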
TTS-Audio-Suite — ComfyUI Node for Visual Workflows
- Drag-and-drop podcast creation
- Multi-character dialogue with name tags
- Can generate 90+ minutes of audio
- https://github.com/diodiogod/TTS-Audio-Suite
Mozilla document-to-podcast - Official Blueprint for Learning
- Mozilla’s official blueprint
- Fully local, no API keys
- Good documentation for learning how it works
- https://github.com/mozilla-ai/document-to-podcast
Quick Comparison
| Tool | Completely Free? | Runs Locally? | Voice Quality | Easy Setup? | License |
|---|---|---|---|---|---|
| Podcastfy + Edge | | | | | Apache 2.0 |
| Dia TTS | | | | | Apache 2.0 |
| Chatterbox | | | | | MIT |
| SurfSense | | | | | Open Source |
| F5-TTS | | | | | MIT |
Start Here (Pick Your Path)
“I want this working in 5 minutes”
→ Use Podcastfy with Edge TTS
“I want the most realistic voices possible”
→ Use Dia TTS (need a decent GPU)
“I want to try without installing anything”
→ Dia HuggingFace Space or Chatterbox HuggingFace Space
“I want a full research + podcast workflow”
→ Use SurfSense
“I want to test multiple voice options”
→ Use TTS-WebUI
NotebookLM is free right now, but every document you upload still lands on Google’s servers. These tools give you the same magic while keeping your stuff yours.
Your move.