Local Voice Solution

I am stuck. I want to build a local voice generation solution for Indic languages for a project. I tried Parler-TTS; it gives Hindi in a Bihari accent. I tried Chatterbox: the quality is good, but the watermark it generates kills all the pauses in the narration. IndicF5 sounds drunk and reversed. My workflow is not record now and assemble later; it is record, put it on the timeline immediately, marry it with graphics, and publish the video. Can someone suggest a better solution? My voice generation volume is very high, so I can’t survive on token purchases and want a solution that runs locally.


You can try this AI Tool

Yes, this is an option, and there are many like it. The issue is when they shut shop or the lifetime plan turns into a limit of 250k tokens per month. I have come across 2-3 other solutions like this. Someone suggested Coqui XTTS; I will look at that. Thanks for the link.

For local Hindi TTS, your best bets in order of readiness:

Ready to use now: Silero Indic TTS — 9 Indic languages, 17 speakers, runs locally via PyTorch with romanization handled through aksharamukha. Also check Indic-Parler-TTS by ai4bharat — 16 Indian languages, MOS-evaluated by native speakers, solid community backing.
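To show what the Silero route looks like in practice, here is a minimal sketch that follows the pattern in the Silero README: transliterate Devanagari to ISO with aksharamukha, then call the model. The model id `v4_indic` and the speaker `hindi_male` are assumptions taken from that README and may differ for the release you install. `split_sentences` and `synthesize_hindi` are hypothetical helper names, and splitting per sentence is my own choice so that each sentence becomes its own clip.

```python
import re


def split_sentences(text):
    """Split after a danda (।) or ., !, ? so each sentence is synthesized separately."""
    parts = re.split(r"(?<=[।.!?])\s+", text.strip())
    return [p for p in parts if p]


def synthesize_hindi(text, speaker="hindi_male"):
    """Return one audio tensor per sentence. Downloads the model on first call."""
    # Heavy imports live here so split_sentences works without them installed.
    import torch
    from aksharamukha import transliterate

    # Assumption: model id "v4_indic" and speaker "hindi_male", as in the Silero README.
    model, _ = torch.hub.load(
        repo_or_dir="snakers4/silero-models",
        model="silero_tts",
        language="indic",
        speaker="v4_indic",
    )
    clips = []
    for sentence in split_sentences(text):
        # The Indic model expects romanized (ISO) input, not Devanagari.
        roman = transliterate.process("Devanagari", "ISO", sentence)
        clips.append(model.apply_tts(roman, speaker=speaker))
    return clips
```

Calling `synthesize_hindi("नमस्ते। आप कैसे हैं?")` should give you two clips that you can place on the timeline one by one. Test it with your own narration text before committing to it.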

Worth evaluating: XTTS-v2 Hindi fine-tune has improved prosody, but be aware of this open bug where Hindi support breaks on local installs despite working on the demo. Coqui itself has shut down, so upstream fixes are unlikely. Qwen3-TTS is a newer option from Alibaba — fully local, 10 languages, 12Hz audio tokenizer.

For the video pipeline: The Sportmaster Lab dubbing pipeline writeup is the closest reference to what you’re building — covers the full assembly workflow with neural TTS. Yandex’s neural dubbing architecture is worth reading for voice/intonation preservation design.
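Since lost pauses were one of your complaints, one way to stay independent of what the model does is to synthesize sentence by sentence and put the pauses on the timeline yourself. The sketch below is not taken from either writeup; `build_timeline` is a hypothetical helper, and the 48 kHz sample rate and 0.4 s pause are assumed defaults you would tune. It only computes where each clip starts and ends so your editor or script can place the clips and graphics.

```python
def build_timeline(clip_lengths, sample_rate=48000, pause_s=0.4):
    """Return (start_s, end_s) for each clip, with a fixed pause between clips.

    clip_lengths: number of audio samples in each synthesized clip, in order.
    """
    cues = []
    cursor = 0.0  # current position on the timeline, in seconds
    for n_samples in clip_lengths:
        duration = n_samples / sample_rate
        cues.append((round(cursor, 3), round(cursor + duration, 3)))
        # Advance past the clip plus the explicit pause that follows it.
        cursor += duration + pause_s
    return cues
```

For example, `build_timeline([48000, 24000], sample_rate=48000, pause_s=0.5)` places a 1 s clip at 0.0 s and a 0.5 s clip at 1.5 s, so the half-second pause survives whatever the TTS engine does to silence inside a clip.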

For benchmarking your choices: This open TTS model survey and NtechLab’s architecture comparison will help you pick between them objectively. Community threads on XTTS hallucination issues (nonsense words, extra syllables) are essential reading before committing.

Pro tip: If you go the XTTS route, train your own voice with this VITS Hindi training repo instead of relying on the broken upstream Hindi config — it uses the Coqui framework but sidesteps the language code bug entirely.
