Local TTS Options That Actually Sound Good
Practical guide to local text-to-speech for Home Assistant that actually sounds human, with hardware recommendations and setup tips.
Last updated: 2026-05-17
Local text-to-speech has come a long way. A few years ago, your Home Assistant announcements sounded like a robot from the 90s. Today, you can get pleasant voice output without relying on cloud APIs. Here’s what’s actually worth running.
Why Go Local
Cloud TTS services like Google or Amazon work fine until your internet drops. Then your doorbell announcements go silent and your automation notifications become useless. Local TTS keeps your smart home functional when the outside world disappears.
The other reason is latency. Waiting 500ms for an API response feels sluggish when you’re announcing “front door opened.” Local generation skips the network round trip, and on the hardware covered below a sentence is typically ready in 200ms or less.
Quality used to be the problem. It isn’t anymore.
Piper: The Best Entry Point
Piper is a neural TTS system that came out of the Rhasspy open-source voice assistant project, and it’s designed to run on modest hardware. It sounds far better than earlier open-source options and handles real sentences naturally.
On a Home Assistant Green or Home Assistant Yellow, Piper runs fine. You’ll get maybe 100-200ms generation time per sentence. On something more powerful like a Beelink EQ13 or Intel NUC 12 Pro, it’s nearly instantaneous.
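If you want to check those figures on your own hardware, a rough timing harness is enough. The sketch below is a minimal example and not part of Piper or Home Assistant: `median_latency_ms` and the `synthesize` callable are hypothetical names, and you would supply your own function that turns a sentence into audio using whatever Piper setup you have.

```python
import statistics
import time
from typing import Callable, Iterable


def median_latency_ms(synthesize: Callable[[str], object],
                      sentences: Iterable[str],
                      runs: int = 3) -> float:
    """Return the median wall-clock time in milliseconds per synthesized sentence.

    `synthesize` is a placeholder for your own function (an assumption, not a
    Piper API): it takes a sentence and blocks until the audio is ready.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for sentence in sentences:
        for _ in range(runs):
            start = time.perf_counter()
            synthesize(sentence)
            samples.append((time.perf_counter() - start) * 1000.0)
    if not samples:
        raise ValueError("no sentences to measure")
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in synthesizer that only sleeps; replace it with a real call.
    def fake_synthesize(text: str) -> bytes:
        time.sleep(0.01)
        return b""

    print(f"{median_latency_ms(fake_synthesize, ['Front door opened.']):.0f} ms")
```

The median is used instead of the mean so that a single slow first run, for example while the voice model loads, does not skew the result.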
The tradeoff: Piper has a limited voice catalog. You pick from pre-trained models rather than choosing any voice you want. Most voices are decent, but none are exceptional. This is fine for announcements; you’re not listening to audiobooks.
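Setup in Home Assistant is straightforward: install the Piper add-on, add it through the Wyoming integration, and select it as your text-to-speech engine. Point your announcements at your media players and you’re done.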
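Once Piper is available as a TTS entity, an announcement can also be triggered from outside Home Assistant through its REST API. The sketch below is a minimal example under stated assumptions: the base URL, the long-lived access token, and the entity IDs `tts.piper` and `media_player.living_room` are placeholders to replace with your own, and the helper names are made up for illustration. It calls the `tts.speak` action, which takes the TTS entity as its target plus the media player and the message.

```python
import json
import urllib.request


def build_speak_request(base_url: str, token: str, tts_entity: str,
                        media_player: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST to Home Assistant's tts.speak action."""
    payload = {
        "entity_id": tts_entity,                 # assumed ID, e.g. "tts.piper"
        "media_player_entity_id": media_player,  # assumed ID of your speaker
        "message": message,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/tts/speak",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def announce(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    req = build_speak_request(
        "http://homeassistant.local:8123",  # placeholder URL
        "YOUR_LONG_LIVED_TOKEN",            # placeholder token
        "tts.piper",
        "media_player.living_room",
        "Front door opened.",
    )
    print(req.full_url)
    # announce(req)  # uncomment once the placeholders are replaced
```

Building the request separately from sending it makes it easy to inspect the URL and payload before anything reaches your speakers.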
Coqui TTS: Higher Quality, Higher Cost
Coqui TTS produces better output than Piper, with more natural prosody and fewer robotic artifacts. If you care about how announcements sound, Coqui is worth the extra resources.
The catch: Coqui needs significantly more CPU. A Pi 4 will struggle. A Minisforum UM790 Pro or GMKtec G3 handles it comfortably. Plan accordingly.
Coqui also supports more voice options and lets you fine-tune style. You can get output that approaches commercial quality if you spend time tuning it.
For most people, Piper hits the sweet spot. Coqui matters only if you’re particular about voice quality.
Using Smart Speakers as Output
Local TTS generates audio files. You need somewhere to play them.
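If you have Amazon Echo (4th Gen) or Amazon Echo Show 8 (3rd Gen) devices, Home Assistant can send announcements to them through an Alexa integration. Be aware that Echo devices generally speak the text in Alexa’s own voice rather than playing audio generated by Piper or Coqui, so check how your integration handles this before counting on a local voice. The audio quality from Echo speakers is solid, and you get volume control through the native interface.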
Apple HomePod (2nd Gen) and Apple HomePod mini work similarly through AirPlay, though the setup is more involved.
Google Nest Mini (2nd Gen) speakers work through the Google Cast integration, though this route is less reliable than the other two.
The smart speaker route gives you better audio than most cheap USB speakers and lets you keep using those devices for their normal functions.
The Edge TTS Compromise
If local processing feels like overkill but you want something better than basic TTS, Microsoft Edge TTS is an option. It calls the same online speech service the Edge browser uses for reading pages aloud, so synthesis happens in the cloud rather than on your hardware. It sounds better than older local options.
It’s not truly local, because it still depends on Microsoft’s servers and stops working when your connection does. Some Home Assistant users still run it as a middle ground between local engines and Google or Amazon TTS.
Quick Verdict
Piper is the answer for most people. It’s good enough, fast enough, and runs on hardware you already have. Coqui if you want better quality and have the extra CPU. Everything else is either worse or more complicated than needed.