Local TTS options that actually sound good

Local text-to-speech has come a long way. A few years ago, your Home Assistant announcements sounded like a robot from the 90s. Today, you can get pleasant voice output without relying on cloud APIs. Here’s what’s actually worth running.

Why Go Local

Cloud TTS services like Google or Amazon work fine until your internet drops. Then your doorbell announcements go silent and your automation notifications become useless. Local TTS keeps your smart home functional when the outside world disappears.

The other reason is latency. Waiting 500ms for an API response feels sluggish when you’re announcing “front door opened.” Local generation skips the network round trip entirely, and on capable hardware a short phrase is ready almost immediately.

Quality used to be the problem. It isn’t anymore.

Piper: The Best Entry Point

Piper is a fast, local neural text-to-speech engine. It is sometimes mistaken for a Mozilla project, but it grew out of the Rhasspy open-source voice effort (created by Michael Hansen) and is now maintained under the Open Home Foundation. It sounds far better than older open-source options like eSpeak and handles real sentences naturally, while still being light enough to run on a Raspberry Pi 4.

On a Home Assistant Green (or a Home Assistant Yellow if you already own one), Piper runs comfortably. Using medium-quality voices, a Raspberry Pi 4 can generate speech somewhat faster than real time (about 1.6 seconds of audio per second of compute), and a more powerful mini PC makes it feel near-instant.
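To make the throughput figure concrete, here is a small sketch that turns "about 1.6 seconds of audio per second of compute" into an estimated wait for a clip of a given length. The function name and the clip lengths are illustrative assumptions, and the 1.6 default is only the rough Raspberry Pi 4 figure quoted above, not a measured benchmark.

```python
def synthesis_seconds(audio_seconds: float, audio_per_compute_second: float = 1.6) -> float:
    """Estimate the compute time needed to synthesize a clip.

    audio_per_compute_second is the throughput: seconds of audio produced
    per second of compute (1.6 is the rough Pi 4 figure from the text).
    """
    if audio_seconds < 0 or audio_per_compute_second <= 0:
        raise ValueError("clip length must be >= 0 and throughput must be > 0")
    return audio_seconds / audio_per_compute_second


if __name__ == "__main__":
    # Hypothetical announcement lengths, in seconds of spoken audio.
    for clip in (1.0, 2.0, 5.0):
        print(f"{clip:.1f} s of speech needs about {synthesis_seconds(clip):.2f} s of compute")
```

By this estimate, a two-second announcement costs a bit over a second of synthesis on a Pi 4, which is why faster hardware is what makes short announcements feel instant.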

The tradeoff: Piper has a fixed catalog of pre-trained voices across many languages. You pick from those models rather than cloning any voice you want, and quality varies by voice. That is fine for announcements, since you’re not listening to audiobooks.

Setup in Home Assistant is straightforward via the Piper add-on (it talks to Home Assistant over the Wyoming protocol). Select it as the text-to-speech engine in your Assist pipeline, point your automations or media players at it, and you’re done.
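Once Piper is selected, an announcement can also be triggered from outside Home Assistant through its REST API. The sketch below only builds the request for the tts.speak action and does not send it. The entity IDs (tts.piper, media_player.kitchen), the base URL, and the token are placeholders for illustration, so check the names in your own installation.

```python
import json
import urllib.request


def build_tts_speak_request(base_url, token, tts_entity, media_player, message):
    """Build (but do not send) a POST request for Home Assistant's tts.speak action.

    All arguments are caller-supplied; the entity IDs used in the example
    below are hypothetical.
    """
    payload = {
        "entity_id": tts_entity,                 # the TTS engine entity
        "media_player_entity_id": media_player,  # where the audio should play
        "message": message,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/tts/speak",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_tts_speak_request(
        "http://192.168.1.10:8123",   # assumed Home Assistant address
        "YOUR_LONG_LIVED_TOKEN",      # placeholder, not a real token
        "tts.piper",                  # assumed entity ID
        "media_player.kitchen",       # assumed entity ID
        "Front door opened",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```

Inside Home Assistant itself, the same three fields go into an automation action, so no external script is needed for ordinary announcements.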

Coqui / XTTS: Higher Quality, Higher Cost

For more expressive, natural-sounding output, the Coqui TTS family (notably the XTTS v2 model) goes beyond Piper, with better prosody and support for voice cloning. If you care a lot about how announcements sound, it can be worth the extra resources.

Two honest caveats. First, the company behind Coqui shut down in early 2024; the project lives on as community-maintained open source (the actively maintained fork is hosted by the Idiap Research Institute), and several community Wyoming/HACS integrations exist to bridge XTTS into Home Assistant. It is usable, but it is community-supported, not a polished official add-on. Second, it is far heavier than Piper: a Raspberry Pi will struggle, and you’ll really want a capable mini PC, ideally with a GPU, for snappy responses.

For most people, Piper hits the sweet spot. Coqui/XTTS matters only if you are particular about voice quality and willing to manage a heavier, community-driven setup.

Using Smart Speakers as Output

Local TTS generates audio files. You need somewhere to play them, and this is where a lot of “local TTS” setups quietly stop being local. The question to ask about any speaker is whether it plays your Piper audio over the LAN, or whether it only relays text to a vendor’s cloud to be spoken back in the vendor’s voice.

Google Nest Mini (2nd Gen) speakers are the closest thing to a drop-in. Home Assistant’s Google Cast integration is local polling, and it can play a Piper-generated file served straight from your Home Assistant instance. Two caveats: Home Assistant has to be reachable at a configured external_url, and Chromecast devices ignore your local DNS, so the media URL needs to be an IP address or publicly resolvable. One buying note: Google ended production of the Nest Mini and Nest Audio in June 2026, and says existing units stay fully supported. So this path is fine if you already own one or find remaining stock, and Google lists the replacement Google Home Speaker as a Cast target too.
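Because Chromecast devices ignore your local DNS, one quick sanity check is whether the media URL Home Assistant hands out uses an IP address rather than a hostname. The sketch below does only that check. The function name and example URLs are illustrative assumptions, and a hostname that fails the check may still work if it is publicly resolvable.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip_literal(url: str) -> bool:
    """Return True if the URL's host is an IPv4 or IPv6 address literal."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so the Cast device would need to resolve a name.
        return False
    return True


if __name__ == "__main__":
    # Hypothetical media URLs for illustration.
    for url in (
        "http://192.168.1.10:8123/media/announcement.mp3",
        "http://homeassistant.local:8123/media/announcement.mp3",
    ):
        verdict = "IP literal" if host_is_ip_literal(url) else "hostname, check that it resolves publicly"
        print(f"{url} -> {verdict}")
```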

Amazon Echo (4th Gen) and Amazon Echo Show 8 (3rd Gen) are the trap. Home Assistant’s core Alexa integration exposes your entities to Alexa; it does not push audio to Echo speakers. The integration that does is Alexa Devices (added in Home Assistant 2025.6), and it is cloud polling: it wants your Amazon account credentials, and its Speak and Announce entities send text to Amazon, which speaks it in Alexa’s voice. Your Piper voice never plays, and none of it works when the internet is down. The community Alexa Media Player custom component takes the same route, through Amazon’s unofficial cloud API. Echo speakers sound fine, but they are not an output for local TTS.

Apple HomePod (2nd Gen) and Apple HomePod mini sit in between. A HomePod turns up in Home Assistant through the Apple TV integration as an AirPlay media player, and some people do get TTS out of it, but the path is unofficial and it has broken across HomePod OS releases. Do not build your announcements on it.

So the smart speaker you already own may or may not be usable here. If announcements have to survive an outage, the dull options are the dependable ones: a cheap speaker wired to the Home Assistant box, an ESPHome-based media player, or a Home Assistant Voice satellite.

The Edge TTS Compromise

If you don’t want to run a neural model yourself but want something better than the old robotic voices, Microsoft Edge TTS is a popular middle ground. Be clear about what it is, though: it is not local. It uses the same online neural voices as Microsoft’s Edge browser read-aloud feature, so synthesis happens on Microsoft’s servers. There is no local generation here.

The appeal is that the voices sound good and there’s no API key or paid tier to set up (community Home Assistant integrations wrap the unofficial endpoint). The downside is the obvious one: if your internet drops, it stops working, which defeats the main reason to go local in the first place. Treat it as a convenience option, not a local-first one. If offline reliability matters, stick with Piper.

Quick Verdict

Piper is the answer for most people. It’s good enough, fast enough, and runs on hardware you already have. Choose Coqui/XTTS if you want better quality and have the hardware to spare, ideally a mini PC with a GPU. Edge TTS sounds good but is not local, so it fails exactly when local TTS matters most.
