Free Open-Source Artificial Intelligence @lemmy.world hok @lemmy.dbzer0.com 5d ago

Fish Speech 1.5, an open source voice cloning TTS that's actually good

Repository: https://github.com/fishaudio/fish-speech (SOTA Open Source TTS)

I've been waiting for an open-source TTS model that's actually good enough to capture some of the subtleties of language and synthesize them in a natural-sounding way. I think I've finally found one that fits the requirements.

Model: https://huggingface.co/fishaudio/fish-speech-1.5

It uses an encoder rather than relying on phonemes, so generations sometimes vary from run to run. Still, I've gotten very few errors, and the variations are all surprisingly natural in slightly different ways, which is very exciting.

Give it a spin if you're also looking for a TTS model that sounds good. It uses voice cloning, so find a good 10-20 second reference clip for the generations to use as the voice.
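Since the reference clip's length matters, here is a small Python sketch (standard library only) that checks whether a clip falls in the 10-20 second window before you use it. It assumes an uncompressed PCM WAV file; MP3 or other formats would need converting first. The function names are hypothetical helpers, not part of Fish Speech, and length is only one criterion: a clean recording of a single speaker matters too.

```python
import wave


def reference_clip_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference_length(path: str, min_s: float = 10.0, max_s: float = 20.0) -> bool:
    """Hypothetical helper: True if the clip length is within [min_s, max_s] seconds."""
    return min_s <= reference_clip_seconds(path) <= max_s
```

Usage: `is_good_reference_length("my_voice.wav")`, where `my_voice.wav` is a placeholder for your own recording.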

6 comments

How do you run this locally? What program do you use? I know you can take LLMs and load them into Ollama or GPT4All. What about this?
- I followed their instructions here: https://speech.fish.audio/
  
  I'm using the locally run API server to do inference: https://speech.fish.audio/inference/#http-api-inference
  
  I don't know about other ways. To be clear, this is not (necessarily) an LLM; it's just for speech synthesis, so you don't run it in Ollama. That said, I think it does technically use Llama under the hood: there are two models, one for encoding text and the other for decoding to audio. Honestly, the paper is terrible, but it explains the architecture somewhat: https://arxiv.org/pdf/2411.01156
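For anyone who wants to script against the local API server, here is a minimal Python sketch using only the standard library. The endpoint path (`/v1/tts`), the address (`127.0.0.1:8080`), and the JSON field names (`text`, `format`) are assumptions, not confirmed from the docs. Check the HTTP API page linked above for the exact request schema of your version, including how to pass the reference clip for voice cloning, which this sketch leaves out.

```python
import json
import urllib.request

# ASSUMPTIONS (verify against the Fish Speech HTTP API docs for your version):
# - the API server runs locally and listens on 127.0.0.1:8080
# - synthesis is a POST to /v1/tts that accepts a JSON body
# - "text" and "format" are valid body fields; reference-audio fields are omitted
DEFAULT_URL = "http://127.0.0.1:8080/v1/tts"


def build_tts_request(text: str, url: str = DEFAULT_URL,
                      audio_format: str = "wav") -> urllib.request.Request:
    """Build (but do not send) a POST request asking the server to synthesize `text`."""
    body = json.dumps({"text": text, "format": audio_format}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str, url: str = DEFAULT_URL) -> None:
    """Send the request and write the raw response bytes (the audio) to `out_path`."""
    request = build_tts_request(text, url)
    # Generation can be slow on modest hardware, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=300) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

With the server running, `synthesize("Hello from a local TTS server.", "hello.wav")` should write the returned audio to `hello.wav`, assuming the request schema above matches your server. Building the request separately from sending it makes it easy to inspect the payload before pointing it at a real server.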