A generative speech model for daily dialogue with a standardized OpenAI API speech interface.
ChatTTS is a text-to-speech model designed specifically for dialogue scenarios such as LLM assistants. The OpenAI API interface is mainly intended for AI application platforms such as Dify and Flowise to add speech capabilities, as well as for modular development.
- English
- Chinese
- The main model is trained on 100,000+ hours of Chinese and English audio data.
- The open-source version on HuggingFace is a model pre-trained on 40,000 hours, without SFT.
```bash
git clone https://github.com/RavenMuse/ChatTTS-OpenApi.git
cd ChatTTS-OpenApi
```

Install dependencies with pip:

```bash
pip install --upgrade -r requirements.txt
```

Or with uv:

```bash
uv sync --upgrade
source .venv/bin/activate
```
- Run with CPU:

```bash
docker-compose up -d
```

- Run with GPU:

```bash
docker-compose -f docker-compose.gpu.yaml up -d
```
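For the GPU variant, a compose file typically reserves the NVIDIA device for the service. The snippet below is an illustrative sketch of that reservation using the standard Compose `deploy.resources` syntax, not the actual contents of `docker-compose.gpu.yaml` (service and image names are assumptions):

```yaml
# Hypothetical sketch of a GPU-enabled service definition;
# see docker-compose.gpu.yaml in the repo for the real configuration.
services:
  chattts:
    image: chattts-openapi   # assumed image name
    ports:
      - "7006:7006"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```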
Make sure you are in the project root directory when executing the commands below.
```bash
python examples/web/webui.py
```
It will save audio to `./output_audio_n.mp3`.
```bash
python examples/cmd/run.py "Your text 1." "Your text 2."
```
Start the API server:

```bash
python api.py --port 7006
```
```bash
curl http://localhost:7006/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chat_tts",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "shimmer"
  }' \
  --output speech.mp3
```