A FastAPI-based Text-to-Speech API service using the Kokoro-82M model. Supports multiple voices in American and British English accents, with OpenAI API compatibility.
- Multiple voices (American and British English)
- Easy-to-use REST API
- OpenAI API compatibility
- Web interface
- Automatic model file management
- Proper WAV file generation
- Detailed API documentation
- Using docker-compose:

  ```bash
  # Build and start the service
  ./docker-build.sh

  # Or manually:
  docker-compose up --build
  ```

- Using Docker directly:

  ```bash
  # Build the image
  docker build -t kokoro-tts .

  # Run the container
  docker run -p 8000:8000 \
    -v $(pwd)/models.py:/app/models.py \
    -v $(pwd)/kokoro.py:/app/kokoro.py \
    -v $(pwd)/istftnet.py:/app/istftnet.py \
    -v $(pwd)/plbert.py:/app/plbert.py \
    -v $(pwd)/config.json:/app/config.json \
    -v $(pwd)/kokoro-v0_19.pth:/app/kokoro-v0_19.pth \
    -v $(pwd)/voices:/app/voices \
    kokoro-tts
  ```

- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```

- Run the server:

  ```bash
  # API only
  python main.py

  # With web interface
  python main-ui.py
  ```

The server will automatically download required model files on first run.
Access the web interface at http://localhost:8000 to:
- Convert text to speech
- Choose from available voices
- Download generated audio
- View API examples
Returns API usage information and status.
Returns a list of available voices with descriptions.
Converts text to speech.
Parameters:
- text: Text to convert to speech
- voice (optional): Voice ID to use (default: "af")
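As a sketch, the query string for these parameters can be built with Python's standard library (the helper name `build_tts_url` is illustrative, not part of the API; the request itself is commented out since it needs a running server):

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumed default host/port

def build_tts_url(text, voice=None):
    """Build a /tts request URL, percent-encoding the text."""
    params = {"text": text}
    if voice:
        params["voice"] = voice
    return f"{BASE_URL}/tts?{urlencode(params)}"

# Example request (requires the server to be running; note the endpoint uses POST):
# from urllib.request import Request, urlopen
# req = Request(build_tts_url("Hello world.", "am_adam"), method="POST")
# with urlopen(req) as resp, open("output.wav", "wb") as f:
#     f.write(resp.read())
```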
Example:
```bash
# Generate speech with default voice
curl -X POST "http://localhost:8000/tts?text=Hello%20world." --output output.wav

# Generate speech with specific voice
curl -X POST "http://localhost:8000/tts?text=Hello%20world.&voice=am_adam" --output output.wav
```

Lists available TTS models.
OpenAI-compatible endpoint for speech generation.
Example using OpenAI Python client:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"  # API key not required
)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",  # See voice mapping below
    input="Hello world!"
)
response.stream_to_file("output.wav")
```

- af: Default voice (50-50 mix of Bella & Sarah)
- af_bella: American Female - Bella
- af_sarah: American Female - Sarah
- am_adam: American Male - Adam
- am_michael: American Male - Michael
- bf_emma: British Female - Emma
- bf_isabella: British Female - Isabella
- bm_george: British Male - George
- bm_lewis: British Male - Lewis
- af_nicole: American Female - Nicole (ASMR voice)
- alloy → am_adam (Neutral male)
- echo → af_nicole (Soft female)
- fable → bf_emma (British female)
- onyx → bm_george (Deep male)
- nova → af_bella (Energetic female)
- shimmer → af_sarah (Clear female)
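The mapping above can be expressed as a plain lookup table; a minimal sketch (the fallback to the default "af" voice for unknown names is an assumption, not stated in the source):

```python
# OpenAI voice name -> Kokoro voice ID, as listed above
OPENAI_VOICE_MAP = {
    "alloy": "am_adam",     # Neutral male
    "echo": "af_nicole",    # Soft female
    "fable": "bf_emma",     # British female
    "onyx": "bm_george",    # Deep male
    "nova": "af_bella",     # Energetic female
    "shimmer": "af_sarah",  # Clear female
}

def resolve_voice(openai_voice):
    """Map an OpenAI voice name to a Kokoro voice ID.

    Unknown names fall back to the default "af" voice
    (the fallback behavior is an assumption).
    """
    return OPENAI_VOICE_MAP.get(openai_voice, "af")
```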
See requirements.txt for full list of dependencies.
This project uses GitHub Actions for continuous integration and deployment:
- Branch Configuration:
  - Default branch: main
  - Protected branch rules recommended
  - Workflow runs on main branch and tags
- Repository Settings:
  - Actions permissions: Read and write
  - Packages enabled for Docker images
  - Workflow permissions enabled
The workflow automatically builds and publishes Docker images:
- Triggers on:
  - Pushes to main branch
  - Version tags (v*)
  - Pull requests
- Publishes to GitHub Container Registry (ghcr.io)
- Uses build caching for faster builds
- Includes version tagging and metadata
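A workflow with these triggers and behaviors might look roughly like the sketch below (the file path, action versions, and step details are assumptions, not taken from this repository):

```yaml
# .github/workflows/docker.yml (illustrative sketch)
name: Build and publish Docker image

on:
  push:
    branches: [main]
    tags: ["v*"]
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/metadata-action@v5
        id: meta
        with:
          images: ghcr.io/bmv234/tts-v2-api-server
      - uses: docker/build-push-action@v6
        with:
          push: ${{ github.event_name != 'pull_request' }}  # PRs only build
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha   # build caching for faster builds
          cache-to: type=gha,mode=max
```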
```bash
# Pull the latest image
docker pull ghcr.io/bmv234/tts-v2-api-server:latest

# Run with GPU support
docker run --gpus all -p 8000:8000 \
  -v ./models.py:/app/models.py \
  -v ./kokoro.py:/app/kokoro.py \
  -v ./istftnet.py:/app/istftnet.py \
  -v ./plbert.py:/app/plbert.py \
  -v ./config.json:/app/config.json \
  -v ./kokoro-v0_19.pth:/app/kokoro-v0_19.pth \
  -v ./voices:/app/voices \
  ghcr.io/bmv234/tts-v2-api-server:latest
```

The workflow automatically handles version tagging:
- Latest: Always points to latest main branch build
- Version tags: Created when pushing tags (e.g., v1.0.0)
- SHA tags: Include git commit hash for traceability
The Docker setup includes:
- Multi-stage build for smaller image size
- Non-root user for security
- Health checks for container monitoring
- Volume mounts for model persistence
- Resource limits (configurable in docker-compose.yml)
- CUDA_VISIBLE_DEVICES: Set to empty for CPU mode, remove for GPU support
- Memory limits: 4GB max, 2GB min (adjustable in docker-compose.yml)
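These settings correspond to compose configuration roughly like the sketch below (the service name and volume list are illustrative; adjust memory values as noted above):

```yaml
# docker-compose.yml (illustrative sketch)
services:
  kokoro-tts:
    image: ghcr.io/bmv234/tts-v2-api-server:latest
    ports:
      - "8000:8000"
    environment:
      # Empty value forces CPU mode; remove this line for GPU support
      - CUDA_VISIBLE_DEVICES=
    volumes:
      - ./voices:/app/voices  # model/voice persistence
    deploy:
      resources:
        limits:
          memory: 4g        # 4GB max
        reservations:
          memory: 2g        # 2GB min
```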