High-performance Text-to-Speech server with an OpenAI-compatible API, 8 voices, emotion tags, and a modern web UI. Optimized for RTX GPUs, with out-of-the-box support for the LM Studio API.
Listen to sample outputs with different voices and emotions:
- Default Test Sample - Standard neutral tone
- Leah Happy Sample - Cheerful, upbeat demo
- Tara Sad Sample - Emotional, melancholic demo
- Zac Contemplative Sample - Thoughtful, measured tone
- LM Studio Orpheus API Compatible: Out-of-the-box support for the LM Studio server API running the Orpheus-3b-0.1 model
- Default Max Token Length: 8192
- OpenAI API Compatible: Drop-in replacement for OpenAI's `/v1/audio/speech` endpoint
- Modern Web Interface: Clean, responsive UI with waveform visualization
- High Performance: Optimized for RTX GPUs with parallel processing
- Multiple Voices: 8 different voice options with different characteristics
- Emotion Tags: Support for laughter, sighs, and other emotional expressions
- Long-form Audio: Efficient generation of extended audio content in a single request
```
Orpheus-FastAPI/
├── app.py              # FastAPI server and endpoints
├── requirements.txt    # Dependencies
├── static/             # Static assets (favicon, etc.)
├── outputs/            # Generated audio files
├── templates/          # HTML templates
│   └── tts.html        # Web UI template
└── tts_engine/         # Core TTS functionality
    ├── __init__.py     # Package exports
    ├── inference.py    # Token generation and API handling
    └── speechpipe.py   # Audio conversion pipeline
```
- Python 3.8+
- CUDA-compatible GPU (recommended: RTX series for best performance)
- Separate LLM inference server running the Orpheus model (e.g., LM Studio or llama.cpp server)
- Clone the repository:

```bash
git clone https://github.com/TheLocalLab/Orpheus-FastAPI-LMStudio.git
cd Orpheus-FastAPI-LMStudio
```

- Create a Python virtual environment:

```bash
# Using venv (Python's built-in virtual environment)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Or using conda
conda create -n orpheus-tts python=3.10
conda activate orpheus-tts
```

- Install PyTorch with CUDA support:

```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
```

- Install other dependencies:

```bash
pip3 install -r requirements.txt
```

- Set up the required directories:

```bash
# Create directories for outputs and static files
mkdir -p outputs static
```

Run the FastAPI server:

```bash
python app.py
```

Or with a specific host/port:

```bash
uvicorn app:app --host 0.0.0.0 --port 5005 --reload
```

Access:
- Web interface: http://localhost:5005/ (or http://127.0.0.1:5005/)
- API documentation: http://localhost:5005/docs (or http://127.0.0.1:5005/docs)
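Once the server is up, a quick reachability check from Python can confirm the endpoints above are live. This is a minimal sketch assuming the default port 5005 used throughout this README:

```python
import requests

# Hit the interactive API docs to confirm the server is running.
resp = requests.get("http://localhost:5005/docs", timeout=10)
print("Server reachable:", resp.status_code == 200)
```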
- Download and install LM Studio.
- Download the Orpheus-3b-0.1-ft-Q4_K_M-GGUF model from the Discover tab.
- Select and load the model in the Developer tab, which should also start the API server.
The server provides an OpenAI-compatible API endpoint at /v1/audio/speech:
```bash
curl http://localhost:5005/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "orpheus",
    "input": "Hello world! This is a test of the Orpheus TTS system.",
    "voice": "tara",
    "response_format": "wav",
    "speed": 1.0
  }' \
  --output speech.wav
```

Parameters:

- `input` (required): The text to convert to speech
- `model` (optional): The model to use (default: "orpheus")
- `voice` (optional): Which voice to use (default: "tara")
- `response_format` (optional): Output format (currently only "wav" is supported)
- `speed` (optional): Speed factor (0.5 to 1.5, default: 1.0)
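The same request can be made from Python. The snippet below is a minimal sketch using the `requests` library against the endpoint documented above; adjust the host, voice, and output path for your setup:

```python
import requests

# Minimal client for the OpenAI-compatible /v1/audio/speech endpoint.
# Assumes the server from this README is running on localhost:5005.
payload = {
    "model": "orpheus",
    "input": "Hello world! This is a test of the Orpheus TTS system.",
    "voice": "tara",
    "response_format": "wav",
    "speed": 1.0,
}

resp = requests.post("http://localhost:5005/v1/audio/speech", json=payload, timeout=120)
resp.raise_for_status()

# The response body is the raw WAV audio.
with open("speech.wav", "wb") as f:
    f.write(resp.content)
```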
Additionally, a simpler /speak endpoint is available:
```bash
curl -X POST http://localhost:5005/speak \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world! This is a test.",
    "voice": "tara"
  }' \
  -o output.wav
```

Available voices:

- `tara`: Female, conversational, clear
- `leah`: Female, warm, gentle
- `jess`: Female, energetic, youthful
- `leo`: Male, authoritative, deep
- `dan`: Male, friendly, casual
- `mia`: Female, professional, articulate
- `zac`: Male, enthusiastic, dynamic
- `zoe`: Female, calm, soothing
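To audition every voice in one go, a short Python loop over the `/speak` endpoint works. This is a sketch under the same assumptions as above (server on localhost:5005, WAV responses); the output filenames are arbitrary:

```python
import requests

VOICES = ["tara", "leah", "jess", "leo", "dan", "mia", "zac", "zoe"]

# Generate one sample per voice so they can be compared side by side.
for voice in VOICES:
    resp = requests.post(
        "http://localhost:5005/speak",
        json={"text": "Hello world! This is a test.", "voice": voice},
        timeout=120,
    )
    resp.raise_for_status()
    with open(f"sample_{voice}.wav", "wb") as f:
        f.write(resp.content)
```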
You can insert emotion tags into your text to add expressiveness:
- `<laugh>`: Add laughter
- `<sigh>`: Add a sigh
- `<chuckle>`: Add a chuckle
- `<cough>`: Add a cough sound
- `<sniffle>`: Add a sniffle sound
- `<groan>`: Add a groan
- `<yawn>`: Add a yawning sound
- `<gasp>`: Add a gasping sound
Example: "Well, that's interesting <laugh> I hadn't thought of that before."
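Tags are embedded directly in the request text, so no extra API parameters are needed. A brief sketch using the `/speak` endpoint from earlier (same localhost assumption; the text and voice are just examples):

```python
import requests

# Emotion tags go inline inside the text field.
text = "Well, that's interesting <laugh> I hadn't thought of that before. <sigh> Anyway, let's move on."

resp = requests.post(
    "http://localhost:5005/speak",
    json={"text": text, "voice": "zac"},
    timeout=120,
)
resp.raise_for_status()
with open("emotion_demo.wav", "wb") as f:
    f.write(resp.content)
```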
This server works as a frontend that connects to an external LLM inference server. It sends text prompts to the inference server, which generates tokens that are then converted to audio using the SNAC model. The system has been optimised for RTX 4090 GPUs with:
- Vectorised tensor operations
- Parallel processing with CUDA streams
- Efficient memory management
- Token and audio caching
- Optimised batch sizes
For best performance, adjust the API_URL in tts_engine/inference.py to point to your LLM inference server endpoint.
You can easily integrate this TTS solution with OpenWebUI to add high-quality voice capabilities to your chatbot:
- Start your Orpheus-FastAPI server
- In OpenWebUI, go to Admin Panel > Settings > Audio
- Change TTS from Web API to OpenAI
- Set API Base URL to your server address (e.g., `http://localhost:5005`)
- API Key can be set to "not-needed"
- Set TTS Voice to one of the available voices: `tara`, `leah`, `jess`, `leo`, `dan`, `mia`, `zac`, or `zoe`
- Set TTS Model to `tts-1`
This application requires a separate LLM inference server running the Orpheus model. You can use:
- GPUStack - GPU optimised LLM inference server (My pick) - supports LAN/WAN tensor split parallelisation
- LM Studio - Load the GGUF model and start the local server
- llama.cpp server - Run with the appropriate model parameters
- Any compatible OpenAI API-compatible server
Download the quantised model from lex-au/Orpheus-3b-FT-Q8_0.gguf and load it in your inference server.
The inference server should be configured to expose an API endpoint that this FastAPI application will connect to.
You can configure the system by setting environment variables:
- `ORPHEUS_API_URL`: URL of the LLM inference API (used in `tts_engine/inference.py`)
- `ORPHEUS_API_TIMEOUT`: Timeout in seconds for API requests (default: 120)
Make sure the ORPHEUS_API_URL points to your running inference server.
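As a rough illustration of how this configuration is typically consumed (the exact defaults and variable handling inside `tts_engine/inference.py` may differ; the fallback URL below is only a placeholder), the variables can be read with fallbacks like this:

```python
import os

# Illustrative only: reading the environment-based configuration.
# The real defaults live in tts_engine/inference.py and may differ.
API_URL = os.environ.get("ORPHEUS_API_URL", "http://localhost:1234/v1/completions")  # placeholder default
API_TIMEOUT = int(os.environ.get("ORPHEUS_API_TIMEOUT", "120"))

print(f"Inference API: {API_URL} (timeout: {API_TIMEOUT}s)")
```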
- app.py: FastAPI server that handles HTTP requests and serves the web UI
- tts_engine/inference.py: Handles token generation and API communication
- tts_engine/speechpipe.py: Converts token sequences to audio using the SNAC model
To add new voices, update the AVAILABLE_VOICES list in tts_engine/inference.py and add corresponding descriptions in the HTML template.
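As a rough sketch (assuming `AVAILABLE_VOICES` is a plain Python list, as the description above implies; the added voice name is hypothetical), the edit might look like this:

```python
# tts_engine/inference.py (illustrative sketch, not the actual file contents)
AVAILABLE_VOICES = [
    "tara", "leah", "jess", "leo",
    "dan", "mia", "zac", "zoe",
    "nova",  # hypothetical new voice added here
]
```

The matching description would then be added to `templates/tts.html` so the new voice appears in the web UI.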
This project is licensed under the Apache License 2.0 - see the LICENSE.txt file for details.



