
Orpheus-FASTAPI w/ LM Studio API


High-performance Text-to-Speech server with an OpenAI-compatible API, 8 voices, emotion tags, and a modern web UI. Optimized for RTX GPUs, with out-of-the-box support for the LM Studio API.


Voice Demos

Listen to sample outputs with different voices and emotions.

User Interface

[Screenshot: Web User Interface]

Features

  • LM Studio Orpheus API Compatible: Out-of-the-box support for the LM Studio Server API running the Orpheus-3b-0.1 model
  • Default Max Token Length: 8192
  • OpenAI API Compatible: Drop-in replacement for OpenAI's /v1/audio/speech endpoint
  • Modern Web Interface: Clean, responsive UI with waveform visualization
  • High Performance: Optimized for RTX GPUs with parallel processing
  • Multiple Voices: 8 different voice options with different characteristics
  • Emotion Tags: Support for laughter, sighs, and other emotional expressions
  • Long-form Audio: Efficient generation of extended audio content in a single request

Project Structure

Orpheus-FastAPI/
├── app.py                # FastAPI server and endpoints
├── requirements.txt      # Dependencies
├── static/               # Static assets (favicon, etc.)
├── outputs/              # Generated audio files
├── templates/            # HTML templates
│   └── tts.html          # Web UI template
└── tts_engine/           # Core TTS functionality
    ├── __init__.py       # Package exports
    ├── inference.py      # Token generation and API handling
    └── speechpipe.py     # Audio conversion pipeline

Setup

Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU (recommended: RTX series for best performance)
  • Separate LLM inference server running the Orpheus model (e.g., LM Studio or llama.cpp server)

Installation

  1. Clone the repository:
git clone https://github.com/TheLocalLab/Orpheus-FastAPI-LMStudio.git
cd Orpheus-FastAPI-LMStudio
  2. Create a Python virtual environment:
# Using venv (Python's built-in virtual environment)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Or using conda
conda create -n orpheus-tts python=3.10
conda activate orpheus-tts
  3. Install PyTorch with CUDA support:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
  4. Install the other dependencies:
pip3 install -r requirements.txt
  5. Set up the required directories:
# Create directories for outputs and static files
mkdir -p outputs static

Starting the Server

Run the FastAPI server:

python app.py

Or with specific host/port:

uvicorn app:app --host 0.0.0.0 --port 5005 --reload

Once the server is running, access the web UI at http://localhost:5005 and the interactive API documentation at http://localhost:5005/docs.

Start Up LM Studio API Server

  • Download and install LM Studio.
  • Download the Orpheus-3b-0.1-ft-Q4_K_M-GGUF model from the Discover tab.
  • Select and load the model in the Developer tab, which also starts the API server.

API Usage

OpenAI-Compatible Endpoint

The server provides an OpenAI-compatible API endpoint at /v1/audio/speech:

curl http://localhost:5005/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "orpheus",
    "input": "Hello world! This is a test of the Orpheus TTS system.",
    "voice": "tara",
    "response_format": "wav",
    "speed": 1.0
  }' \
  --output speech.wav

Parameters

  • input (required): The text to convert to speech
  • model (optional): The model to use (default: "orpheus")
  • voice (optional): Which voice to use (default: "tara")
  • response_format (optional): Output format (currently only "wav" is supported)
  • speed (optional): Speed factor (0.5 to 1.5, default: 1.0)
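
For reference, here is the same request from Python — a minimal sketch using the requests library, with the endpoint and parameters exactly as documented above:

import requests

# Generate speech via the OpenAI-compatible endpoint and save the WAV response.
payload = {
    "model": "orpheus",
    "input": "Hello world! This is a test of the Orpheus TTS system.",
    "voice": "tara",
    "response_format": "wav",
    "speed": 1.0,
}

resp = requests.post("http://localhost:5005/v1/audio/speech", json=payload, timeout=120)
resp.raise_for_status()

with open("speech.wav", "wb") as f:
    f.write(resp.content)  # the response body is the raw WAV audio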

Legacy API

Additionally, a simpler /speak endpoint is available:

curl -X POST http://localhost:5005/speak \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world! This is a test.",
    "voice": "tara"
  }' \
  -o output.wav

Available Voices

  • tara: Female, conversational, clear
  • leah: Female, warm, gentle
  • jess: Female, energetic, youthful
  • leo: Male, authoritative, deep
  • dan: Male, friendly, casual
  • mia: Female, professional, articulate
  • zac: Male, enthusiastic, dynamic
  • zoe: Female, calm, soothing
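
To audition the voices, a short loop like the following — a sketch against the OpenAI-compatible endpoint shown above — generates one sample file per voice:

import requests

VOICES = ["tara", "leah", "jess", "leo", "dan", "mia", "zac", "zoe"]

for voice in VOICES:
    # Request a short sample in each voice and write it to its own WAV file.
    resp = requests.post(
        "http://localhost:5005/v1/audio/speech",
        json={"input": f"Hi, I'm {voice}. This is a voice sample.", "voice": voice},
        timeout=120,
    )
    resp.raise_for_status()
    with open(f"sample_{voice}.wav", "wb") as f:
        f.write(resp.content)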

Emotion Tags

You can insert emotion tags into your text to add expressiveness:

  • <laugh>: Add laughter
  • <sigh>: Add a sigh
  • <chuckle>: Add a chuckle
  • <cough>: Add a cough sound
  • <sniffle>: Add a sniffle sound
  • <groan>: Add a groan
  • <yawn>: Add a yawning sound
  • <gasp>: Add a gasping sound

Example: "Well, that's interesting <laugh> I hadn't thought of that before."

Technical Details

This server works as a frontend that connects to an external LLM inference server. It sends text prompts to the inference server, which generates tokens that are then converted to audio using the SNAC model. The system has been optimised for RTX 4090 GPUs with:

  • Vectorised tensor operations
  • Parallel processing with CUDA streams
  • Efficient memory management
  • Token and audio caching
  • Optimised batch sizes

For best performance, adjust the API_URL in tts_engine/inference.py to point to your LLM inference server endpoint.

Integration with OpenWebUI

You can easily integrate this TTS solution with OpenWebUI to add high-quality voice capabilities to your chatbot:

  1. Start your Orpheus-FASTAPI server
  2. In OpenWebUI, go to Admin Panel > Settings > Audio
  3. Change TTS from Web API to OpenAI
  4. Set the API Base URL to your server address (e.g., http://localhost:5005)
  5. API Key can be set to "not-needed"
  6. Set TTS Voice to one of the available voices: tara, leah, jess, leo, dan, mia, zac, or zoe
  7. Set TTS Model to tts-1

External Inference Server

This application requires a separate LLM inference server running the Orpheus model. You can use:

  • GPUStack - GPU optimised LLM inference server (My pick) - supports LAN/WAN tensor split parallelisation
  • LM Studio - Load the GGUF model and start the local server
  • llama.cpp server - Run with the appropriate model parameters
  • Any other OpenAI API-compatible server

Download the quantised model from lex-au/Orpheus-3b-FT-Q8_0.gguf and load it in your inference server.

The inference server should be configured to expose an API endpoint that this FastAPI application will connect to.

Environment Variables

You can configure the system by setting environment variables:

  • ORPHEUS_API_URL: URL of the LLM inference API (tts_engine/inference.py)
  • ORPHEUS_API_TIMEOUT: Timeout in seconds for API requests (default: 120)

Make sure the ORPHEUS_API_URL points to your running inference server.
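
As a rough sketch of how these variables might be read (the exact defaults live in tts_engine/inference.py; the fallback URL below is illustrative only, not the project's actual default):

import os

# Illustrative sketch only: the real default URL is defined in tts_engine/inference.py.
API_URL = os.environ.get("ORPHEUS_API_URL", "http://127.0.0.1:1234/v1/completions")
API_TIMEOUT = int(os.environ.get("ORPHEUS_API_TIMEOUT", "120"))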

Development

Project Components

  • app.py: FastAPI server that handles HTTP requests and serves the web UI
  • tts_engine/inference.py: Handles token generation and API communication
  • tts_engine/speechpipe.py: Converts token sequences to audio using the SNAC model

Adding New Voices

To add new voices, update the AVAILABLE_VOICES list in tts_engine/inference.py and add corresponding descriptions in the HTML template.
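
For example — a sketch; the actual list in tts_engine/inference.py may be formatted differently, and any new voice name must be one the Orpheus model was actually trained on:

# tts_engine/inference.py
AVAILABLE_VOICES = [
    "tara", "leah", "jess", "leo",
    "dan", "mia", "zac", "zoe",
    "nova",  # hypothetical new voice name, for illustration only
]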

License

This project is licensed under the Apache License 2.0 - see the LICENSE.txt file for details.
