# Audio API

A FastAPI service for Speech-to-Text, Text-to-Speech, and Conversation, powered by LiquidAI LFM2-Audio-1.5B.
## Features

- Speech-to-Text (STT): Transcribe audio at ~12.9x realtime
- Text-to-Speech (TTS): High-quality speech synthesis
- Conversation (speech-to-speech): Full voice-to-voice conversations
- End-to-end model: A single model handles all audio tasks
- GPU accelerated: CUDA support, including Blackwell GPUs
## Installation

```bash
# Clone the repository
git clone https://github.com/yourusername/audio-api.git
cd audio-api

# Create conda environment
conda create -n liquid-audio python=3.12
conda activate liquid-audio

# Install dependencies
pip install torch --index-url https://download.pytorch.org/whl/cu128
pip install liquid-audio fastapi uvicorn librosa soundfile python-multipart

# Copy and configure environment
cp .env.example .env
# Edit .env with your settings
```
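Before launching the service, it is worth confirming that the cu128 PyTorch wheel actually sees your GPU. A quick sanity check:

```python
import torch

# Verify the CUDA-enabled PyTorch build can see the GPU before launching the API.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```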
## Running

```bash
# Using the start script
./start_api.sh

# Or manually
conda activate liquid-audio
python liquid_audio_api.py
```

Once running, the service is available at:

- API: http://localhost:5006
- Swagger docs: http://localhost:5006/docs
- ReDoc: http://localhost:5006/redoc
## API Usage

Transcribe audio (STT):

```bash
curl -X POST http://localhost:5006/transcribe \
  -F "audio=@your_audio.wav" \
  -F "text_prompt=Transcribe the audio." \
  -F "max_tokens=256"
```

Synthesize speech (TTS):

```bash
curl -X POST http://localhost:5006/synthesize \
  -F "text=Hello, this is a test." \
  -F "max_tokens=512"
```

Hold a voice conversation (speech-to-speech):

```bash
curl -X POST http://localhost:5006/converse \
  -F "audio=@your_audio.wav" \
  -F "system_prompt=Respond briefly." \
  -F "max_tokens=128"
```

Check service health:

```bash
curl http://localhost:5006/health | python -m json.tool
```

## Performance

- Transcription (STT): ~12.9x realtime
- Synthesis (TTS): ~2.95s for short text
- Conversation: ~3.36s for a full response
- VRAM usage: ~3 GB allocated, ~6 GB reserved
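The same endpoints can be called from Python with `requests`. A minimal client sketch; the response shapes below (JSON for /transcribe, raw audio bytes for /synthesize) are assumptions, so confirm them against the Swagger docs at /docs:

```python
import requests

BASE = "http://localhost:5006"

# Transcribe a local file. The response is assumed to be JSON containing
# the transcription; check /docs for the actual schema.
with open("your_audio.wav", "rb") as f:
    resp = requests.post(
        f"{BASE}/transcribe",
        files={"audio": f},
        data={"text_prompt": "Transcribe the audio.", "max_tokens": 256},
    )
resp.raise_for_status()
print(resp.json())

# Synthesize speech. Assumed: the endpoint returns audio bytes (e.g. WAV)
# in the response body. /converse is called like /transcribe but with a
# system_prompt form field instead of text_prompt.
resp = requests.post(
    f"{BASE}/synthesize",
    data={"text": "Hello, this is a test.", "max_tokens": 512},
)
resp.raise_for_status()
with open("output.wav", "wb") as f:
    f.write(resp.content)
```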
## Configuration

Configuration is done via environment variables (see .env.example):

| Variable | Default | Description |
|---|---|---|
| API_HOST | 0.0.0.0 | Listen address |
| API_PORT | 5006 | Listen port |
| GPU_DEVICE | cuda:0 | GPU to use |
| STORAGE_BASE | ./data | Data storage directory |
| MODEL_BASE | ./models/cache | Model cache directory |
## Running as a systemd Service

To run as a system service:

```bash
# Create service file
sudo tee /etc/systemd/system/audio-api.service << 'EOF'
[Unit]
Description=LiquidAI Audio API
After=network.target

[Service]
Type=simple
User=your_user
WorkingDirectory=/path/to/audio-api
Environment="PATH=/path/to/miniconda3/envs/liquid-audio/bin"
ExecStart=/path/to/miniconda3/envs/liquid-audio/bin/python liquid_audio_api.py
Restart=on-failure
RestartSec=15

[Install]
WantedBy=multi-user.target
EOF

# Enable and start
sudo systemctl daemon-reload
sudo systemctl enable audio-api
sudo systemctl start audio-api
```

## Data Layout

Uploads, generated audio, and logs are organized under STORAGE_BASE by day:

```
./data/                      # STORAGE_BASE
├── temp/                    # Temporary uploads
├── YYYY-MM-DD/              # Daily directories
│   ├── raw/                 # Uploaded audio files
│   ├── processed/           # Generated audio files
│   ├── transcriptions.jsonl
│   ├── synthesis.jsonl
│   └── conversations.jsonl
```
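Each daily JSONL file holds one record per line. A minimal sketch for reading them; the field names inside each record are not documented here, so the example just prints whatever each entry contains:

```python
import json
from datetime import date
from pathlib import Path

# Read today's transcription log, one JSON record per line.
log = Path("./data") / date.today().isoformat() / "transcriptions.jsonl"
if log.exists():
    with log.open() as f:
        for line in f:
            record = json.loads(line)
            print(record)
```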
## Dictation App

A lightweight desktop application for voice-to-text with clipboard integration.

Location: ./dictation-app/

```bash
cd dictation-app
./start_dictation.sh   # CLI version
./start_webui.sh       # Web UI version (port 7870)
```

Features:

- Press SPACE to record/stop (CLI)
- Auto-copy to clipboard
- JSONL logging
- Audio archive
- 1-2 second latency

See dictation-app/README.md for details.
## Streaming

Real-time streaming via WebSocket and WebRTC.

Location: ./streaming/

```bash
cd streaming
python conversation_stream.py   # Port 5008
```

See streaming/docs/ for setup guides.
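The WebSocket protocol details live in streaming/docs/. As a rough sketch of what a client loop can look like with the `websockets` package; the /ws path and the send-bytes/receive-reply framing here are assumptions for illustration, not the actual protocol:

```python
import asyncio
import websockets  # pip install websockets

async def stream_once(url: str = "ws://localhost:5008/ws") -> None:
    # Assumed framing: send raw audio bytes, receive one reply
    # (text or audio). See streaming/docs/ for the real protocol.
    async with websockets.connect(url) as ws:
        with open("your_audio.wav", "rb") as f:
            await ws.send(f.read())
        reply = await ws.recv()
        print(f"received {type(reply).__name__} of length {len(reply)}")

asyncio.run(stream_once())
```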
## Model Details

| Property | Value |
|---|---|
| Model | LiquidAI/LFM2-Audio-1.5B |
| Parameters | 1.45B |
| VRAM | ~3 GB |
| Capabilities | STT + TTS + Conversation |
## Monitoring & Troubleshooting

Check service health:

```bash
curl http://localhost:5006/health
```

View logs:

```bash
# If using systemd
journalctl -u audio-api -f

# Or check the data directory
tail -f ./data/$(date +%Y-%m-%d)/*.jsonl
```

Common issues:

- CUDA out of memory: Reduce `max_tokens` or use smaller batch sizes
- Model not found: Run `python download_models.py` first
- Audio format error: Ensure input is WAV, MP3, or FLAC
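For automated checks, a small poll loop sketch; it assumes only that /health returns HTTP 200 with a JSON body when the service is up, which is useful while the model is still loading:

```python
import time
import requests

# Poll /health until the service responds OK.
for attempt in range(30):
    try:
        r = requests.get("http://localhost:5006/health", timeout=5)
        if r.ok:
            print("Service is up:", r.json())
            break
    except requests.ConnectionError:
        pass
    time.sleep(2)
else:
    print("Service did not become healthy in time")
```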
## License

MIT License - see the LICENSE file for details.

Version: 1.1.0