LiquidAI Audio API

FastAPI service for Speech-to-Text, Text-to-Speech, and Conversation

Powered by LiquidAI LFM2-Audio-1.5B


Features

  • Speech-to-Text (STT): Transcribe audio at 12.9x realtime
  • Text-to-Speech (TTS): High-quality synthesis
  • Conversation (Speech-to-Speech): Full voice-to-voice conversations
  • End-to-end model: Single model handles all audio tasks
  • GPU Accelerated: CUDA support with Blackwell compatibility

Quick Start

1. Setup Environment

# Clone the repository
git clone https://github.com/yourusername/audio-api.git
cd audio-api

# Create conda environment
conda create -n liquid-audio python=3.12
conda activate liquid-audio

# Install dependencies
pip install torch --index-url https://download.pytorch.org/whl/cu128
pip install liquid-audio fastapi uvicorn librosa soundfile python-multipart

# Copy and configure environment
cp .env.example .env
# Edit .env with your settings
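Before starting the API, a quick stdlib-only sanity check that the installed packages resolve (a sketch; the module list mirrors the pip install commands above):

```python
import importlib.util

# Modules installed in the setup step above.
REQUIRED = ["torch", "fastapi", "uvicorn", "librosa", "soundfile"]

def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```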

2. Start the API

# Using the start script
./start_api.sh

# Or manually
conda activate liquid-audio
python liquid_audio_api.py

3. Access the API

The API listens on http://localhost:5006 by default (see Configuration below). FastAPI also serves interactive documentation at http://localhost:5006/docs.

API Endpoints

1. Transcribe Audio (Speech-to-Text)

curl -X POST http://localhost:5006/transcribe \
  -F "audio=@your_audio.wav" \
  -F "text_prompt=Transcribe the audio." \
  -F "max_tokens=256"
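The same call from Python, as a stdlib-only sketch. The endpoint, form fields, and defaults match the curl example above; the `encode_multipart` helper and the assumption that the response body is JSON are illustrative, not part of the API:

```python
import json
import urllib.request
import uuid

API_URL = "http://localhost:5006"  # matches the curl examples

def encode_multipart(fields, file_field=None):
    """Build a multipart/form-data body. `fields` maps str -> str;
    `file_field` is an optional (name, filename, bytes) tuple."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    if file_field:
        name, filename, data = file_field
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
                "Content-Type: application/octet-stream\r\n\r\n"
            ).encode() + data + b"\r\n"
        )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

def transcribe(path, prompt="Transcribe the audio.", max_tokens=256):
    """POST an audio file to /transcribe and decode the JSON response."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = encode_multipart(
        {"text_prompt": prompt, "max_tokens": str(max_tokens)},
        file_field=("audio", path, audio),
    )
    req = urllib.request.Request(
        f"{API_URL}/transcribe", data=body,
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```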

2. Synthesize Speech (Text-to-Speech)

curl -X POST http://localhost:5006/synthesize \
  -F "text=Hello, this is a test." \
  -F "max_tokens=512"
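For text too long for a single max_tokens budget, a client can split on sentence boundaries and synthesize chunk by chunk. A sketch (the 400-character limit is an illustrative guess, not an API constant):

```python
import re

def chunk_text(text, max_chars=400):
    """Split text at sentence boundaries into chunks of at most max_chars,
    so each /synthesize call stays within its max_tokens budget.
    A single sentence longer than max_chars is kept whole."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip() if current else s
    if current:
        chunks.append(current)
    return chunks
```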

3. Conversation (Speech-to-Speech)

curl -X POST http://localhost:5006/converse \
  -F "audio=@your_audio.wav" \
  -F "system_prompt=Respond briefly." \
  -F "max_tokens=128"

4. Health Check

curl http://localhost:5006/health | python -m json.tool
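A stdlib-only readiness poll, useful while the model is still loading at startup. Nothing is assumed about the health payload beyond it being JSON:

```python
import json
import time
import urllib.error
import urllib.request

def wait_for_api(url="http://localhost:5006/health", retries=10, delay=2.0):
    """Poll the health endpoint until it answers.
    Returns the decoded JSON payload, or None if it never came up."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return json.loads(resp.read())
        except (urllib.error.URLError, json.JSONDecodeError):
            time.sleep(delay)
    return None
```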

Performance Metrics

  • Transcription (STT): ~12.9x realtime
  • Synthesis (TTS): ~2.95s for short text
  • Conversation: ~3.36s full response
  • VRAM Usage: ~3 GB allocated, ~6 GB reserved
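The "x realtime" figures above are simply the ratio of audio duration to wall-clock processing time:

```python
def realtime_factor(audio_seconds, processing_seconds):
    """Speed relative to realtime: seconds of audio handled
    per second of wall-clock processing."""
    return audio_seconds / processing_seconds

# At 12.9x realtime, a 60-second clip transcribes in roughly 60 / 12.9 ≈ 4.7 s.
```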

Configuration

Configuration is done via environment variables (see .env.example):

Variable       Default          Description
API_HOST       0.0.0.0          Listen address
API_PORT       5006             Listen port
GPU_DEVICE     cuda:0           GPU to use
STORAGE_BASE   ./data           Data storage directory
MODEL_BASE     ./models/cache   Model cache directory
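A minimal sketch of resolving these variables in code, with the defaults from the table above (`load_config` is illustrative, not the service's actual loader):

```python
import os

# Defaults mirror the configuration table.
DEFAULTS = {
    "API_HOST": "0.0.0.0",
    "API_PORT": "5006",
    "GPU_DEVICE": "cuda:0",
    "STORAGE_BASE": "./data",
    "MODEL_BASE": "./models/cache",
}

def load_config(env=os.environ):
    """Resolve each setting from the environment, falling back to its default."""
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}
```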

Systemd Service (Optional)

To run as a system service:

# Create service file
sudo tee /etc/systemd/system/audio-api.service << 'EOF'
[Unit]
Description=LiquidAI Audio API
After=network.target

[Service]
Type=simple
User=your_user
WorkingDirectory=/path/to/audio-api
Environment="PATH=/path/to/miniconda3/envs/liquid-audio/bin"
ExecStart=/path/to/miniconda3/envs/liquid-audio/bin/python liquid_audio_api.py
Restart=on-failure
RestartSec=15

[Install]
WantedBy=multi-user.target
EOF

# Enable and start
sudo systemctl daemon-reload
sudo systemctl enable audio-api
sudo systemctl start audio-api

Storage Structure

./data/                      # STORAGE_BASE
├── temp/                    # Temporary uploads
└── YYYY-MM-DD/              # Daily directories
    ├── raw/                 # Uploaded audio files
    ├── processed/           # Generated audio files
    ├── transcriptions.jsonl
    ├── synthesis.jsonl
    └── conversations.jsonl
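A small helper sketch for building that per-day layout (the `daily_paths` function is hypothetical, not part of the API):

```python
import datetime
import pathlib

def daily_paths(storage_base="./data", date=None):
    """Return the per-day paths of the storage layout shown above."""
    date = date or datetime.date.today()
    day = pathlib.Path(storage_base) / date.isoformat()
    return {
        "raw": day / "raw",
        "processed": day / "processed",
        "transcriptions": day / "transcriptions.jsonl",
        "synthesis": day / "synthesis.jsonl",
        "conversations": day / "conversations.jsonl",
    }
```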

Applications

Audio Dictation App

A lightweight desktop application for voice-to-text with clipboard integration.

Location: ./dictation-app/

cd dictation-app
./start_dictation.sh      # CLI version
./start_webui.sh          # Web UI version (port 7870)

Features:

  • Press SPACE to record/stop (CLI)
  • Auto-copy to clipboard
  • JSONL logging
  • Audio archive
  • 1-2 second latency

See dictation-app/README.md for details.

Streaming API

Real-time streaming via WebSocket and WebRTC.

Location: ./streaming/

cd streaming
python conversation_stream.py  # Port 5008

See streaming/docs/ for setup guides.

Model Information

Property       Value
Model          LiquidAI/LFM2-Audio-1.5B
Parameters     1.45B
VRAM           ~3 GB
Capabilities   STT + TTS + Conversation

Troubleshooting

Check if running

curl http://localhost:5006/health

View logs

# If using systemd
journalctl -u audio-api -f

# Or check data directory
tail -f ./data/$(date +%Y-%m-%d)/*.jsonl
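For inspecting those logs programmatically, a sketch that assumes one JSON object per line and tolerates a partial write at the tail:

```python
import json

def tail_jsonl(path, n=10):
    """Return the last n records of a JSONL log,
    skipping blank or malformed lines instead of crashing."""
    records = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # e.g. a record still being written
    return records[-n:]
```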

Common Issues

  1. CUDA out of memory: Reduce max_tokens or use smaller batch sizes
  2. Model not found: Run python download_models.py first
  3. Audio format error: Ensure input is WAV, MP3, or FLAC
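Issue 3 can be caught client-side before uploading. A sketch, with the extension whitelist taken from the formats listed above:

```python
import pathlib

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac"}

def check_audio_format(filename):
    """Reject files whose extension is not a supported audio format."""
    ext = pathlib.Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format {ext!r}; use WAV, MP3, or FLAC")
    return ext
```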

License

MIT License - See LICENSE file for details.


Version: 1.1.0
