A completely local, containerized voice agent that can listen to speech, transcribe it using GPU-accelerated Whisper, and respond using local TTS and Ollama LLM. No API keys required! This serves as the foundation for an AI-powered receptionist system.
- 🎤 Local Speech Processing: GPU-accelerated Whisper for transcription (no API calls)
- 🗣️ Local Text-to-Speech: eSpeak/Festival TTS engines (completely offline)
- 🧠 Local AI Conversation: Ollama LLM for intelligent responses (runs on your GPU)
- 🌐 Web Interface: Simple browser-based testing client
- 🐳 Containerized: Complete Docker setup with GPU support
- ⚡ LiveKit Integration: Real-time audio streaming and room management
- 🔒 Privacy First: Everything runs locally, no data sent to external services
- Docker and Docker Compose
- NVIDIA GPU with Docker GPU support (recommended for best performance)
- No API keys needed! Everything runs locally
1. Clone and navigate to the project:

   ```bash
   cd friday-livekit
   ```

2. Create environment file (optional - defaults work fine):

   ```bash
   cp .env.example .env  # No API keys needed! Edit only if you want to change models
   ```

   Key overrides you might want:
   - `LIVEKIT_HTTP_PORT`/`LIVEKIT_UDP_PORT` to avoid host port conflicts
   - `PUBLIC_LIVEKIT_URL` when exposing LiveKit on a different hostname/port
   - `OLLAMA_HOST` if your Ollama server runs somewhere other than localhost
3. Start Ollama locally and download the AI model:

   ```bash
   # Ensure the Ollama server is running on your host machine
   # (see https://ollama.ai/download for installation instructions)
   chmod +x setup-ollama.sh
   ./setup-ollama.sh
   ```
4. Build and start all services (a quick verification sketch follows these steps):

   ```bash
   docker-compose up --build
   ```
5. Access the web interface: open http://localhost:8080 in your browser.
6. Connect to the agent:
   - Open http://localhost:8080
   - Click "Connect to Agent"
   - Wait for the connection status to show "Connected"
7. Start voice interaction:
   - Click "Start Speaking" to activate your microphone
   - Speak into your microphone
   - The agent will transcribe your speech and respond with "I heard you say: [your text]"
8. Monitor the interaction:
   - Watch the audio levels for both your microphone and agent response
   - Check the transcript panel for conversation history
   - System messages will show connection status and errors
- LiveKit Server: Real-time audio streaming (port 7880)
- Ollama Server: Local LLM for intelligent responses (port 11434)
- Voice Agent: Python service with local Whisper + eSpeak TTS (container)
- Web Client: Browser-based testing interface (port 8080)
```
User Speech → Microphone → LiveKit → Voice Agent → Whisper (GPU) → Transcription
                                                                         ↓
                                                              Ollama LLM → AI Response
                                                                         ↓
User Hears ← Speakers ← LiveKit ← Voice Agent ← eSpeak TTS ← Response Text
```
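The same flow in code form: the sketch below is illustrative only (the real logic lives in `voice_agent/main_agent.py` and `speech_processor.py`). It assumes the `openai-whisper` and `requests` packages, eSpeak on the PATH, the default Ollama endpoint, and uses `llama3` only as a stand-in for whatever model `setup-ollama.sh` downloads.

```python
# pipeline_sketch.py -- illustrative end-to-end pass over one recorded utterance.
# Assumes: `pip install openai-whisper requests`, espeak installed, Ollama on localhost.
import subprocess
import requests
import whisper

OLLAMA_HOST = "http://localhost:11434"

def handle_utterance(wav_path: str, model_name: str = "base") -> str:
    # 1. Speech -> text (Whisper runs on the GPU when CUDA is available)
    stt = whisper.load_model(model_name)
    text = stt.transcribe(wav_path)["text"].strip()

    # 2. Text -> AI response via Ollama's generate endpoint
    reply = requests.post(
        f"{OLLAMA_HOST}/api/generate",
        json={"model": "llama3", "prompt": text, "stream": False},  # model name is a placeholder
        timeout=60,
    ).json()["response"]

    # 3. Response text -> speech with eSpeak, written to a WAV file
    subprocess.run(["espeak", "-w", "reply.wav", reply], check=True)
    return reply

if __name__ == "__main__":
    print(handle_utterance("utterance.wav"))
```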
| Variable | Default | Description |
|---|---|---|
| `LIVEKIT_URL` | `ws://localhost:7880` | LiveKit server URL |
| `WHISPER_MODEL` | `base` | Whisper model size (tiny/base/small/medium/large) |
| `TTS_MODEL` | `espeak` | Local TTS engine (espeak/piper/festival) |
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
| `USE_OLLAMA` | `true` | Enable intelligent AI responses |
| `ROOM_NAME` | `voice-agent-room` | Default LiveKit room name |
| `LIVEKIT_HTTP_PORT` | `7880` | Host port exposed for LiveKit HTTP/WebSocket |
| `LIVEKIT_UDP_PORT` | `7881` | Host port exposed for LiveKit RTP/UDP |
| `PUBLIC_LIVEKIT_URL` | (empty) | Overrides the URL returned to the web client |
| `PUBLIC_HOSTNAME` | (empty) | Hostname used if `PUBLIC_LIVEKIT_URL` is unset |
| `PUBLIC_USE_TLS` | `false` | Set to `true` when LiveKit is exposed via TLS |
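For illustration, here is one way these variables could be read, including how `PUBLIC_LIVEKIT_URL` might fall back to `PUBLIC_HOSTNAME`, `LIVEKIT_HTTP_PORT`, and `PUBLIC_USE_TLS` when left empty. The actual behaviour lives in `voice_agent/config.py` and the web client, so treat the fallback logic here as an assumption based on the table above.

```python
# config_sketch.py -- hypothetical reader for the variables in the table above.
import os

LIVEKIT_URL   = os.getenv("LIVEKIT_URL", "ws://localhost:7880")
WHISPER_MODEL = os.getenv("WHISPER_MODEL", "base")
TTS_MODEL     = os.getenv("TTS_MODEL", "espeak")
OLLAMA_HOST   = os.getenv("OLLAMA_HOST", "http://localhost:11434")
USE_OLLAMA    = os.getenv("USE_OLLAMA", "true").lower() == "true"
ROOM_NAME     = os.getenv("ROOM_NAME", "voice-agent-room")

def public_livekit_url() -> str:
    """URL handed to the browser client: explicit override, else built from parts."""
    explicit = os.getenv("PUBLIC_LIVEKIT_URL", "")
    if explicit:
        return explicit
    host = os.getenv("PUBLIC_HOSTNAME", "") or "localhost"
    port = os.getenv("LIVEKIT_HTTP_PORT", "7880")
    scheme = "wss" if os.getenv("PUBLIC_USE_TLS", "false").lower() == "true" else "ws"
    return f"{scheme}://{host}:{port}"
```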
Available Whisper model sizes (set via `WHISPER_MODEL`):

- `tiny`: Fastest, least accurate (~39 MB)
- `base`: Good balance (~74 MB) - Default
- `small`: Better accuracy (~244 MB)
- `medium`: Even better accuracy (~769 MB)
- `large`: Best accuracy, slowest (~1550 MB)
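Switching sizes is just a different name passed to Whisper's loader. A minimal sketch, assuming the `openai-whisper` package (the containerized agent may use a different backend), that also shows the usual CUDA/CPU selection:

```python
# whisper_size_sketch.py -- map the WHISPER_MODEL setting to a loaded model.
import os
import torch
import whisper

device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model(os.getenv("WHISPER_MODEL", "base"), device=device)

# fp16 halves GPU memory use but is unsupported on CPU, so only enable it on CUDA.
result = model.transcribe("sample.wav", fp16=(device == "cuda"))
print(result["text"])
```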
Available local TTS engines (set via `TTS_MODEL`):

- `espeak`: Fast, robotic voice - Default
- `piper`: Higher quality (requires model download)
- `festival`: Alternative TTS engine
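All of these engines can be driven from Python with a subprocess call. The dispatch below is a sketch rather than the project's `speech_processor.py`; eSpeak's `-w` flag and Festival's `text2wave` helper write WAV files directly, while Piper needs its own voice model and is omitted here.

```python
# tts_sketch.py -- write a reply to a WAV file with the configured engine.
import os
import subprocess

def speak_to_wav(text: str, out_path: str = "reply.wav") -> None:
    engine = os.getenv("TTS_MODEL", "espeak")
    if engine == "espeak":
        # -w writes a WAV file instead of playing through the sound card
        subprocess.run(["espeak", "-w", out_path, text], check=True)
    elif engine == "festival":
        # text2wave ships with Festival and converts stdin text to a WAV file
        subprocess.run(["text2wave", "-o", out_path], input=text.encode(), check=True)
    else:
        raise NotImplementedError(f"{engine!r} (e.g. piper) needs its own voice model setup")

speak_to_wav("Hello, I am your local voice agent.")
```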
Project layout:

```
friday-livekit/
├── voice_agent/
│   ├── main_agent.py        # Main agent orchestrator
│   ├── speech_processor.py  # Whisper + TTS processing
│   └── config.py            # Configuration management
├── web_client/
│   ├── index.html           # Web interface
│   ├── app.js               # LiveKit client logic
│   └── style.css            # UI styling
├── docker-compose.yml       # Complete stack
├── Dockerfile.voice-agent   # Voice agent container
└── requirements-voice.txt   # Python dependencies
```
For local development outside Docker:

1. Start services individually:

   ```bash
   # Start LiveKit
   docker-compose up -d livekit
   # Ensure your local Ollama server is running separately (see step 3 above)
   # Set up Ollama model
   ./setup-ollama.sh
   ```

2. Run voice agent locally (see the launcher sketch after this list):

   ```bash
   pip install -r requirements-voice.txt  # No API keys needed!
   python -m voice_agent.main_agent
   ```

3. Serve web client:

   ```bash
   cd web_client
   python -m http.server 8080
   ```
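When the agent runs outside Docker it still has to find the dockerized LiveKit server and your host Ollama. A hypothetical launcher (the file name and approach are illustrative; the environment variables are the documented ones from the table above) could set that up before starting the module:

```python
# dev_run.py -- hypothetical launcher for running the agent outside Docker.
import os
import subprocess

env = dict(
    os.environ,
    LIVEKIT_URL="ws://localhost:7880",     # dockerized LiveKit from docker-compose
    OLLAMA_HOST="http://localhost:11434",  # Ollama running on the host
    WHISPER_MODEL="tiny",                  # small model for faster iteration
    TTS_MODEL="espeak",
)
subprocess.run(["python", "-m", "voice_agent.main_agent"], env=env, check=True)
```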
- "Connection failed" in web interface:
  - Ensure LiveKit server is running on port 7880
  - Check browser console for detailed errors
  - Verify network connectivity between containers
- "Microphone error" messages:
  - Grant microphone permissions in your browser
  - Ensure no other applications are using the microphone
  - Try refreshing the page and reconnecting
- Agent not responding:
  - Check that Ollama is running: `curl http://localhost:11434/api/tags`
  - Monitor voice agent container logs: `docker logs voice-agent`
  - Verify GPU is available: `docker run --gpus all nvidia/cuda:12.1-runtime-ubuntu22.04 nvidia-smi`
  - Ensure Ollama model is downloaded: `./setup-ollama.sh`
- Poor transcription quality:
  - Try a larger Whisper model (small/medium/large)
  - Ensure good microphone quality and minimal background noise
  - Check that GPU acceleration is working (see the sketch after this list)
- Robotic TTS voice:
  - This is normal with eSpeak (default)
  - For better quality, try setting `TTS_MODEL=piper` (requires model download)
  - Or use `TTS_MODEL=festival` for an alternative voice
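For the GPU-related checks above ("Verify GPU is available", "Check that GPU acceleration is working"), a short PyTorch probe run inside the voice-agent container tells you whether CUDA is actually visible to Whisper. This assumes PyTorch is installed as one of the Whisper dependencies in `requirements-voice.txt`:

```python
# gpu_check.py -- run inside the voice-agent container to confirm CUDA is visible.
import torch

if torch.cuda.is_available():
    print("CUDA available:", torch.cuda.get_device_name(0))
    print("PyTorch CUDA build:", torch.version.cuda)
else:
    print("CUDA NOT available -- Whisper will fall back to the (much slower) CPU.")
```

A one-liner form is `docker exec voice-agent python -c "import torch; print(torch.cuda.is_available())"`, assuming `python` is on the container's PATH (the container name matches the one used in the log commands below).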
```bash
# View all service logs
docker-compose logs -f

# View specific service logs
docker-compose logs -f voice-agent
docker-compose logs -f livekit

# Check service health
docker-compose ps
```

This voice agent serves as the foundation for more advanced features:
- SIP Integration: Bridge with Asterisk servers for phone calls
- AI Conversation: Add LLM-powered conversation management
- Intent Recognition: Implement routing based on user requests
- Database Storage: Store conversation history and analytics
- Production Deployment: Scale for multiple concurrent users
This project builds upon the LiveKit ecosystem and locally run open-source components (Whisper, Ollama, eSpeak/Festival). Please ensure compliance with their respective licenses and terms of service.