jlportman3/friday-livekit
Voice Agent - 100% Local AI Voice Processing

A completely local, containerized voice agent that listens to speech, transcribes it with GPU-accelerated Whisper, and responds using local TTS and an Ollama LLM. No API keys required! This serves as the foundation for an AI-powered receptionist system.

Features

  • 🎤 Local Speech Processing: GPU-accelerated Whisper for transcription (no API calls)
  • 🗣️ Local Text-to-Speech: eSpeak/Festival TTS engines (completely offline)
  • 🧠 Local AI Conversation: Ollama LLM for intelligent responses (runs on your GPU)
  • 🌐 Web Interface: Simple browser-based testing client
  • 🐳 Containerized: Complete Docker setup with GPU support
  • LiveKit Integration: Real-time audio streaming and room management
  • 🔒 Privacy First: Everything runs locally, no data sent to external services

Quick Start

Prerequisites

  • Docker and Docker Compose
  • NVIDIA GPU with Docker GPU support (recommended for best performance)
  • No API keys needed! Everything runs locally

Setup

  1. Clone and navigate to the project:

    git clone https://github.com/jlportman3/friday-livekit.git
    cd friday-livekit
  2. Create environment file (optional - defaults work fine):

    cp .env.example .env
    # No API keys needed! Edit only if you want to change models

    Key overrides you might want:

    • LIVEKIT_HTTP_PORT / LIVEKIT_UDP_PORT to avoid host port conflicts
    • PUBLIC_LIVEKIT_URL when exposing LiveKit on a different hostname/port
    • OLLAMA_HOST if your Ollama server runs somewhere other than localhost
  3. Start Ollama locally and download the AI model:

    # Ensure the Ollama server is running on your host machine
    # (see https://ollama.ai/download for installation instructions)
    
    chmod +x setup-ollama.sh
    ./setup-ollama.sh
  4. Build and start all services:

    docker-compose up --build
  5. Access the web interface: Open http://localhost:8080 in your browser
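For step 2 above, a hedged .env illustrating the overrides that section mentions might look like this (all values are illustrative, not repository defaults you must change):

```shell
# Move LiveKit off the default host ports if they clash
LIVEKIT_HTTP_PORT=7890
LIVEKIT_UDP_PORT=7891

# URL handed to the web client when LiveKit is reachable at a different address
PUBLIC_LIVEKIT_URL=ws://voice.example.lan:7890

# Point at a remote Ollama server instead of localhost
OLLAMA_HOST=http://192.168.1.50:11434
```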

Testing the Voice Agent

  1. Connect to the agent:

    • Open http://localhost:8080 and connect to the default room (voice-agent-room)
  2. Start voice interaction:

    • Click "Start Speaking" to activate your microphone
    • Speak into your microphone
    • The agent will transcribe your speech and respond with "I heard you say: [your text]"
  3. Monitor the interaction:

    • Watch the audio levels for both your microphone and agent response
    • Check the transcript panel for conversation history
    • System messages will show connection status and errors

Architecture

Components

  • LiveKit Server: Real-time audio streaming (port 7880)
  • Ollama Server: Local LLM for intelligent responses (port 11434)
  • Voice Agent: Python service with local Whisper + eSpeak TTS (container)
  • Web Client: Browser-based testing interface (port 8080)

Voice Processing Pipeline

User Speech → Microphone → LiveKit → Voice Agent → Whisper (GPU) → Transcription
                                                      ↓
                                                   Ollama LLM → AI Response
                                                      ↓
User Hears ← Speakers ← LiveKit ← Voice Agent ← eSpeak TTS ← Response Text
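The LLM and TTS stages of this pipeline can be sketched as follows. This is a minimal illustration, not the repository's actual speech_processor.py: the model name, the Ollama endpoint, and the eSpeak invocation are assumptions, and the Whisper stage is omitted.

```python
import json
import subprocess
import urllib.request


def ask_ollama(prompt: str, host: str = "http://localhost:11434",
               model: str = "llama3", timeout: float = 30.0) -> str:
    """Send transcribed text to a local Ollama server and return its reply.

    Uses Ollama's /api/generate endpoint with stream=False so the whole
    response arrives as a single JSON object.
    """
    payload = json.dumps({"model": model, "prompt": prompt,
                          "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]


def speak(text: str) -> None:
    """Render the reply to audio with eSpeak (espeak must be on PATH)."""
    subprocess.run(["espeak", text], check=True)
```

In the real agent the audio travels over LiveKit tracks rather than the speakers directly, but the data flow (text in, LLM reply out, TTS last) matches the diagram above.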

Configuration

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| LIVEKIT_URL | ws://localhost:7880 | LiveKit server URL |
| WHISPER_MODEL | base | Whisper model size (tiny/base/small/medium/large) |
| TTS_MODEL | espeak | Local TTS engine (espeak/piper/festival) |
| OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
| USE_OLLAMA | true | Enable intelligent AI responses |
| ROOM_NAME | voice-agent-room | Default LiveKit room name |
| LIVEKIT_HTTP_PORT | 7880 | Host port exposed for LiveKit HTTP/WebSocket |
| LIVEKIT_UDP_PORT | 7881 | Host port exposed for LiveKit RTP/UDP |
| PUBLIC_LIVEKIT_URL | (empty) | Overrides the URL returned to the web client |
| PUBLIC_HOSTNAME | (empty) | Hostname used if PUBLIC_LIVEKIT_URL is unset |
| PUBLIC_USE_TLS | false | Set to true when LiveKit is exposed via TLS |
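The variables above can be read with their documented defaults roughly as follows. This is a hypothetical sketch of what voice_agent/config.py might do, not its actual contents:

```python
import os


def load_config() -> dict:
    """Read the documented environment variables, falling back to defaults."""
    return {
        "livekit_url": os.getenv("LIVEKIT_URL", "ws://localhost:7880"),
        "whisper_model": os.getenv("WHISPER_MODEL", "base"),
        "tts_model": os.getenv("TTS_MODEL", "espeak"),
        "ollama_host": os.getenv("OLLAMA_HOST", "http://localhost:11434"),
        # Treat any casing of "true" as enabled
        "use_ollama": os.getenv("USE_OLLAMA", "true").lower() == "true",
        "room_name": os.getenv("ROOM_NAME", "voice-agent-room"),
    }
```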

Whisper Model Options (All Local)

  • tiny: Fastest, least accurate (~39 MB)
  • base: Good balance (~74 MB) - Default
  • small: Better accuracy (~244 MB)
  • medium: Even better accuracy (~769 MB)
  • large: Best accuracy, slowest (~1550 MB)

Local TTS Options

  • espeak: Fast, robotic voice - Default
  • piper: Higher quality (requires model download)
  • festival: Alternative TTS engine
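A hedged sketch of how the agent might dispatch between these engines. The flags and the piper model name are common-usage assumptions, not necessarily what speech_processor.py invokes:

```python
def tts_command(engine: str, text: str) -> list:
    """Build an illustrative command line for the chosen TTS engine."""
    if engine == "espeak":
        return ["espeak", "-s", "150", text]  # -s: speaking rate, words/min
    if engine == "festival":
        return ["festival", "--tts"]  # festival reads the text from stdin
    if engine == "piper":
        # piper reads text from stdin and writes a WAV file;
        # the model name here is purely illustrative
        return ["piper", "--model", "en_US-lessac-medium",
                "--output_file", "out.wav"]
    raise ValueError(f"unknown TTS engine: {engine}")
```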

Development

Project Structure

friday-livekit/
├── voice_agent/
│   ├── main_agent.py          # Main agent orchestrator
│   ├── speech_processor.py    # Whisper + TTS processing
│   └── config.py              # Configuration management
├── web_client/
│   ├── index.html             # Web interface
│   ├── app.js                 # LiveKit client logic
│   └── style.css              # UI styling
├── docker-compose.yml         # Complete stack
├── Dockerfile.voice-agent     # Voice agent container
└── requirements-voice.txt     # Python dependencies

Running in Development

  1. Start services individually:

    # Start the LiveKit server
    docker-compose up livekit -d
    # Ensure your local Ollama server is running separately (see step 3 above)
    
    # Set up Ollama model
    ./setup-ollama.sh
  2. Run voice agent locally:

    pip install -r requirements-voice.txt
    # No API keys needed!
    python -m voice_agent.main_agent
  3. Serve web client:

    cd web_client
    python -m http.server 8080

Troubleshooting

Common Issues

  1. "Connection failed" in web interface:

    • Ensure LiveKit server is running on port 7880
    • Check browser console for detailed errors
    • Verify network connectivity between containers
  2. "Microphone error" messages:

    • Grant microphone permissions in your browser
    • Ensure no other applications are using the microphone
    • Try refreshing the page and reconnecting
  3. Agent not responding:

    • Check that Ollama is running: curl http://localhost:11434/api/tags
    • Monitor voice agent container logs: docker logs voice-agent
    • Verify GPU is available: docker run --gpus all nvidia/cuda:12.1.0-runtime-ubuntu22.04 nvidia-smi
    • Ensure Ollama model is downloaded: ./setup-ollama.sh
  4. Poor transcription quality:

    • Try a larger Whisper model (small/medium/large)
    • Ensure good microphone quality and minimal background noise
    • Check that GPU acceleration is working
  5. Robotic TTS voice:

    • This is normal with eSpeak (default)
    • For better quality, try setting TTS_MODEL=piper (requires model download)
    • Or use TTS_MODEL=festival for an alternative voice
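The Ollama check in item 3 above can be automated. This is a hypothetical helper mirroring the curl command, not part of the repository:

```python
import json
import urllib.error
import urllib.request


def ollama_ok(host: str = "http://localhost:11434",
              timeout: float = 2.0) -> bool:
    """Return True if Ollama's /api/tags endpoint answers with valid JSON."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags",
                                    timeout=timeout) as resp:
            json.load(resp)  # raises ValueError on non-JSON bodies
        return True
    except (urllib.error.URLError, OSError, ValueError):
        return False
```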

Logs and Monitoring

# View all service logs
docker-compose logs -f

# View specific service logs
docker-compose logs -f voice-agent
docker-compose logs -f livekit

# Check service health
docker-compose ps

Next Steps

This voice agent serves as the foundation for more advanced features:

  • SIP Integration: Bridge with Asterisk servers for phone calls
  • AI Conversation: Add LLM-powered conversation management
  • Intent Recognition: Implement routing based on user requests
  • Database Storage: Store conversation history and analytics
  • Production Deployment: Scale for multiple concurrent users

License

This project builds upon the LiveKit ecosystem and local open-source components (Whisper, Ollama, eSpeak). Please ensure compliance with their respective licenses and terms of service.
