jlportman3/friday-livekit
Voice Agent - 100% Local AI Voice Processing

A completely local, containerized voice agent that listens to speech, transcribes it with GPU-accelerated Whisper, and responds using local TTS and an Ollama LLM. No API keys required! This serves as the foundation for an AI-powered receptionist system.

Features

  • 🎤 Local Speech Processing: GPU-accelerated Whisper for transcription (no API calls)
  • 🗣️ Local Text-to-Speech: eSpeak/Festival TTS engines (completely offline)
  • 🧠 Local AI Conversation: Ollama LLM for intelligent responses (runs on your GPU)
  • 🌐 Web Interface: Simple browser-based testing client
  • 🐳 Containerized: Complete Docker setup with GPU support
  • LiveKit Integration: Real-time audio streaming and room management
  • 🔒 Privacy First: Everything runs locally, no data sent to external services

Quick Start

Prerequisites

  • Docker and Docker Compose
  • NVIDIA GPU with Docker GPU support (recommended for best performance)
  • No API keys needed! Everything runs locally

Setup

  1. Clone and navigate to the project:

    git clone https://github.com/jlportman3/friday-livekit.git
    cd friday-livekit
  2. Create environment file (optional - defaults work fine):

    cp .env.example .env
    # No API keys needed! Edit only if you want to change models

    Key overrides you might want:

    • LIVEKIT_HTTP_PORT / LIVEKIT_UDP_PORT to avoid host port conflicts
    • PUBLIC_LIVEKIT_URL when exposing LiveKit on a different hostname/port
    • OLLAMA_HOST if your Ollama server runs somewhere other than localhost
  3. Start Ollama locally and download the AI model:

    # Ensure the Ollama server is running on your host machine
    # (see https://ollama.ai/download for installation instructions)
    
    chmod +x setup-ollama.sh
    ./setup-ollama.sh
  4. Build and start all services:

    docker-compose up --build
  5. Access the web interface: Open http://localhost:8080 in your browser
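For step 2 above, a hedged .env illustrating the overrides that section mentions might look like this (all values are illustrative, not repository defaults you must change):

```shell
# Move LiveKit off the default host ports if they clash
LIVEKIT_HTTP_PORT=7890
LIVEKIT_UDP_PORT=7891

# URL handed to the web client when LiveKit is reachable at a different address
PUBLIC_LIVEKIT_URL=ws://voice.example.lan:7890

# Point at a remote Ollama server instead of localhost
OLLAMA_HOST=http://192.168.1.50:11434
```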

Testing the Voice Agent

  1. Connect to the agent:

    • Open http://localhost:8080 and connect to the default room (voice-agent-room)
  2. Start voice interaction:

    • Click "Start Speaking" to activate your microphone
    • Speak into your microphone
    • The agent will transcribe your speech and respond with "I heard you say: [your text]"
  3. Monitor the interaction:

    • Watch the audio levels for both your microphone and agent response
    • Check the transcript panel for conversation history
    • System messages will show connection status and errors

Architecture

Components

  • LiveKit Server: Real-time audio streaming (port 7880)
  • Ollama Server: Local LLM for intelligent responses (port 11434)
  • Voice Agent: Python service with local Whisper + eSpeak TTS (container)
  • Web Client: Browser-based testing interface (port 8080)

Voice Processing Pipeline

User Speech → Microphone → LiveKit → Voice Agent → Whisper (GPU) → Transcription
                                                      ↓
                                                   Ollama LLM → AI Response
                                                      ↓
User Hears ← Speakers ← LiveKit ← Voice Agent ← eSpeak TTS ← Response Text
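The LLM and TTS stages of this pipeline can be sketched as follows. This is a minimal illustration, not the repository's actual speech_processor.py: the model name, the Ollama endpoint, and the eSpeak invocation are assumptions, and the Whisper stage is omitted.

```python
import json
import subprocess
import urllib.request


def ask_ollama(prompt: str, host: str = "http://localhost:11434",
               model: str = "llama3", timeout: float = 30.0) -> str:
    """Send transcribed text to a local Ollama server and return its reply.

    Uses Ollama's /api/generate endpoint with stream=False so the whole
    response arrives as a single JSON object.
    """
    payload = json.dumps({"model": model, "prompt": prompt,
                          "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]


def speak(text: str) -> None:
    """Render the reply to audio with eSpeak (espeak must be on PATH)."""
    subprocess.run(["espeak", text], check=True)
```

In the real agent the audio travels over LiveKit tracks rather than the speakers directly, but the data flow (text in, LLM reply out, TTS last) matches the diagram above.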

Configuration

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| LIVEKIT_URL | ws://localhost:7880 | LiveKit server URL |
| WHISPER_MODEL | base | Whisper model size (tiny/base/small/medium/large) |
| TTS_MODEL | espeak | Local TTS engine (espeak/piper/festival) |
| OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
| USE_OLLAMA | true | Enable intelligent AI responses |
| ROOM_NAME | voice-agent-room | Default LiveKit room name |
| LIVEKIT_HTTP_PORT | 7880 | Host port exposed for LiveKit HTTP/WebSocket |
| LIVEKIT_UDP_PORT | 7881 | Host port exposed for LiveKit RTP/UDP |
| PUBLIC_LIVEKIT_URL | (empty) | Overrides the URL returned to the web client |
| PUBLIC_HOSTNAME | (empty) | Hostname used if PUBLIC_LIVEKIT_URL is unset |
| PUBLIC_USE_TLS | false | Set to true when LiveKit is exposed via TLS |
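The variables above can be read with their documented defaults roughly as follows. This is a hypothetical sketch of what voice_agent/config.py might do, not its actual contents:

```python
import os


def load_config() -> dict:
    """Read the documented environment variables, falling back to defaults."""
    return {
        "livekit_url": os.getenv("LIVEKIT_URL", "ws://localhost:7880"),
        "whisper_model": os.getenv("WHISPER_MODEL", "base"),
        "tts_model": os.getenv("TTS_MODEL", "espeak"),
        "ollama_host": os.getenv("OLLAMA_HOST", "http://localhost:11434"),
        # Treat any casing of "true" as enabled
        "use_ollama": os.getenv("USE_OLLAMA", "true").lower() == "true",
        "room_name": os.getenv("ROOM_NAME", "voice-agent-room"),
    }
```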

Whisper Model Options (All Local)

  • tiny: Fastest, least accurate (~39 MB)
  • base: Good balance (~74 MB) - Default
  • small: Better accuracy (~244 MB)
  • medium: Even better accuracy (~769 MB)
  • large: Best accuracy, slowest (~1550 MB)

Local TTS Options

  • espeak: Fast, robotic voice - Default
  • piper: Higher quality (requires model download)
  • festival: Alternative TTS engine
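A hedged sketch of how the agent might dispatch between these engines. The flags and the piper model name are common-usage assumptions, not necessarily what speech_processor.py invokes:

```python
def tts_command(engine: str, text: str) -> list:
    """Build an illustrative command line for the chosen TTS engine."""
    if engine == "espeak":
        return ["espeak", "-s", "150", text]  # -s: speaking rate, words/min
    if engine == "festival":
        return ["festival", "--tts"]  # festival reads the text from stdin
    if engine == "piper":
        # piper reads text from stdin and writes a WAV file;
        # the model name here is purely illustrative
        return ["piper", "--model", "en_US-lessac-medium",
                "--output_file", "out.wav"]
    raise ValueError(f"unknown TTS engine: {engine}")
```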

Development

Project Structure

friday-livekit/
├── voice_agent/
│   ├── main_agent.py          # Main agent orchestrator
│   ├── speech_processor.py    # Whisper + TTS processing
│   └── config.py              # Configuration management
├── web_client/
│   ├── index.html             # Web interface
│   ├── app.js                 # LiveKit client logic
│   └── style.css              # UI styling
├── docker-compose.yml         # Complete stack
├── Dockerfile.voice-agent     # Voice agent container
└── requirements-voice.txt     # Python dependencies

Running in Development

  1. Start services individually:

    # Start the LiveKit server
    docker-compose up livekit -d
    # Ensure your local Ollama server is running separately (see step 3 above)
    
    # Set up Ollama model
    ./setup-ollama.sh
  2. Run voice agent locally:

    pip install -r requirements-voice.txt
    # No API keys needed!
    python -m voice_agent.main_agent
  3. Serve web client:

    cd web_client
    python -m http.server 8080

Troubleshooting

Common Issues

  1. "Connection failed" in web interface:

    • Ensure LiveKit server is running on port 7880
    • Check browser console for detailed errors
    • Verify network connectivity between containers
  2. "Microphone error" messages:

    • Grant microphone permissions in your browser
    • Ensure no other applications are using the microphone
    • Try refreshing the page and reconnecting
  3. Agent not responding:

    • Check that Ollama is running: curl http://localhost:11434/api/tags
    • Monitor voice agent container logs: docker logs voice-agent
    • Verify GPU is available: docker run --gpus all nvidia/cuda:12.1.0-runtime-ubuntu22.04 nvidia-smi
    • Ensure Ollama model is downloaded: ./setup-ollama.sh
  4. Poor transcription quality:

    • Try a larger Whisper model (small/medium/large)
    • Ensure good microphone quality and minimal background noise
    • Check that GPU acceleration is working
  5. Robotic TTS voice:

    • This is normal with eSpeak (default)
    • For better quality, try setting TTS_MODEL=piper (requires model download)
    • Or use TTS_MODEL=festival for an alternative voice
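The Ollama check in item 3 above can be automated. This is a hypothetical helper mirroring the curl command, not part of the repository:

```python
import json
import urllib.error
import urllib.request


def ollama_ok(host: str = "http://localhost:11434",
              timeout: float = 2.0) -> bool:
    """Return True if Ollama's /api/tags endpoint answers with valid JSON."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags",
                                    timeout=timeout) as resp:
            json.load(resp)  # raises ValueError on non-JSON bodies
        return True
    except (urllib.error.URLError, OSError, ValueError):
        return False
```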

Logs and Monitoring

# View all service logs
docker-compose logs -f

# View specific service logs
docker-compose logs -f voice-agent
docker-compose logs -f livekit

# Check service health
docker-compose ps

Next Steps

This voice agent serves as the foundation for more advanced features:

  • SIP Integration: Bridge with Asterisk servers for phone calls
  • AI Conversation: Add LLM-powered conversation management
  • Intent Recognition: Implement routing based on user requests
  • Database Storage: Store conversation history and analytics
  • Production Deployment: Scale for multiple concurrent users

License

This project builds upon the LiveKit ecosystem and local open-source components (Whisper, Ollama, eSpeak). Please ensure compliance with their respective licenses and terms of service.
