silijon/whispers
Whispers - Voice-to-Command System

Transform your voice into executable bash commands with AI-powered intelligence

Whispers is a sophisticated voice-controlled command-line interface that captures your speech, transcribes it using Whisper, and converts natural language into precise bash commands using AI. Say goodbye to typing complex commands!

Features

Core Capabilities

  • Real-time Speech Recognition - Ultra-low latency audio capture and transcription
  • AI-Powered Command Generation - Natural language to bash conversion using multiple AI providers
  • Intelligent Autocorrect - Automatically fixes file/directory names and paths
  • Multi-Platform Audio - Works with PulseAudio, PipeWire, and ALSA
  • Shell Integration - Seamless integration with zsh, tmux, and clipboard

AI Provider Support

  • Anthropic Claude (Haiku/Sonnet) - Fast and accurate
  • OpenAI GPT - Industry-standard performance
  • Ollama - Local, private inference
  • Custom APIs - Bring your own AI endpoint

Advanced Features

  • Voice Activity Detection - Smart silence detection
  • Pre-buffering - Captures speech from the very beginning
  • Confirmation Mode - Optional safety prompts before execution
  • Command History - Automatic zsh history integration
  • Multi-format Audio - Supports various sample rates and formats
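
Pre-buffering keeps a short rolling window of recent audio so that, when speech is first detected, the frames just before the trigger are not lost. A minimal sketch of the idea (the class name and frame sizes here are illustrative, not the project's actual API):

```python
from collections import deque

class PreBuffer:
    """Rolling window of recent audio frames, flushed when speech starts."""

    def __init__(self, frames_to_keep):
        # A deque with maxlen silently drops the oldest frame when full
        self.ring = deque(maxlen=frames_to_keep)

    def push(self, frame):
        self.ring.append(frame)

    def flush(self):
        # Return buffered frames (oldest first) and clear the buffer
        frames = list(self.ring)
        self.ring.clear()
        return frames

# At 16 kHz with 20 ms frames, a 0.3 s pre-buffer is 15 frames
buf = PreBuffer(frames_to_keep=15)
for i in range(20):              # feed 20 frames of leading audio
    buf.push(f"frame-{i}")
# Speech detected: prepend the buffered tail to the recording
recovered = buf.flush()
```

When speech starts, the buffered frames are emitted ahead of the live stream, so the first syllable survives even though detection lags the actual onset.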

Quick Start

Prerequisites

  • Python 3.13+
  • A Whisper server running on port 8080
  • AI API key (Anthropic, OpenAI, etc.)
  • Audio device (microphone)

Installation

  1. Clone and set up:
git clone <repository-url>
cd whispers
pip install -e .
  2. Configure your AI provider:
cp ai_inference_config.json.example ai_inference.json
# Edit ai_inference.json with your API key
  3. Test audio capture:
./audio_capture.py --list  # List audio devices
./audio_capture.py -d 0    # Test with device 0

Basic Usage

Voice-to-Command Pipeline

# Generate commands (safe mode)
./voice-to-command.sh

# Execute commands automatically (use with caution!)
./voice-to-command.sh --execute

Zsh Integration

Add to your .zshrc:

source ~/whispers/voice-inject.plugin.zsh

# Enable confirmation prompts (recommended)
export VOICE_CONFIRM=1
export VOICE_TIMEOUT=5

Tmux Integration

# Monitor and auto-inject commands into tmux
./tmux-voice-inject.sh monitor &

Voice Command Examples

Just speak naturally:

Say This                         Gets Converted To
"List all Python files"          find . -name "*.py"
"Show disk usage"                df -h
"Count lines in text files"      find . -name "*.txt" -exec wc -l {} +
"Find large files over 100MB"    find . -size +100M -type f
"Show running processes"         ps aux
"Compress this directory"        tar -czf archive.tar.gz .
"Watch system logs"              tail -f /var/log/syslog

Configuration

Audio Settings

# Low-latency capture with PipeWire
./capture-lowlat.sh --device 0

# Custom audio parameters  
python3 audio_capture.py \
  --sample-rate 44100 \
  --gain 20 \
  --device 2

AI Provider Configuration

{
  "provider": "anthropic",
  "api_key": "your-key-here",
  "model": "claude-3-haiku-20240307",
  "temperature": 0.0,
  "max_tokens": 1000
}
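
A config like the one above can be sanity-checked before any request is sent. The loader below is a sketch, not the project's actual code; the required keys and defaults are taken from the example config:

```python
import json

REQUIRED_KEYS = {"provider", "api_key", "model"}
KNOWN_PROVIDERS = {"anthropic", "openai", "ollama", "custom"}

def load_ai_config(path):
    """Load and validate an AI provider config file."""
    with open(path) as f:
        cfg = json.load(f)
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    if cfg["provider"] not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider: {cfg['provider']}")
    # Sensible defaults for the optional tuning knobs
    cfg.setdefault("temperature", 0.0)
    cfg.setdefault("max_tokens", 1000)
    return cfg
```

Failing fast on a bad config beats discovering a missing API key mid-dictation.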

Voice Detection Tuning

python3 streaming_transcriber.py \
  --silence-threshold 0.02 \
  --silence-duration 1.5 \
  --pre-buffer 0.3 \
  --show-levels  # Visual audio level display
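
The two silence flags interact roughly like this: a frame counts as silent when its RMS level falls below --silence-threshold, and the recording ends once silence has persisted for --silence-duration. A simplified sketch of that logic (not the actual streaming_transcriber.py implementation):

```python
import math

def rms(frame):
    """Root-mean-square level of a frame of float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def find_end_of_speech(frames, threshold=0.02, silence_frames=3):
    """Return the index of the frame where sustained silence is confirmed,
    or None if speech never ends. silence_frames is roughly
    silence_duration / frame_duration (e.g. 1.5 s of 0.5 s frames = 3)."""
    quiet = 0
    for i, frame in enumerate(frames):
        quiet = quiet + 1 if rms(frame) < threshold else 0
        if quiet >= silence_frames:
            return i
    return None

loud = [0.5, -0.5] * 80          # well above the threshold
quiet_f = [0.001, -0.001] * 80   # below the threshold
idx = find_end_of_speech([loud, loud, quiet_f, quiet_f, quiet_f])
```

Note that a single loud frame resets the silence counter, so a pause mid-sentence shorter than --silence-duration does not cut off the recording.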

Architecture

 Microphone  →  Audio Capture  →  Whisper  →  AI Provider
                (Low Latency)     Server      (Claude/GPT)
                                                   ↓
 Shell/Tmux  ←  Command Executor  ←  Autocorrect  ←  Bash Command
 Integration     & History           & Validate      Generation

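End to end, the diagram above is a straightforward function pipeline. A stub sketch with the four stages as plain functions (all bodies here are placeholders, not the project's real implementations):

```python
def transcribe(audio):             # Whisper server stage (stub)
    return "list all python files"

def generate_command(text):        # AI provider stage (stub)
    return 'find . -name "*.py"'

def autocorrect(cmd):              # path/filename correction stage (stub)
    return cmd

def execute(cmd):                  # shell/tmux integration stage (stub)
    return f"$ {cmd}"

def pipeline(audio):
    """Compose the four stages in diagram order."""
    return execute(autocorrect(generate_command(transcribe(audio))))
```

Each stage only consumes the previous stage's output, which is why the components below can also be run and tested independently.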
Components Deep Dive

audio_capture.py

  • High-performance audio capture using sounddevice
  • Configurable gain, sample rates, and block sizes
  • Device enumeration and selection
  • Real-time audio streaming to stdout
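
The configurable gain implies scaling raw samples before they are streamed out, and int16 audio must be clamped so boosted samples don't wrap around. A sketch of that step (illustrative only; audio_capture.py itself uses sounddevice and NumPy, and whether its --gain is linear or in dB is not specified here):

```python
def apply_gain(samples, gain):
    """Scale int16 samples by a linear gain factor, clamping to the int16 range."""
    out = []
    for s in samples:
        v = int(s * gain)
        # Clamp instead of letting the value wrap, which would sound like harsh clicks
        out.append(max(-32768, min(32767, v)))
    return out
```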

streaming_transcriber.py

  • Intelligent voice activity detection
  • Pre-buffering for complete speech capture
  • Real-time audio level monitoring
  • Whisper server communication with retry logic
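
Retry logic for the Whisper server call can be as simple as exponential backoff around the HTTP request. A generic sketch (the wrapper name and delays are illustrative, not the project's actual code):

```python
import time

def with_retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the error
            sleep(base_delay * (2 ** attempt))  # 0.5 s, 1 s, 2 s, ...

# Usage sketch:
# with_retry(lambda: requests.post("http://localhost:8080/inference", files=...))
```

Injecting `sleep` keeps the wrapper testable without real delays.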

ai_inference.py

  • Multi-provider AI client (Anthropic, OpenAI, Ollama)
  • Configurable prompts and parameters
  • Request/response handling with error recovery
  • Configuration management
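
A multi-provider client mostly differs in how it shapes the request body for each backend. A dispatch sketch (these payload shapes follow the public Anthropic, OpenAI, and Ollama HTTP APIs in broad strokes; ai_inference.py's real structure may differ):

```python
def build_request(provider, model, prompt, max_tokens=1000):
    """Build a request body for the given backend's HTTP API."""
    if provider == "anthropic":        # Messages API
        return {"model": model, "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}]}
    if provider == "openai":           # Chat Completions API
        return {"model": model,
                "messages": [{"role": "user", "content": prompt}]}
    if provider == "ollama":           # /api/generate endpoint
        return {"model": model, "prompt": prompt, "stream": False}
    raise ValueError(f"unsupported provider: {provider}")
```

Everything downstream (prompting, response parsing, error recovery) can then be provider-agnostic.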

voice-inject.plugin.zsh

  • Intelligent path and filename autocorrection
  • Case-insensitive file matching
  • Automatic zsh history integration
  • Configurable confirmation prompts
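
Case-insensitive filename correction can be done segment by segment: walk the spoken path, and at each level look for a directory entry that matches ignoring case. A Python sketch of the idea (the actual plugin implements this in zsh; this function is illustrative):

```python
import os

def autocorrect_path(path, root="."):
    """Fix the case of each path segment against what actually exists on disk."""
    current = root
    fixed = []
    for segment in path.split(os.sep):
        try:
            entries = os.listdir(current)
        except OSError:
            fixed.append(segment)      # directory doesn't exist; keep as spoken
            current = os.path.join(current, segment)
            continue
        # Prefer a case-insensitive match; fall back to the spoken segment
        match = next((e for e in entries if e.lower() == segment.lower()), segment)
        fixed.append(match)
        current = os.path.join(current, match)
    return os.sep.join(fixed)
```

So a transcription like "documents/notes.TXT" resolves to the real "Documents/Notes.txt" when such a file exists.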

Troubleshooting

Audio Issues

# List all audio devices
pactl list short sources

# Test PipeWire/PulseAudio  
pw-record --format=s16 --rate=16000 test.wav

# Check device permissions
groups $USER  # Should include 'audio'

Whisper Server

# Verify server is running
curl http://localhost:8080/health

# Check server logs
docker logs whisper-server

AI Provider Issues

# Test AI inference directly
echo "list files" | python3 ai_inference.py --verbose

# Validate configuration
python3 ai_inference.py --save-config test-config.json

Requirements

System:

  • Linux (Fedora/Ubuntu tested)
  • Python 3.13+
  • PulseAudio/PipeWire
  • Audio input device

Python Dependencies:

  • sounddevice>=0.5.2 - Audio capture
  • numpy>=2.3.2 - Audio processing
  • requests>=2.32.5 - HTTP client
  • pyperclip>=1.9.0 - Clipboard integration

External Services:

  • Whisper server (OpenAI Whisper API compatible)
  • AI provider (Anthropic/OpenAI/Ollama)

Contributing

Contributions welcome! Areas for improvement:

  • Additional AI provider integrations
  • Windows/macOS audio support
  • Web interface for configuration
  • Voice command training/customization
  • Performance optimizations

License

MIT License - See LICENSE file for details

Acknowledgments

  • OpenAI for Whisper speech recognition
  • Anthropic for Claude AI models
  • The sounddevice and NumPy communities
  • PipeWire/PulseAudio teams for low-latency audio

Security Note: This tool can execute arbitrary commands. Use confirmation mode (VOICE_CONFIRM=1) in production environments and review generated commands before execution.
