Transform your voice into executable bash commands with AI-powered intelligence
Whispers is a sophisticated voice-controlled command-line interface that captures your speech, transcribes it using Whisper, and converts natural language into precise bash commands using AI. Say goodbye to typing complex commands!
- Real-time Speech Recognition - Ultra-low latency audio capture and transcription
- AI-Powered Command Generation - Natural language to bash conversion using multiple AI providers
- Intelligent Autocorrect - Automatically fixes file/directory names and paths
- Multi-Platform Audio - Works with PulseAudio, PipeWire, and ALSA
- Shell Integration - Seamless integration with zsh, tmux, and clipboard
- Anthropic Claude (Haiku/Sonnet) - Fast and accurate
- OpenAI GPT - Industry standard performance
- Ollama - Local, private inference
- Custom APIs - Bring your own AI endpoint
- Voice Activity Detection - Smart silence detection
- Pre-buffering - Captures speech from the very beginning
- Confirmation Mode - Optional safety prompts before execution
- Command History - Automatic zsh history integration
- Multi-format Audio - Supports various sample rates and formats
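The voice-activity detection and silence threshold listed above can be sketched as a simple RMS energy check per audio frame. This is an illustrative sketch, not the project's actual detector; the threshold and frame size are assumptions:

```python
import math

def is_silence(frame, threshold=0.02):
    """Return True when a frame of float samples (-1.0..1.0) falls below
    the RMS energy threshold -- the same idea as --silence-threshold."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms < threshold
```

Sustained silence for the configured duration would then end the capture and send the buffered audio to Whisper.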
- Python 3.13+
- A Whisper server running on port 8080
- AI API key (Anthropic, OpenAI, etc.)
- Audio device (microphone)
- Clone and setup:
git clone <repository-url>
cd whispers
pip install -e .
- Configure AI provider:
cp ai_inference_config.json.example ai_inference.json
# Edit ai_inference.json with your API key
- Test audio capture:
./audio_capture.py --list # List audio devices
./audio_capture.py -d 0 # Test with device 0
# Generate commands (safe mode)
./voice-to-command.sh
# Execute commands automatically (use with caution!)
./voice-to-command.sh --execute
Add to your .zshrc:
source ~/whispers/voice-inject.plugin.zsh
# Enable confirmation prompts (recommended)
export VOICE_CONFIRM=1
export VOICE_TIMEOUT=5  # Confirmation timeout in seconds
# Monitor and auto-inject commands into tmux
./tmux-voice-inject.sh monitor &
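Confirmation mode can be implemented along these lines. This is a hedged sketch, not the plugin's actual code; the prompt text and defaults are assumptions:

```python
import os
import select
import sys

def confirm(command, timeout=None):
    """Prompt before executing when VOICE_CONFIRM=1; auto-cancel after
    VOICE_TIMEOUT seconds with no answer."""
    if os.environ.get("VOICE_CONFIRM") != "1":
        return True  # confirmation disabled: execute immediately
    if timeout is None:
        timeout = float(os.environ.get("VOICE_TIMEOUT", "5"))
    print(f"Run {command!r}? [y/N] ", end="", flush=True)
    ready, _, _ = select.select([sys.stdin], [], [], timeout)
    if not ready:
        print("(timed out, cancelled)")
        return False
    return sys.stdin.readline().strip().lower() in ("y", "yes")
```

With `VOICE_CONFIRM` unset, commands run immediately; with it set, an unanswered prompt times out to the safe default of not executing.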
Just speak naturally:
| Say This | Gets Converted To |
|---|---|
| "List all Python files" | `find . -name "*.py"` |
| "Show disk usage" | `df -h` |
| "Count lines in text files" | `find . -name "*.txt" -exec wc -l {} +` |
| "Find large files over 100MB" | `find . -size +100M -type f` |
| "Show running processes" | `ps aux` |
| "Compress this directory" | `tar -czf archive.tar.gz .` |
| "Watch system logs" | `tail -f /var/log/syslog` |
# Low-latency capture with PipeWire
./capture-lowlat.sh --device 0
# Custom audio parameters
python3 audio_capture.py \
--sample-rate 44100 \
--gain 20 \
--device 2
{
"provider": "anthropic",
"api_key": "your-key-here",
"model": "claude-3-haiku-20240307",
"temperature": 0.0,
"max_tokens": 1000
}
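For local, private inference with Ollama, the same file could point at a local model. This is a sketch assuming the schema above carries over; the model name and the `base_url` field are illustrative (11434 is Ollama's default port):

```json
{
  "provider": "ollama",
  "model": "llama3",
  "base_url": "http://localhost:11434",
  "temperature": 0.0,
  "max_tokens": 1000
}
```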
python3 streaming_transcriber.py \
--silence-threshold 0.02 \
--silence-duration 1.5 \
--pre-buffer 0.3 \
--show-levels # Visual audio level display
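The `--pre-buffer` option keeps a short rolling window of audio from before speech is detected, so the first syllable isn't clipped. A minimal sketch of that idea using a bounded deque; frame size and sample rate here are illustrative, not the project's values:

```python
from collections import deque

SAMPLE_RATE = 16000
FRAME = 160                # 10 ms of audio at 16 kHz
PRE_BUFFER_SECONDS = 0.3

# Holds the most recent 0.3 s of frames; older frames fall off the end.
pre_buffer = deque(maxlen=int(PRE_BUFFER_SECONDS * SAMPLE_RATE / FRAME))

def on_frame(frame, speaking, recording):
    """When speech starts, prepend the buffered frames to the recording."""
    if speaking and pre_buffer:
        recording.extend(pre_buffer)  # recover the utterance onset
        pre_buffer.clear()
    if speaking:
        recording.append(frame)
    else:
        pre_buffer.append(frame)
```

The bounded deque means silence never grows memory: only the last 0.3 s is retained until speech begins.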
Microphone → Audio Capture → Whisper → AI Provider
             (Low Latency)   Server    (Claude/GPT)
     ↓                                      ↓
Shell/Tmux ← Command Executor ← Autocorrect ← Bash Command
Integration  & History          & Validate    Generation
- High-performance audio capture using sounddevice
- Configurable gain, sample rates, and block sizes
- Device enumeration and selection
- Real-time audio streaming to stdout
- Intelligent voice activity detection
- Pre-buffering for complete speech capture
- Real-time audio level monitoring
- Whisper server communication with retry logic
- Multi-provider AI client (Anthropic, OpenAI, Ollama)
- Configurable prompts and parameters
- Request/response handling with error recovery
- Configuration management
- Intelligent path and filename autocorrection
- Case-insensitive file matching
- Automatic zsh history integration
- Configurable confirmation prompts
# List all audio devices
pactl list short sources
# Test PipeWire/PulseAudio
pw-record --format=s16 --rate=16000 test.wav
# Check device permissions
groups $USER # Should include 'audio'
# Verify server is running
curl http://localhost:8080/health
# Check server logs
docker logs whisper-server
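A client-side pattern that pairs well with the checks above is retrying the transcription request with exponential backoff while the server warms up. This is a generic sketch of the retry idea; the function and parameter names are illustrative, not the project's API:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5):
    """Call fn(), retrying on exception with exponential backoff
    (0.5 s, 1 s, ...); re-raise after the final attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```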
# Test AI inference directly
echo "list files" | python3 ai_inference.py --verbose
# Validate configuration
python3 ai_inference.py --save-config test-config.json
System:
- Linux (Fedora/Ubuntu tested)
- Python 3.13+
- PulseAudio/PipeWire
- Audio input device
Python Dependencies:
- sounddevice>=0.5.2 - Audio capture
- numpy>=2.3.2 - Audio processing
- requests>=2.32.5 - HTTP client
- pyperclip>=1.9.0 - Clipboard integration
External Services:
- Whisper server (OpenAI Whisper API compatible)
- AI provider (Anthropic/OpenAI/Ollama)
Contributions welcome! Areas for improvement:
- Additional AI provider integrations
- Windows/macOS audio support
- Web interface for configuration
- Voice command training/customization
- Performance optimizations
MIT License - See LICENSE file for details
- OpenAI for Whisper speech recognition
- Anthropic for Claude AI models
- The sounddevice and NumPy communities
- PipeWire/PulseAudio teams for low-latency audio
Security Note: This tool can execute arbitrary commands. Use confirmation mode (VOICE_CONFIRM=1) in production environments and review generated commands before execution.