Real-time voice-to-text transcription with hotkey support
Maivi (My AI Voice Input) is a cross-platform desktop application that turns your voice into text using NVIDIA's Parakeet TDT 0.6B speech recognition model. Press Alt+Q (Option+Q on macOS) to start recording, and press it again to stop. The transcription appears in real time and is automatically copied to your clipboard.
- 🎤 Hotkey Recording - Toggle recording with Alt+Q (Option+Q on macOS)
- ⚡ Real-time Transcription - See text appear as you speak
- 📋 Clipboard Integration - Automatic copy to clipboard
- 🪟 Floating Overlay - Live transcription in a sleek overlay window
- 🔄 Smart Chunk Merging - Advanced overlap-based merging eliminates duplicates
- 💻 CPU-Only - No GPU required (though GPU acceleration is supported)
- 🌍 High Accuracy - Powered by NVIDIA Parakeet TDT 0.6B model (~6-9% WER)
- 🚀 Fast - ~0.36x RTF (processes 7s audio in 2.5s on CPU)
Download the latest executable for your platform from the Releases page:
| Platform | Download | Size |
|---|---|---|
| 🐧 Linux | maivi-linux | ~300MB |
| 🍎 macOS | maivi-macos | ~300MB |
| 🪟 Windows | maivi-windows.exe | ~300MB |
Quick Start:
Linux/macOS:
# Download and make executable
chmod +x maivi-linux # or maivi-macos
# Run
./maivi-linux # or ./maivi-macos
Windows:
- Download maivi-windows.exe
- Double-click to run (or run it from the Command Prompt)
First Run Notes:
- The first time you run Maivi, it will download the AI model (~600MB)
- On macOS, you may need to grant permissions: System Settings → Privacy & Security
- FFmpeg installation will be offered if not already installed (optional)
Alternatively, install from PyPI:
pip install maivi --extra-index-url https://download.pytorch.org/whl/cpu
See the Installation section for more details.
CPU-only (recommended: much faster to install, ~100MB vs 2GB+):
pip install maivi --extra-index-url https://download.pytorch.org/whl/cpu
Or with GPU support (if you have an NVIDIA GPU):
pip install maivi --extra-index-url https://download.pytorch.org/whl/cu121
Standard install (may download large CUDA files):
pip install maivi
Linux:
sudo apt-get install portaudio19-dev python3-pyaudio
macOS: Grant Maivi microphone, Accessibility, and Input Monitoring permissions the first time you run it (System Settings → Privacy & Security). No additional Homebrew packages are required for audio capture.
Windows:
- PortAudio is usually included with PyAudio
GUI Mode (Recommended):
maivi
Press Alt+Q (Option+Q on macOS) to start recording, and press it again to stop. The transcription appears in a floating overlay and is copied to your clipboard.
CLI Mode:
# Basic CLI
maivi-cli
# With live terminal UI
maivi-cli --show-ui
# Custom parameters
maivi-cli --window 10 --slide 5 --show-ui
Controls:
- Alt+Q (Option+Q on macOS) - Start/stop recording (toggle mode)
- Esc - Exit application
Maivi uses a streaming architecture with four stages:
- Sliding Window Recording - Captures audio in overlapping 7-second chunks every 3 seconds
- Real-time Transcription - Each chunk is transcribed by the NVIDIA Parakeet model
- Smart Merging - Chunks are merged using overlap detection (4-second overlap)
- Live Updates - The UI updates in real-time as transcription progresses
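The sliding-window timing described above can be sketched in a few lines. This is only an illustration of the schedule, assuming chunks start every slide interval and the last chunk is cut at the end of the recording; `chunk_windows` is a hypothetical helper, not part of Maivi's API.

```python
def chunk_windows(duration_s, window_s=7.0, slide_s=3.0):
    """Return (start, end) times in seconds for each chunk of a recording.

    Hypothetical helper for illustration: a new chunk starts every
    `slide_s` seconds and covers up to `window_s` seconds of audio.
    """
    windows = []
    start = 0.0
    while True:
        # Clip the final chunk to the end of the recording.
        end = min(start + window_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        start += slide_s
    return windows


# A 13-second recording with the default 7s window and 3s slide.
for start, end in chunk_windows(13.0):
    print(f"chunk from {start:.0f}s to {end:.0f}s")
```

With the defaults, consecutive chunks share 7 - 3 = 4 seconds of audio, which is the overlap the merging step relies on.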
Chunk 1: "hello world how are you"
Chunk 2: "how are you doing today"
^^^^^^^^^^^^^^
Overlap detected → merge!
Result: "hello world how are you doing today"
This approach ensures:
- ✅ No words cut mid-syllable
- ✅ Context preserved for better accuracy
- ✅ Seamless merging without duplicates
- ✅ Fast processing (no queue buildup)
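The merge shown in the example can be approximated with a simple word-level overlap search. This is a minimal sketch under the assumption that the overlapping words match exactly; `merge_chunks` is a hypothetical name, and Maivi's actual merger may match more loosely (for instance, tolerating differences in punctuation or casing between chunks).

```python
def merge_chunks(merged, new_chunk):
    """Append new_chunk to merged, dropping the longest shared word run.

    Hypothetical sketch: looks for the longest suffix of `merged` that
    equals a prefix of `new_chunk`, comparing whole words exactly.
    """
    left = merged.split()
    right = new_chunk.split()
    # Try the longest possible overlap first, then shrink.
    for size in range(min(len(left), len(right)), 0, -1):
        if left[-size:] == right[:size]:
            return " ".join(left + right[size:])
    # No overlap found: fall back to plain concatenation.
    return " ".join(left + right)


result = merge_chunks("hello world how are you", "how are you doing today")
print(result)
```

Searching from the longest candidate overlap downward avoids merging on a short accidental match when a longer one exists.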
maivi-cli --window 7.0 --slide 3.0 --delay 2.0
- --window: Chunk size in seconds (default: 7.0)
  - Larger = better quality, slower processing
- --slide: Slide interval in seconds (default: 3.0)
  - Smaller = more overlap, higher CPU usage
  - Rule: Must be > window × 0.36 to avoid queue buildup
- --delay: Processing start delay in seconds (default: 2.0)
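The slide-interval rule is plain arithmetic, so candidate settings can be checked before use. The sketch below assumes the ~0.36 real-time factor quoted in this README; the actual factor depends on your CPU, and `processing_time` and `keeps_up` are hypothetical helpers rather than Maivi functions.

```python
RTF = 0.36  # real-time factor quoted in this README; varies by machine


def processing_time(window_s, rtf=RTF):
    """Estimated seconds needed to transcribe one chunk of window_s seconds."""
    return window_s * rtf


def keeps_up(window_s, slide_s, rtf=RTF):
    """True if each chunk is expected to finish before the next one arrives."""
    return processing_time(window_s, rtf) < slide_s


for window_s, slide_s in [(7.0, 3.0), (10.0, 5.0), (10.0, 3.0)]:
    status = "ok" if keeps_up(window_s, slide_s) else "queue will build up"
    print(f"window={window_s}s slide={slide_s}s: {status}")
```

If your machine is slower than the quoted figure, passing a larger `rtf` gives a more conservative check.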
# Speed adjustment (experimental)
maivi-cli --speed 1.5
# Custom UI width
maivi-cli --show-ui --ui-width 50
# Disable pause detection
maivi-cli --no-pause-breaks
# Stream to file (for voice commands)
maivi-cli --output-file transcription.txt
For developers who want to build executables:
# Install build dependencies
pip install "maivi[build]"
# Build executable
pyinstaller --onefile \
--name maivi \
--add-data "src/maivi:maivi" \
--hidden-import=nemo \
--hidden-import=nemo.collections.asr \
--hidden-import=PySide6 \
src/maivi/__main__.py
Executables are built automatically via GitHub Actions for each release. See .github/workflows/build-executables.yml for the full build configuration.
# Clone repository
git clone https://github.com/MaximeRivest/maivi.git
cd maivi
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest
maivi/
├── src/maivi/
│ ├── __init__.py
│ ├── __main__.py # GUI entry point
│ ├── core/
│ │ ├── streaming_recorder.py
│ │ ├── chunk_merger.py
│ │ └── pause_detector.py
│ ├── gui/
│ │ └── qt_gui.py
│ ├── cli/
│ │ ├── cli.py
│ │ ├── server.py
│ │ └── terminal_ui.py
│ └── utils/
├── tests/
├── docs/
├── pyproject.toml
├── README.md
└── LICENSE
This is expected behavior when there are long pauses (5+ seconds of silence). The system adds "..." gap markers to indicate the pause.
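To make the gap-marker behavior concrete, here is a minimal sketch of how such markers could be inserted. It assumes each transcribed segment carries start and end times in seconds; the `join_with_gap_markers` helper and the segment format are hypothetical and may not match how Maivi's pause detector works internally.

```python
def join_with_gap_markers(segments, pause_s=5.0, marker="..."):
    """Join (start, end, text) segments, marking silences of pause_s or more.

    Hypothetical sketch: a marker is inserted whenever the silence between
    the end of one segment and the start of the next reaches pause_s.
    """
    parts = []
    prev_end = None
    for start, end, text in segments:
        if prev_end is not None and start - prev_end >= pause_s:
            parts.append(marker)
        parts.append(text)
        prev_end = end
    return " ".join(parts)


segments = [
    (0.0, 2.0, "first thought"),
    (8.0, 10.0, "second thought"),  # 6 seconds of silence before this one
    (10.5, 12.0, "continues"),      # only 0.5 seconds of silence
]
print(join_with_gap_markers(segments))
```

If the markers are unwanted, the --no-pause-breaks option disables pause detection.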
Check that processing time < slide interval:
- Processing: window_seconds × 0.36 (RTF)
- Should be < slide_seconds
- Default: 7 × 0.36 = 2.52s < 3s ✅
The first run downloads the NVIDIA Parakeet model (~600MB) from Hugging Face. If the download fails:
- Check your internet connection
- Verify that Hugging Face is accessible
- Clear the cache (this removes all cached Hugging Face models): rm -rf ~/.cache/huggingface/
If the GUI crashes on Linux:
# Check Qt installation
python -c "from PySide6 import QtWidgets; print('Qt OK')"
# Fall back to CLI mode
maivi-cli --show-ui
Memory:
- Model: ~2GB RAM
- Audio buffer: ~1MB
- Total: ~2.5GB RAM
CPU:
- Idle: <5% CPU
- Recording: 30-40% of 1 core
- Transcription: 100% of 1 core (during processing)
Latency:
- First transcription: 2s (start delay)
- Updates: Every 3s (slide interval)
- Completion: 1-3s after recording stops
Accuracy:
- Model WER: ~5-8%
- Overlap merging: <1% word loss
- Total effective WER: ~6-9%
v0.2 - Platform Support:
- Test and verify macOS support
- Test and verify Windows support
- Platform-specific installers (.app, .exe)
v0.3 - Features:
- Configurable hotkeys via GUI
- Multi-language support
- Custom model selection
- Voice commands support
v0.4 - Optimization:
- GPU acceleration (CUDA)
- Export formats (JSON, SRT)
- Text editor integration
- Plugin system
MIT License - see LICENSE file for details.
- Built with NVIDIA NeMo ASR toolkit
- Uses Parakeet TDT 0.6B model
- GUI powered by PySide6
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Made with ❤️ by Maxime Rivest