Automatically process meeting videos to extract key moments, topics, and takeaways. The tool generates a navigable summary webpage with video chapters.
- Extracts audio from video files
- Transcribes speech to text using insanely-fast-whisper
  - Uses Whisper large-v3 model by default for best accuracy
  - Optimized for both NVIDIA GPUs and Apple Silicon
  - Flash Attention 2 support for NVIDIA GPUs
  - Word-level timestamp support
- Analyzes content to identify:
  - Major topics with timestamps
  - Key moments within each topic
  - Actionable takeaways
- Adds chapter markers to the video
- Generates an interactive HTML summary
  - Clickable timestamps for video navigation
  - Organized by topics
  - Highlights key moments and takeaways
- Python 3.11+
- ffmpeg
- NVIDIA GPU or Apple Silicon Mac
- Conda (for environment management)
- Install ffmpeg:

  ```bash
  # macOS
  brew install ffmpeg

  # Ubuntu/Debian
  sudo apt-get install ffmpeg

  # Windows
  # Download from https://ffmpeg.org/download.html
  ```

- Set up the conda environment:

  ```bash
  # Create and activate the conda environment
  conda env create -f environment.yml
  conda activate video-auto-index
  ```

  This will automatically install all dependencies, including insanely-fast-whisper.

- Set up your Anthropic API key:

  ```bash
  export ANTHROPIC_API_KEY='your-api-key'
  ```
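If you want to confirm the key is actually visible to Python before kicking off a long run, a quick check like the following works (a minimal sketch; only the environment variable name comes from the step above):

```python
import os

# Fail early with a clear message if the Anthropic key is not set.
if not os.environ.get("ANTHROPIC_API_KEY"):
    raise SystemExit("ANTHROPIC_API_KEY is not set; export it before running the pipeline.")
```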
Process a video file:
```bash
python -m src.main [video path] [--output-dir output] [--device-id DEVICE]
```
Device options:
- Default: Automatically uses MPS on Apple Silicon, CPU/CUDA on other systems
- `--device-id 0`: Force CPU/CUDA device
- `--device-id mps`: Force MPS device on Apple Silicon
For example:
```bash
# Use default device (auto-detected)
python -m src.main /path/to/video.mp4 --output-dir output

# Force CPU/CUDA device
python -m src.main /path/to/video.mp4 --output-dir output --device-id 0

# Force MPS device on Apple Silicon
python -m src.main /path/to/video.mp4 --output-dir output --device-id mps
```
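The exact auto-detection logic isn't spelled out here, but it can be pictured as a thin wrapper over PyTorch's device checks (a sketch only; the function name `pick_device_id` is illustrative, not part of the project):

```python
import torch

def pick_device_id() -> str:
    """Return a --device-id value: 'mps' on Apple Silicon, otherwise '0'."""
    # torch.backends.mps.is_available() reports Metal (MPS) support on Apple Silicon.
    if torch.backends.mps.is_available():
        return "mps"
    # A numeric id selects the CUDA device (falling back to CPU on systems without one),
    # matching the "--device-id 0: Force CPU/CUDA device" behaviour described above.
    return "0"

print(pick_device_id())
```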
This will:
- Extract audio from the video
- Transcribe the audio using insanely-fast-whisper
- Analyze the meeting content for topics and key moments
- Generate an HTML summary page
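In code, those four stages could be chained roughly as follows (a sketch only; the function names are hypothetical stand-ins for whatever the modules actually expose):

```python
from pathlib import Path

# Hypothetical imports -- the real entry points live in the modules listed
# under the project structure below and may be named differently.
from src.video_processor import extract_audio
from src.transcriber import transcribe
from src.key_moments import analyze_transcript
from src.web_generator import generate_summary_page

video = Path("/path/to/video.mp4")
output_dir = Path("output")

audio_path = extract_audio(video, output_dir)          # 1. pull audio out of the video
transcript = transcribe(audio_path, device_id="mps")   # 2. speech-to-text with timestamps
analysis = analyze_transcript(transcript)              # 3. topics, key moments, takeaways
generate_summary_page(video, analysis)                 # 4. interactive HTML summary
```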
The output directory will contain:
- `audio.wav`: Extracted audio file
- `audio_transcript.json`: Transcribed speech with timestamps
- `audio_subtitles.srt`: Generated subtitles
- `meeting_analysis.json`: Extracted topics, moments, and takeaways
Final web output is stored in:
```
<video_base_path>/<video_filename>_summary.html
```
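That path follows directly from the video path, so other scripts can reconstruct it with pathlib (a small sketch; nothing project-specific is assumed beyond the naming scheme above):

```python
from pathlib import Path

video = Path("/path/to/video.mp4")
# <video_base_path>/<video_filename>_summary.html
summary_path = video.with_name(f"{video.stem}_summary.html")
print(summary_path)  # /path/to/video_summary.html
```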
The analysis JSON follows this structure:
```json
[
  {
    "topic": "Topic description",
    "timestamp": "HH:MM:SS,mmm",
    "key_moments": [
      {
        "description": "Key moment description",
        "timestamp": "HH:MM:SS,mmm"
      }
    ],
    "takeaways": [
      "Actionable takeaway 1",
      "Actionable takeaway 2"
    ]
  }
]
```
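Because every timestamp uses the SRT-style `HH:MM:SS,mmm` format, downstream tooling can load the analysis and convert timestamps to seconds for seeking. A minimal sketch (only the file name and JSON shape above are taken from this document):

```python
import json

def to_seconds(ts: str) -> float:
    """Convert an 'HH:MM:SS,mmm' timestamp to seconds."""
    hms, millis = ts.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000

with open("output/meeting_analysis.json") as f:
    topics = json.load(f)

for topic in topics:
    print(f"{to_seconds(topic['timestamp']):8.1f}s  {topic['topic']}")
    for moment in topic["key_moments"]:
        print(f"  - {moment['description']} @ {moment['timestamp']}")
```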
The project is organized into modular components:
- `video_processor.py`: Handles video/audio operations
- `transcriber.py`: Speech-to-text conversion using insanely-fast-whisper
- `key_moments.py`: AI content analysis
- `web_generator.py`: HTML summary generation
- `main.py`: Pipeline orchestration
Each component can be run independently, allowing for flexible processing pipelines.
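For example, once the transcript and analysis exist on disk you could regenerate only the HTML page without re-running transcription (the function name is hypothetical; the real entry point lives in `web_generator.py`):

```python
import json
from pathlib import Path

# Hypothetical call -- the actual function in src.web_generator may be named
# or parameterized differently.
from src.web_generator import generate_summary_page

analysis = json.loads(Path("output/meeting_analysis.json").read_text())
generate_summary_page(Path("/path/to/video.mp4"), analysis)
```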
The project includes a comprehensive test suite covering all components:
Run all tests:

```bash
pytest
```

Run with coverage report:

```bash
pytest --cov=src tests/
```

Run specific test categories:

```bash
# Unit tests only
pytest -v -m "not integration"

# Integration tests only
pytest -v -m "integration"
```
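The unit/integration split relies on pytest markers, so a new test meant for the integration suite would be tagged roughly like this (a sketch; only the `integration` marker name comes from the commands above):

```python
import pytest

@pytest.mark.integration
def test_full_pipeline_smoke(tmp_path):
    """Selected only by `pytest -m "integration"`; excluded by `-m "not integration"`."""
    # The real integration tests in test_main.py exercise the whole pipeline;
    # this placeholder just shows how the marker is applied.
    assert tmp_path.exists()
```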
- `test_video_processor.py`: Tests for video and audio processing
- `test_transcriber.py`: Tests for speech-to-text conversion
- `test_key_moments.py`: Tests for content analysis with API mocking
- `test_web_generator.py`: Tests for HTML generation
- `test_main.py`: Integration tests for the full pipeline
The test suite includes:
- Unit tests for each component
- Integration tests for the full pipeline
- API mocking for external services
- Fixture-based test data
- Error handling verification
- Edge case validation
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request