A high-performance, production-ready speech-to-text API service based on the faster-whisper library.
- Efficient transcription using the CTranslate2-based faster-whisper implementation
- Support for all Whisper models (tiny, base, small, medium, large-v2, large-v3, and distilled variants such as distil-large-v3)
- REST API with comprehensive endpoints
- Background task processing
- Word-level timestamps
- Voice Activity Detection (VAD) for filtering silence
- Batched inference for faster processing
- Docker and Docker Compose support for easy deployment
- CUDA support for GPU acceleration
- Docker and Docker Compose (recommended)
- NVIDIA GPU with CUDA support (optional but recommended for performance)
- Python 3.8+ (if not using Docker)
With Docker (recommended):

1. Clone this repository:

   ```bash
   git clone https://github.com/msjd78/Whisper-FastAPI-Transcription-Service.git
   cd Whisper-FastAPI-Transcription-Service
   ```

2. Start the service:

   ```bash
   docker-compose up -d
   ```

   The service will be available at http://localhost:8000.

Without Docker:

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   or

   ```bash
   pip install fastapi uvicorn python-multipart faster-whisper pydantic
   ```

2. Start the service:

   ```bash
   uvicorn app:app --host 0.0.0.0 --port 8000 --reload
   ```
After starting the API, you can access the web UI at http://localhost:8000/static/index.html, which provides a user-friendly interface for transcription.
POST /api/transcribe
Upload an audio file for transcription. Returns a task ID that can be used to check the status.
- Content-Type: multipart/form-data
- Body:
- file: Audio file (mp3, wav, etc.)
- options (optional): JSON string with transcription options
```json
{
"model_size": "large-v3",
"device": "cuda",
"compute_type": "float16",
"language": "en",
"batch_size": 16,
"beam_size": 5,
"word_timestamps": true,
"vad_filter": true,
"vad_parameters": {
"min_silence_duration_ms": 500
},
"condition_on_previous_text": true,
"use_batched_mode": true
}
```

GET /api/tasks/{task_id}
Check the status of a transcription task.
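For example, a minimal Python sketch of the upload-and-poll flow using the requests library; the response field names `task_id` and `status` are assumptions based on the descriptions above, not a documented schema:

```python
import json
import time

import requests

BASE_URL = "http://localhost:8000"

# Upload an audio file with transcription options (see the JSON fields above).
options = {"model_size": "large-v3", "language": "en", "word_timestamps": True}
with open("audio.mp3", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/api/transcribe",
        files={"file": ("audio.mp3", f)},
        data={"options": json.dumps(options)},
    )
resp.raise_for_status()
task_id = resp.json()["task_id"]  # assumed field name

# Poll the task until it reaches a terminal state.
while True:
    task = requests.get(f"{BASE_URL}/api/tasks/{task_id}").json()
    if task["status"] in ("completed", "failed"):  # assumed status values
        break
    time.sleep(2)

print(task)
```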
GET /api/tasks?limit=10&status=completed
List transcription tasks with optional filtering.
DELETE /api/tasks/{task_id}
Delete a transcription task and its associated data.
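As an illustration of combining these two endpoints, the sketch below lists completed tasks and deletes them; the shape of the list response (a `tasks` array of entries carrying `task_id`) is an assumption, so adjust it to the actual payload:

```python
import requests

BASE_URL = "http://localhost:8000"

# List up to 10 completed tasks, using the filters documented above.
resp = requests.get(f"{BASE_URL}/api/tasks", params={"limit": 10, "status": "completed"})
resp.raise_for_status()

# Delete each completed task; assumes each entry exposes a task_id field.
for task in resp.json().get("tasks", []):
    requests.delete(f"{BASE_URL}/api/tasks/{task['task_id']}").raise_for_status()
```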
GET /api/health
Check if the service is running properly.
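This endpoint is handy for smoke tests or readiness probes, for example:

```python
import requests

# A 2xx response indicates the service is up; the exact body is implementation-defined.
resp = requests.get("http://localhost:8000/api/health")
print(resp.status_code, resp.text)
```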
A client script is provided for testing the API:
```bash
python client.py http://localhost:8000 path/to/audio.mp3 output.txt
```

The service can be configured using environment variables:

- PORT: Port to run the service on (default: 8000)
- HOST: Host to bind to (default: 0.0.0.0)
- MODEL_DIR: Directory to store downloaded models (default: None)
For best performance:
- Use GPU acceleration with CUDA
- Use the batched mode for faster processing
- Choose the appropriate model size for your needs:
  - tiny, base, small: Lower quality but faster
  - medium: Good balance between quality and speed
  - large-v3: Highest quality but slower
  - distil-large-v3: Comparable to large with faster processing time
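To make these trade-offs concrete, here are two illustrative option presets built only from the documented transcription options; the specific values are starting points to tune against your hardware, not benchmarks:

```python
# Illustrative presets for the "options" form field of POST /api/transcribe.
# Tune batch_size and beam_size to your GPU memory and latency needs.

FAST_PRESET = {
    "model_size": "distil-large-v3",
    "device": "cuda",
    "compute_type": "float16",
    "use_batched_mode": True,
    "batch_size": 16,
    "beam_size": 1,       # greedy decoding trades some accuracy for speed
    "vad_filter": True,   # skip silent regions
}

ACCURATE_PRESET = {
    "model_size": "large-v3",
    "device": "cuda",
    "compute_type": "float16",
    "use_batched_mode": True,
    "batch_size": 8,
    "beam_size": 5,
    "word_timestamps": True,
    "vad_filter": True,
}
```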
