Skip to content

RavenMuse/whisper-openapi

 
 

Repository files navigation

Whisper ASR OpenAI API

Whisper ASR for daily dialogue with standardized OpenAI API speech interface. Whisper ASR is a general-purpose speech recognition toolkit. Whisper Models are trained on a large dataset of diverse audio and is also a multitask model that can perform multilingual speech recognition as well as speech translation and language identification.

Quick Usage

Docker

Build by yourself

docker-compose up -d

Run with image

  • CPU
docker run -d -p 9000:9000 \
  -e ASR_MODEL=base \
  -e ASR_ENGINE=openai_whisper \
  onerahmet/openai-whisper-asr-webservice:latest
  • GPU
docker run -d --gpus all -p 9000:9000 \
  -e ASR_MODEL=base \
  -e ASR_ENGINE=openai_whisper \
  onerahmet/openai-whisper-asr-webservice:latest-gpu

Cache

To reduce container startup time by avoiding repeated downloads, you can persist the cache directory:

docker run -d -p 9000:9000 \
  -v $PWD/cache:/root/.cache/ \
  onerahmet/openai-whisper-asr-webservice:latest

Test

  • Download english speech media sample
wget -O test.map3 https://www.cambridgeenglish.org/images/153149-movers-sample-listening-test-vol2.mp3
  • Test Asr Api by media sample
curl http://172.25.0.2:9000/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F '[email protected]' \
  -F model="whisper-1"

Key Features

  • Multiple ASR engines support (OpenAI Whisper, Faster Whisper, WhisperX)
  • Multiple output formats (text, JSON, VTT, SRT, TSV)
  • Word-level timestamps support
  • Voice activity detection (VAD) filtering
  • Speaker diarization (with WhisperX)
  • FFmpeg integration for broad audio/video format support
  • GPU acceleration support
  • Configurable model loading/unloading
  • REST API with Swagger documentation

Environment Variables

Key configuration options:

  • ASR_ENGINE: Engine selection (openai_whisper, faster_whisper, whisperx)
  • ASR_MODEL: Model selection (tiny, base, small, medium, large-v3, etc.)
  • ASR_MODEL_PATH: Custom path to store/load models
  • ASR_DEVICE: Device selection (cuda, cpu)
  • MODEL_IDLE_TIMEOUT: Timeout for model unloading

Development

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync --upgrade

# Export ffmpeg
export PATH="${PATH}:${PWD}/bin"

# Run service
uv run webservice.py --host 0.0.0.0 --port 9000

After starting the service, visit http://localhost:9000 or http://0.0.0.0:9000 in your browser to access the Swagger UI documentation and try out the API endpoints.

Credits

  • This software uses libraries from the FFmpeg project under the LGPLv2.1

About

Whisper ASR OpenAI API

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.0%
  • Dockerfile 1.0%