Releases: KoljaB/RealtimeSTT

v0.3.104

03 May 21:48

RealtimeSTT 0.3.104

Features & Improvements

  • New parameter: start_callback_in_new_thread
    If set to True, every callback function is executed in a new thread.
    This is useful when a callback blocks and you want to keep it from stalling the RealtimeSTT application thread.
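
The effect of running a blocking callback in its own thread can be sketched with the standard library alone. This is a minimal illustration, not RealtimeSTT's implementation: the helper `run_callback`, its `in_new_thread` argument, and `slow_callback` are hypothetical names introduced here.

```python
import threading
import time


def run_callback(callback, *args, in_new_thread=False):
    """Call `callback` directly, or hand it to a daemon thread.

    Hypothetical helper: it only mimics the idea behind
    start_callback_in_new_thread and is not part of RealtimeSTT.
    """
    if in_new_thread:
        worker = threading.Thread(target=callback, args=args, daemon=True)
        worker.start()
        return worker  # caller may join() if it needs the result
    callback(*args)
    return None


results = []


def slow_callback(text):
    # Stands in for a blocking user callback (network call, UI update, ...).
    time.sleep(0.2)
    results.append(text)


start = time.monotonic()
worker = run_callback(slow_callback, "hello", in_new_thread=True)
dispatch_time = time.monotonic() - start  # time the caller was held up
worker.join()  # wait here only so the example finishes deterministically
```

With `in_new_thread=True` the caller is held up only for the time it takes to start the thread, not for the 0.2 seconds the callback sleeps. Callbacks that touch shared state then need their own synchronization.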

v0.3.103

19 Apr 20:36

RealtimeSTT 0.3.103

Features & Improvements

  • Thread‑safe IPC: Introduces SafePipe as a replacement for mp.Pipe, intended to make inter-process communication more robust (needs more testing).
  • Audio normalization: New normalize_audio option scales input to –0.95 dBFS for consistent transcription quality.
  • Callback overhaul: All event callbacks (VAD, wake‑word, turn detection, recording, realtime updates) now run asynchronously via helper threads.
  • Wake word & VAD: Add wakeword_backend config and faster_whisper_vad_filter flag; improved error messages when misconfigured.
  • Rich metadata: Embed nanosecond‑precision timestamps in both client and server, serialized as formatted strings.
  • CLI enhancements: --faster_whisper_vad_filter and --debug_websockets flags give finer control over server behavior.
  • Testing updates: Adjusted parameters in realtimestt_test and added a new type_into_textbox.py example.
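
As a rough illustration of the audio normalization item above, the following sketch scales a buffer so that its peak sits at a target level given in dBFS. It assumes plain peak normalization on float samples in the range -1.0 to 1.0; the function name `normalize_peak` and its signature are hypothetical, and the library's actual normalize_audio code may work differently.

```python
def normalize_peak(samples, target_dbfs=-0.95):
    """Scale float samples (-1.0..1.0) so the loudest one sits at target_dbfs.

    Hypothetical sketch of peak normalization; not RealtimeSTT's code.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    target_amplitude = 10 ** (target_dbfs / 20)  # dBFS -> linear amplitude
    gain = target_amplitude / peak
    return [s * gain for s in samples]


quiet = [0.1, -0.5, 0.25]
normalized = normalize_peak(quiet)
```

The relative shape of the signal is unchanged; only the overall gain differs, so quiet and loud inputs reach the transcription model at a comparable level.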

v0.3.101

11 Apr 12:50

RealtimeSTT 0.3.101

✨ Features & Improvements

  • Enhanced Real-time Responsiveness: Real-time transcription processing now intelligently pauses immediately when VAD detects silence, reducing latency and unnecessary work before the final transcription.
  • Client Connection Robustness: The client now uses a more accurate WebSocket-based server check.
  • Remote Wake Word Delay Config: Clients can now configure the wake_word_activation_delay on the server.
  • Updated OpenAI Example: Refreshed the openai_voice_interface.py example with the latest OpenAI API, EdgeEngine TTS, configuration flags, and graceful shutdown.

v0.3.100

23 Mar 11:03

RealtimeSTT 0.3.100

New VAD callbacks on_vad_start and on_vad_stop

  • on_vad_start and on_vad_stop trigger on VAD presence, i.e. when voice activity is detected and when it ends.
  • on_vad_detect_start and on_vad_detect_stop are reverted to their earlier behavior: they trigger when the system starts/stops listening for VAD presence.

v0.3.99

21 Mar 19:10

RealtimeSTT 0.3.99

1. Enhanced Logging Configuration

  • Introduced a dedicated named logger realtimestt instead of using the root logger.
  • Added structured logging with handlers for both console (level set by user) and file (always DEBUG).
  • Logging no longer propagates to the root logger by default (logger.propagate = False).
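
A logging setup along these lines can be sketched with Python's standard logging module. This is an illustration of the described behavior, not the library's actual code: the function name `configure_logger`, the default file name `realtimestt.log`, and the format string are assumptions.

```python
import logging


def configure_logger(console_level=logging.WARNING, log_file="realtimestt.log"):
    """Set up a named logger with a console handler and a DEBUG file handler.

    Hypothetical sketch; function name, file name and format are assumptions.
    """
    logger = logging.getLogger("realtimestt")
    logger.setLevel(logging.DEBUG)  # let handlers decide what to keep
    logger.propagate = False        # keep records away from the root logger
    logger.handlers.clear()         # avoid duplicate handlers on repeated calls

    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

    console = logging.StreamHandler()
    console.setLevel(console_level)  # user-selected verbosity
    console.setFormatter(formatter)
    logger.addHandler(console)

    file_handler = logging.FileHandler(log_file)
    file_handler.setLevel(logging.DEBUG)  # file always records everything
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    return logger
```

Because propagation is off, an application that configures the root logger itself will not see these records twice, and it can still attach its own handlers to the `realtimestt` logger if it wants them.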

2. Added option to disable the Faster-Whisper VAD filter

  • Added faster_whisper_vad_filter parameter (default: True), which toggles the voice activity detection (VAD) filter of the faster_whisper library.
  • When enabled, it improves robustness against background noise at the cost of additional GPU resources.
  • Integrated into both real-time and main transcription workflows.

3. Audio Worker Improvements

  • Added improved, detailed debug logging for audio device initialization, sample rate handling, and resampling.

4. VAD Callback Adjustments

  • Fixes #215.
  • Moved on_vad_detect_start and on_vad_detect_stop callbacks to trigger directly during voice activity checks instead of state transitions.
  • Ensures callbacks align more accurately with actual speech/silence events.

v0.3.98

10 Mar 22:42

RealtimeSTT 0.3.98

  • Minor fix for the PyPI wheel

v0.3.97

10 Mar 20:35

RealtimeSTT 0.3.97

v0.3.95

15 Feb 16:38

RealtimeSTT 0.3.95

  • Better warmup (using an audio file)
  • Merged #200

v0.3.94

23 Jan 20:26

RealtimeSTT 0.3.94

  • New parameters for the stop() method of AudioToTextRecorder:
    • backdate_stop_seconds (float, default=0.0):
      • Description: Specifies the number of seconds to backdate the stop time when ending a recording.
      • Usage: When invoking stop() due to a wake word detection or a speaker diarization change event, this parameter compensates for any latency, ensuring that only relevant audio is included in the recording and transcription.
    • backdate_resume_seconds (float, default=0.0):
      • Description: Specifies the number of seconds to backdate the resume time when restarting listening after a recording has stopped.
      • Usage: Typically set to the same value as backdate_stop_seconds, this parameter allows for fine-tuning.
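
The idea behind backdating the stop time can be sketched as trimming the tail of the recorded buffer. This is only an illustration under simple assumptions (a flat list of samples at a known sample rate); the helper `backdate_stop` is hypothetical and does not reflect how AudioToTextRecorder manages its buffers internally.

```python
def backdate_stop(samples, sample_rate, backdate_stop_seconds=0.0):
    """Return `samples` without the last `backdate_stop_seconds` of audio.

    Hypothetical helper illustrating the concept; not RealtimeSTT's code.
    """
    drop = int(round(backdate_stop_seconds * sample_rate))
    if drop <= 0:
        return list(samples)  # nothing to backdate
    if drop >= len(samples):
        return []             # backdating past the start leaves no audio
    return list(samples[:-drop])


SAMPLE_RATE = 16000                       # assumed rate for the example
recording = [0] * (2 * SAMPLE_RATE)       # two seconds of placeholder audio
trimmed = backdate_stop(recording, SAMPLE_RATE, backdate_stop_seconds=0.5)
```

Here half a second is cut from a two-second recording, which corresponds to discarding audio captured between the real end of speech and the moment stop() was actually called.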

v0.3.93

18 Dec 18:19
  • Fixed stt-server, which was broken by an API change in the webservers dependency upgrade.
  • Added initial_prompt_realtime to AudioToTextRecorder so that the final and realtime models can be given different prompts.
  • Added new parameters to client/server (download root, batch sizes).