Releases: KoljaB/RealtimeSTT
v0.3.104
RealtimeSTT 0.3.104
Features & Improvements
- New parameter: `start_callback_in_new_thread`
If set to True, all callback functions are executed in a new thread.
This is useful when a callback function blocks and you want to avoid blocking the RealtimeSTT application thread.
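To illustrate the effect of this option, here is a minimal standard-library sketch. The `run_callback` helper and its `start_in_new_thread` argument are hypothetical and are not RealtimeSTT's internal code; they only mirror the behavior described above, where a blocking callback is moved off the calling thread.

```python
import threading
import time

def run_callback(callback, *args, start_in_new_thread=False):
    """Hypothetical dispatcher, not RealtimeSTT's actual implementation."""
    if start_in_new_thread:
        # Run the callback on its own thread so the caller is not blocked.
        thread = threading.Thread(target=callback, args=args, daemon=True)
        thread.start()
        return thread
    # Default: run inline, so the caller waits until the callback returns.
    callback(*args)
    return None

results = []

def slow_callback(text):
    time.sleep(0.2)  # simulate blocking work, e.g. a network request
    results.append(text)

start = time.monotonic()
worker = run_callback(slow_callback, "hello", start_in_new_thread=True)
dispatch_time = time.monotonic() - start  # caller did not wait for the sleep
worker.join()  # joined here only so the example can inspect the result
```

In RealtimeSTT itself the option is presumably passed when constructing the recorder, for example `AudioToTextRecorder(start_callback_in_new_thread=True)`; that placement is an assumption, since the note above does not say where the parameter lives.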
v0.3.103
RealtimeSTT 0.3.103
Features & Improvements
- Thread-safe IPC: Introduce `SafePipe` to replace `mp.Pipe`, hopefully ensuring robust inter-process communication (needs more tests).
- Audio normalization: New `normalize_audio` option scales input to –0.95 dBFS for consistent transcription quality.
- Callback overhaul: All event callbacks (VAD, wake-word, turn detection, recording, realtime updates) now run asynchronously via helper threads.
- Wake word & VAD: Add `wakeword_backend` config and `faster_whisper_vad_filter` flag; improved error messages when misconfigured.
- Rich metadata: Embed nanosecond-precision timestamps in both client and server, serialized as formatted strings.
- CLI enhancements: `--faster_whisper_vad_filter` and `--debug_websockets` flags give finer control over server behavior.
- Testing updates: Adjusted parameters in `realtimestt_test` and added a new `type_into_textbox.py` example.
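As a rough illustration of the audio normalization item, the sketch below scales a buffer so that its peak lands at –0.95 dBFS. It assumes peak normalization on float samples in the range [-1.0, 1.0]; the `normalize_peak` helper is hypothetical, and the actual `normalize_audio` implementation may work differently.

```python
def normalize_peak(samples, target_dbfs=-0.95):
    """Scale float samples in [-1.0, 1.0] so the peak sits at target_dbfs.

    Hypothetical helper for illustration; not RealtimeSTT's actual code.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    # Convert dBFS to linear amplitude: amplitude = 10 ** (dB / 20).
    target_amplitude = 10 ** (target_dbfs / 20)
    gain = target_amplitude / peak
    return [s * gain for s in samples]

quiet = [0.0, 0.1, -0.2, 0.05]
normalized = normalize_peak(quiet)
new_peak = max(abs(s) for s in normalized)  # roughly 0.896 in linear terms
```

A quiet and a loud take of the same utterance end up with the same peak level under this scheme, which is the consistency the release note refers to.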
v0.3.101
RealtimeSTT 0.3.101
✨ Features & Improvements
- Enhanced Real-time Responsiveness: Real-time transcription processing now intelligently pauses immediately when VAD detects silence, reducing latency and unnecessary work before the final transcription.
- Client Connection Robustness: Using a more accurate WebSocket-based server check.
- Remote Wake Word Delay Config: Clients can now configure the `wake_word_activation_delay` on the server.
- Updated OpenAI Example: Refreshed the `openai_voice_interface.py` example with the latest OpenAI API, `EdgeEngine` TTS, configuration flags, and graceful shutdown.
v0.3.100
v0.3.99
RealtimeSTT 0.3.99
1. Enhanced Logging Configuration
- Introduced a dedicated named logger `realtimestt` instead of using the root logger.
- Added structured logging with handlers for both console (level set by user) and file (always DEBUG).
- Logging no longer propagates to the root logger by default (`logger.propagate = False`).
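The logging setup described above can be sketched with the standard `logging` module. The `configure_logger` function, the log file path, and the format string are assumptions made for illustration; only the logger name `realtimestt`, the two handler levels, and `propagate = False` come from the release notes.

```python
import logging
import os
import tempfile

def configure_logger(console_level=logging.WARNING, log_path="realtimestt.log"):
    """Sketch of the described setup; function name and log_path are assumptions."""
    logger = logging.getLogger("realtimestt")
    logger.setLevel(logging.DEBUG)  # let each handler decide what to keep
    logger.propagate = False        # keep records out of the root logger
    logger.handlers.clear()         # avoid duplicate handlers on re-configuration

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    console = logging.StreamHandler()
    console.setLevel(console_level)  # console level chosen by the user
    console.setFormatter(formatter)
    logger.addHandler(console)

    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)  # file always records DEBUG and above
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)
    return logger

log_file = os.path.join(tempfile.mkdtemp(), "realtimestt.log")
logger = configure_logger(console_level=logging.WARNING, log_path=log_file)
logger.debug("audio device initialized")  # written to the file only
logger.warning("sample rate mismatch")    # written to console and file
```

Because propagation is off, an application that configures its own root logger does not see these records unless it attaches a handler to `realtimestt` directly.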
2. Added possibility to disable Faster-Whisper VAD Filter
- Added `faster_whisper_vad_filter` parameter (default: `True`) to enable voice activity detection (VAD) from the `faster_whisper` library.
- Improves robustness against background noise at the cost of additional GPU resources.
- Integrated into both real-time and main transcription workflows.
3. Audio Worker Improvements
- Added improved, detailed debug logging for audio device initialization, sample rate handling, and resampling.
4. VAD Callback Adjustments
- fixes #215
- Moved `on_vad_detect_start` and `on_vad_detect_stop` callbacks to trigger directly during voice activity checks instead of state transitions.
- Ensures callbacks align more accurately with actual speech/silence events.
v0.3.98
v0.3.97
v0.3.95
v0.3.94
RealtimeSTT 0.3.94
- New Parameters for stop-method of AudioToTextRecorder:
  - `backdate_stop_seconds` (float, default=0.0):
    - Description: Specifies the number of seconds to backdate the stop time when ending a recording.
    - Usage: When invoking `stop()` due to a wake word detection or a speaker diarization change event, this parameter compensates for any latency, ensuring that only relevant audio is included in the recording and transcription.
  - `backdate_resume_seconds` (float, default=0.0):
    - Description: Specifies the number of seconds to backdate the resume time when restarting listening after a recording has stopped.
    - Usage: Typically set to the same value as `backdate_stop_seconds`, this parameter allows for fine-tuning.
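To make the backdating arithmetic concrete, here is a small sketch of what trimming the tail of a recording by `backdate_stop_seconds` could look like. The `trim_backdated_audio` helper is hypothetical and works on a plain list of samples; RealtimeSTT's internal buffer handling may differ.

```python
def trim_backdated_audio(samples, sample_rate, backdate_stop_seconds=0.0):
    """Drop the last backdate_stop_seconds of audio from a recording.

    Hypothetical helper; RealtimeSTT's internal handling may differ.
    """
    samples_to_drop = int(backdate_stop_seconds * sample_rate)
    if samples_to_drop <= 0:
        return list(samples)
    # Guard against backdating further than the recording is long.
    keep = max(len(samples) - samples_to_drop, 0)
    return list(samples[:keep])

sample_rate = 16000
recording = [0] * (sample_rate * 3)  # three seconds of audio
trimmed = trim_backdated_audio(recording, sample_rate, backdate_stop_seconds=0.5)
# 0.5 s at 16 kHz is 8000 samples, so 40000 of the 48000 samples remain.
```

With the library itself, the values are passed to the stop method, for example `recorder.stop(backdate_stop_seconds=0.5, backdate_resume_seconds=0.5)`, where `recorder` is an `AudioToTextRecorder` instance and 0.5 is an arbitrary illustrative value.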
v0.3.93
- fix for stt-server (broken by an API change in a webservers dependency upgrade)
- added `initial_prompt_realtime` to AudioToTextRecorder so that the final and realtime models can be given different prompts
- added new parameters to client/server (download root, batch sizes)