@artshcherbina artshcherbina commented Dec 16, 2023

Thank you for your great work!

I've added some simple logic to detect silence and process only real voice input.
Transcription starts once more than 1 second of silence has passed (you may need to tune --pressure-t for your microphone).
You can adjust the silence duration with the --silence_t argument.
The transcribed text is copied to the clipboard (currently only on Linux, using xclip).
The code can be improved further if this approach seems reasonable.
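
For readers who want to see the idea in isolation, here is a minimal sketch of silence-gated buffering. It is not the code from this PR: the `SilenceGate` class, the RMS loudness measure, and the default values are assumptions made for illustration. Only the parameter names `pressure_t` and `silence_t` mirror the flags mentioned above.

```python
import numpy as np


def rms_level(chunk: np.ndarray) -> float:
    """Root-mean-square amplitude of a chunk of float samples in [-1, 1]."""
    if chunk.size == 0:
        return 0.0
    return float(np.sqrt(np.mean(chunk.astype(np.float64) ** 2)))


class SilenceGate:
    """Buffers voiced audio and releases it after a long enough pause.

    Hypothetical helper: pressure_t is the loudness threshold and
    silence_t the pause length in seconds (default values are assumed).
    """

    def __init__(self, sample_rate=16000, pressure_t=0.01, silence_t=1.0):
        self.sample_rate = sample_rate
        self.pressure_t = pressure_t
        self.silence_t = silence_t
        self.buffer = []          # chunks of the utterance in progress
        self.silent_samples = 0   # length of the current trailing pause

    def feed(self, chunk: np.ndarray):
        """Add one chunk; return a finished utterance or None."""
        if rms_level(chunk) >= self.pressure_t:
            self.buffer.append(chunk)
            self.silent_samples = 0
            return None
        if not self.buffer:
            return None  # silence before any speech: nothing to process
        # Keep short pauses inside the utterance until the limit is hit.
        self.buffer.append(chunk)
        self.silent_samples += len(chunk)
        if self.silent_samples < self.silence_t * self.sample_rate:
            return None
        utterance = np.concatenate(self.buffer)
        # Drop the trailing pause so only the voiced part is transcribed.
        utterance = utterance[: len(utterance) - self.silent_samples]
        self.buffer = []
        self.silent_samples = 0
        return utterance
```

Each returned array would then be handed to the transcriber, while chunks that arrive during silence are never sent to it.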

----------
----------
----------
----------
9---------
99---------
99---------
3992---------
93992---------
93999---------
994999---------
9994999---------
69992994---------
769992994---------
9-9995999---------
-9-9996999---------
--9-9996999---------
--795999-995---------
---694999-995---------
----694999-996---------
-----929998999---------
------929999899---------
-------929999799---------
-------593999-998---------
--------493999-999---------
---------929999699---------
 Here is some English text
----------
----------
----------
----------
----------
----------
6---------
96---------
99---------
999---------
2997---------
9-997---------
99-997---------
7999992---------
-6998992---------
9-6998992---------
92-992998---------
492-993999---------
799-5997993---------
6799-5997993---------
59492-994999---------
459493-995999---------
4459493-995999---------
-755899-4995994---------
2-755889-4995994---------
-2-755989-4995994---------
--33559594-997999---------
---33559594-998999---------
---22754979-3994995---------
----22754979-3993995---------
------2754969-3993995---------
------32659596-999999---------
-------32658596-999999---------
--------32658696-899999---------
---------3744959-2992996---------
 During silence nothing is detected.
----------
----------
----------
----------

@artshcherbina artshcherbina changed the title from "auto transcription implemented" to "silence removal for transcription implemented" Dec 16, 2023
@jensdraht1999

@artshcherbina Hello, I think this is good, but I wanted to ask whether it could also be applied to the .srt output and so on. If so, do the timestamps still align correctly with the original file?

I implemented something similar in Windows via ffmpeg and silence detection.

1. Split the audio into 1-second samples.
2. If a sample's level is too low (50 dB), treat it as silence.
3. Replace every sample treated as silent with a sound sample (for example white noise, another voice, or silent audio).
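
The three steps above can be sketched roughly as follows. This is a NumPy stand-in for illustration, not the ffmpeg-based script described in this comment. The function names are hypothetical, and reading "50 dB" as a threshold of -50 dBFS is an assumption.

```python
import numpy as np


def chunk_dbfs(chunk: np.ndarray) -> float:
    """Level of a float chunk in dB relative to full scale (1.0)."""
    if chunk.size == 0:
        return float("-inf")
    rms = float(np.sqrt(np.mean(chunk.astype(np.float64) ** 2)))
    return 20.0 * np.log10(rms) if rms > 0 else float("-inf")


def replace_silence(audio, sample_rate, threshold_db=-50.0,
                    chunk_seconds=1.0, filler=None):
    """Replace quiet chunks with filler audio, keeping the total length.

    threshold_db=-50.0 is an assumed reading of the "50 dB" above.
    filler is an optional 1-D array (e.g. white noise) that is repeated
    to fill each silent chunk; with filler=None the chunk is zeroed.
    Returns the processed audio and one silent/voiced flag per chunk.
    """
    chunk_len = int(sample_rate * chunk_seconds)
    out = audio.astype(np.float32, copy=True)
    silent_flags = []
    for start in range(0, len(out), chunk_len):
        chunk = out[start:start + chunk_len]  # view into out
        is_silent = bool(chunk_dbfs(chunk) < threshold_db)
        silent_flags.append(is_silent)
        if is_silent:
            if filler is None:
                chunk[:] = 0.0
            else:
                chunk[:] = np.resize(filler, len(chunk))
    return out, silent_flags
```

Because chunks are replaced rather than cut out, the output has the same duration as the input, so positions in the processed audio correspond to the same positions in the original file.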

Unfortunately, this did not improve anything.

The sad thing is that Whisper tries to transcribe literally everything.

Do you have any audio samples I could test my scripts with? I have a few scripts that I would like to try out and share with the community if they work well or better.

@artshcherbina
Author

Hello, @jensdraht1999 .
I haven't worked with .srt files.
Currently, I don't have any samples.
You can record some samples with your mic for testing.

@jensdraht1999

> Hello, @jensdraht1999 . I haven't worked with .srt files. Currently, I don't have any samples. You may record some samples with your mic for testing

I wonder if you could resolve the remaining issues and then merge it.

@artshcherbina
Author

My changes break some default behavior.
So I have no plan to merge this PR now.

@artshcherbina artshcherbina marked this pull request as draft September 25, 2024 07:23