tests : add script to benchmark whisper.cpp on LibriSpeech corpus #2999
Conversation
LibriSpeech is a widely-used benchmark dataset for training and testing speech recognition models. This adds a set of scripts to measure the recognition accuracy of whisper.cpp models, following the common benchmark standards.

Signed-off-by: Fujimoto Seiji <[email protected]>
@@ -0,0 +1,25 @@
Code in this directory is adapted from OpenAI Whisper project
(https://github.com/openai/whisper) and carries the following
copyright and license.
As I mentioned in LICENSE, the normalizer implementation in the tests/normalizer/ subfolder was ported from the upstream.

- We need this to get a comparable WER score. See this notebook for how OpenAI evaluates their speech recognition models.
- The reason I committed these files to this repository is to minimize the dependencies we need to run the benchmark script. `pip install openai-whisper` pulls in the full PyTorch stack, so it's heavy.
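For reference, here is a minimal sketch of how such a ported normalizer is typically used when computing WER. The import path and the sample strings are illustrative assumptions, not the exact layout in this PR; it only assumes an `EnglishTextNormalizer` class like the upstream one and `pip install jiwer`:

```python
# Illustrative sketch: normalize reference and hypothesis the same way
# before scoring, so casing/punctuation differences are not counted as errors.
import jiwer
from normalizer import EnglishTextNormalizer  # hypothetical import path

normalize = EnglishTextNormalizer()

reference  = "Mister Quilter is the apostle of the middle classes."
hypothesis = "mr quilter is the apostle of the middle classes"

wer = jiwer.wer(normalize(reference), normalize(hypothesis))
print(f"WER: {wer:.3f}")
```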
```
WHISPER_FLAGS = --no-prints --threads 8 --language en --output-txt
```

Check out `eval.mk` for more details.
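To make the role of these flags concrete, here is a rough Python sketch of what the per-sample step amounts to: run `whisper-cli` with `WHISPER_FLAGS` on each audio file and let `--output-txt` write the transcript next to it. The binary and model paths, and the assumption that samples are already converted to 16 kHz WAV, are illustrative; `eval.mk` defines the actual recipe:

```python
# Illustrative sketch only -- eval.mk drives the real benchmark.
# Paths below are assumptions about where whisper-cli and the model live.
import subprocess
from pathlib import Path

WHISPER_CLI = "../../build/bin/whisper-cli"       # assumed build output
MODEL = "../../models/ggml-base.en.bin"           # assumed model file
WHISPER_FLAGS = ["--no-prints", "--threads", "8", "--language", "en", "--output-txt"]

for wav in sorted(Path("LibriSpeech/test-clean").rglob("*.wav")):
    # --output-txt writes the transcript to <input>.txt alongside the audio
    subprocess.run([WHISPER_CLI, "-m", MODEL, "-f", str(wav), *WHISPER_FLAGS], check=True)
```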
This README file describes how to perform the benchmark tests.
Confirmed to work on Ubuntu 24.04 and Amazon Linux 2023.
I'll try this out locally and take a closer look as well soon.
Feedback from Daniel Bevenius. This adds a short code example showing how to prepare the `whisper-cli` command, to make the initial setup step a little clearer.

Signed-off-by: Fujimoto Seiji <[email protected]>
Based on feedback from Georgi Gerganov. Instead of setting up a virtual environment in the Makefile, let users set up the Python environment themselves. This is better since users may have their own preferred workflow/toolkit.

Signed-off-by: Fujimoto Seiji <[email protected]>
This is really great!
There are 2 things that we can improve on:
- The dataset seems to contain only relatively short speech segments. I think it would be good to have a dataset with a bit longer samples (i.e. a few minutes) in order to exercise the rolling-window transcription that Whisper does.
- The current implementation loads and unloads the entire model for each sample. This is very inefficient. Instead, it should utilize the `whisper-server` to start it once and send all the samples via HTTP requests (see the sketch after this comment). This will make the benchmark much faster.
For now we can merge and improve on these later.
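A rough sketch of the `whisper-server` approach, in Python for illustration. The port, endpoint, and form fields below are assumptions based on the server example's usual defaults (e.g. `./build/bin/whisper-server -m models/ggml-base.en.bin --port 8080` serving `/inference`), not something this PR implements:

```python
# Illustrative sketch: keep the model loaded in whisper-server and send each
# LibriSpeech sample over HTTP instead of spawning whisper-cli per file.
# Endpoint, port, and field names are assumptions about the server example.
from pathlib import Path
import requests

SERVER_URL = "http://127.0.0.1:8080/inference"   # assumed whisper-server endpoint

for wav in sorted(Path("LibriSpeech/test-clean").rglob("*.wav")):
    with open(wav, "rb") as f:
        resp = requests.post(
            SERVER_URL,
            files={"file": f},
            data={"temperature": "0.0", "response_format": "json"},
        )
    resp.raise_for_status()
    print(wav.name, resp.json().get("text", "").strip())
```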
@ggerganov @danbev Thank you! I'm glad that it helps this project.
…ml-org#2999)

* tests : add script to benchmark whisper.cpp on LibriSpeech corpus

  LibriSpeech is a widely-used benchmark dataset for training and testing speech recognition models. This adds a set of scripts to measure the recognition accuracy of whisper.cpp models, following the common benchmark standards.

  Signed-off-by: Fujimoto Seiji <[email protected]>

* Document how to prepare `whisper-cli` and model files

  Feedback from Daniel Bevenius. This adds a short code example showing how to prepare the `whisper-cli` command, to make the initial setup step a little clearer.

  Signed-off-by: Fujimoto Seiji <[email protected]>

* tests : Simplify how to set up Python environment

  Based on feedback from Georgi Gerganov. Instead of setting up a virtual environment in the Makefile, let users set up the Python environment themselves. This is better since users may have their own preferred workflow/toolkit.

  Signed-off-by: Fujimoto Seiji <[email protected]>

---------

Signed-off-by: Fujimoto Seiji <[email protected]>