
tests : add script to benchmark whisper.cpp on LibriSpeech corpus #2999


Merged
merged 3 commits into ggml-org:master on Apr 4, 2025

Conversation

fujimotos (Contributor)

LibriSpeech is a widely-used benchmark dataset for training and
testing speech recognition models.

This adds a set of scripts to measure the recognition accuracy of
whisper.cpp models, following the common benchmark standards.


Signed-off-by: Fujimoto Seiji <[email protected]>
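
For context, the accuracy these scripts measure is reported as the word error rate (WER), the standard LibriSpeech metric (the review discussion below refers to getting "a comparable WER score"). With $S$ substitutions, $D$ deletions, and $I$ insertions relative to a reference transcript of $N$ words:

$$\mathrm{WER} = \frac{S + D + I}{N}$$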
@@ -0,0 +1,25 @@
Code in this directory is adapted from the OpenAI Whisper project
(https://github.com/openai/whisper) and carries the following
copyright and license.
@fujimotos (Contributor Author) commented on Apr 3, 2025

As I mentioned in LICENSE, the normalizer implementation in the
tests/normalizer/ subfolder was ported from upstream.

  • We need this to get a comparable WER score. See this notebook
    for how OpenAI evaluates their speech recognition models, and the
    sketch below for why normalization matters.

  • The reason I committed these files to this repository is to minimize the
    dependencies needed to run the benchmark script.

pip install openai-whisper pulls in the full PyTorch library, so it is heavy.
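
To illustrate the first point, here is a minimal sketch of why text normalization is needed before computing WER. The `toy_normalize` helper below is hypothetical and far simpler than the ported normalizer; it only shows the idea:

```python
# Toy sketch: without normalization, casing/punctuation differences between
# the reference and the hypothesis would be counted as word errors.
# toy_normalize is a hypothetical stand-in for the ported normalizer.
import re

def toy_normalize(text: str) -> str:
    # Lowercase and drop punctuation. The real EnglishTextNormalizer also
    # expands contractions, normalizes numbers, spelling variants, etc.
    return re.sub(r"[^\w\s]", "", text.lower()).strip()

ref = "Hello, Mr. Brown."
hyp = "hello mr brown"

# After normalization both strings match, so the WER reflects recognition
# quality rather than formatting conventions.
print(toy_normalize(ref) == toy_normalize(hyp))  # True
```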

```
WHISPER_FLAGS = --no-prints --threads 8 --language en --output-txt
```

Check out `eval.mk` for more details.
@fujimotos (Contributor Author)

This README file describes how to perform the benchmark tests.

Confirmed to work on Ubuntu 24.04 and Amazon Linux 2023.
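
For illustration, a benchmark driver could apply the quoted `WHISPER_FLAGS` from Python along these lines. This is only a sketch: the binary path, model path, and sample filename are assumptions, and `eval.mk` may orchestrate this differently:

```python
# Sketch: run whisper-cli on one sample with the flags shown above.
# Paths are assumptions; adjust to your build and model locations.
import subprocess

WHISPER_FLAGS = ["--no-prints", "--threads", "8", "--language", "en", "--output-txt"]

def transcribe(wav_path: str) -> None:
    # With --output-txt, whisper-cli writes the transcript to <wav_path>.txt.
    subprocess.run(
        ["./build/bin/whisper-cli", *WHISPER_FLAGS,
         "--model", "models/ggml-base.en.bin", "--file", wav_path],
        check=True,
    )

transcribe("samples/61-70968-0000.wav")  # hypothetical LibriSpeech sample
```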

@danbev (Collaborator) left a comment

I'll try this out locally and take a closer look soon as well.

Feedback from Daniel Bevenius.

This adds a short code example showing how to prepare the `whisper-cli`
command, to make the initial setup step a little clearer.

Signed-off-by: Fujimoto Seiji <[email protected]>
Based on feedback from Georgi Gerganov.

Instead of setting up a virtual environment in the Makefile, let users
set up the Python environment themselves. This is better since users may
have their own preferred workflow/toolkit.

Signed-off-by: Fujimoto Seiji <[email protected]>
@ggerganov (Member) left a comment

This is really great!

There are two things we can improve on:

  • The dataset seems to contain only relatively short speech segments. I think it would be good to have a dataset with somewhat longer samples (i.e. a few minutes) in order to exercise the rolling-window transcription that Whisper does.

  • The current implementation loads and unloads the entire model for each sample, which is very inefficient. Instead, it should utilize whisper-server: start it once and send all the samples via HTTP requests. This will make the benchmark much faster (see the sketch below).
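
A minimal sketch of that server-based approach, assuming a whisper-server instance is already running locally. The port, the /inference endpoint, the multipart form fields, and the "text" response field follow whisper.cpp's server example but are assumptions here, not verified against this PR:

```python
# Sketch: send each LibriSpeech sample to a long-running whisper-server
# instead of reloading the model per file.
import requests  # third-party: pip install requests

SERVER_URL = "http://127.0.0.1:8080/inference"  # assumed default port/route

def transcribe(wav_path: str) -> str:
    with open(wav_path, "rb") as f:
        resp = requests.post(
            SERVER_URL,
            files={"file": f},
            data={"response_format": "json"},
        )
    resp.raise_for_status()
    # The server example returns JSON with a "text" field (assumption).
    return resp.json()["text"]
```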

For now we can merge and improve on these later.

@ggerganov ggerganov requested a review from danbev April 4, 2025 15:24
@ggerganov ggerganov merged commit 448f3d3 into ggml-org:master Apr 4, 2025
71 of 98 checks passed
@fujimotos fujimotos deleted the sf/librispeech branch April 5, 2025 00:39
@fujimotos (Contributor Author)

@ggerganov @danbev Thank you! I'm glad that it helps this project.

fujimotos added a commit to fujimotos/whisper.cpp that referenced this pull request Apr 20, 2025
tests : add script to benchmark whisper.cpp on LibriSpeech corpus (ggml-org#2999)

* tests : add script to benchmark whisper.cpp on LibriSpeech corpus

LibriSpeech is a widely-used benchmark dataset for training and
testing speech recognition models.

This adds a set of scripts to measure the recognition accuracy of
whisper.cpp models, following the common benchmark standards.

Signed-off-by: Fujimoto Seiji <[email protected]>

* Document how to prepare `whisper-cli` and model files

Feedback from Daniel Bevenius.

This adds a short code example showing how to prepare the `whisper-cli`
command, to make the initial setup step a little clearer.

Signed-off-by: Fujimoto Seiji <[email protected]>

* tests : Simplify how to set up Python environment

Based on feedback from Georgi Gerganov.

Instead of setting up a virtual environment in the Makefile, let users
set up the Python environment themselves. This is better since users may
have their own preferred workflow/toolkit.

Signed-off-by: Fujimoto Seiji <[email protected]>

---------

Signed-off-by: Fujimoto Seiji <[email protected]>