Releases: ggml-org/whisper.cpp
v1.7.6
Overview
- Add initial VAD support - feedback welcome and appreciated
- Metal Flash Attention (FA) improvements
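The idea behind VAD (voice activity detection) is to transcribe only the regions that actually contain speech; this release's implementation is based on a Silero VAD model. The sketch below only illustrates the general gating step — the frame probabilities, frame size, and function name are illustrative, not whisper.cpp's actual API:

```python
def speech_segments(probs, frame_ms=30, thold=0.5):
    """Merge consecutive frames whose speech probability (from a VAD
    model) meets `thold` into (start_ms, end_ms) segments."""
    segs = []
    start = None
    for i, p in enumerate(probs):
        if p >= thold and start is None:
            start = i                                  # speech begins
        elif p < thold and start is not None:
            segs.append((start * frame_ms, i * frame_ms))  # speech ends
            start = None
    if start is not None:                              # trailing speech
        segs.append((start * frame_ms, len(probs) * frame_ms))
    return segs
```

Only the returned segments would then be fed to the transcriber, skipping silence entirely.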
M2 Ultra
Flash Attention ON:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 1 | 7.72 | 1.05 | 0.32 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 1 | 8.20 | 0.98 | 0.31 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 1 | 8.13 | 0.99 | 0.31 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | tiny-q8_0 | 1 | 1 | 7.96 | 0.93 | 0.30 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | base | 1 | 1 | 13.52 | 1.39 | 0.35 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | base-q5_0 | 1 | 1 | 14.88 | 1.31 | 0.34 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | base-q5_1 | 1 | 1 | 14.76 | 1.33 | 0.34 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | base-q8_0 | 1 | 1 | 14.04 | 1.28 | 0.34 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | small | 1 | 1 | 38.78 | 2.72 | 0.67 | 0.04 | dc8dda6 |
M2 ULTRA | METAL | small-q5_0 | 1 | 1 | 44.01 | 2.64 | 0.69 | 0.05 | dc8dda6 |
M2 ULTRA | METAL | small-q5_1 | 1 | 1 | 44.02 | 2.66 | 0.69 | 0.05 | dc8dda6 |
M2 ULTRA | METAL | small-q8_0 | 1 | 1 | 40.79 | 2.49 | 0.67 | 0.05 | dc8dda6 |
M2 ULTRA | METAL | medium | 1 | 1 | 104.48 | 5.57 | 1.61 | 0.10 | dc8dda6 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 1 | 122.24 | 5.00 | 1.58 | 0.12 | dc8dda6 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 1 | 121.99 | 5.02 | 1.59 | 0.12 | dc8dda6 |
M2 ULTRA | METAL | medium-q8_0 | 1 | 1 | 111.68 | 4.99 | 1.52 | 0.11 | dc8dda6 |
M2 ULTRA | METAL | medium-dis | 1 | 1 | 93.23 | 0.87 | 0.21 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | large-v2 | 1 | 1 | 189.82 | 8.36 | 2.35 | 0.19 | dc8dda6 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 1 | 225.73 | 7.34 | 2.40 | 0.22 | dc8dda6 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 1 | 225.88 | 7.60 | 2.40 | 0.22 | dc8dda6 |
M2 ULTRA | METAL | large-v2-q8_0 | 1 | 1 | 203.55 | 7.32 | 2.26 | 0.20 | dc8dda6 |
M2 ULTRA | METAL | large-v2-dis | 1 | 1 | 168.20 | 0.98 | 0.24 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | large-v3-turbo | 1 | 1 | 170.22 | 1.46 | 0.37 | 0.03 | dc8dda6 |
M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 1 | 201.88 | 1.27 | 0.38 | 0.04 | dc8dda6 |
M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 1 | 182.37 | 1.24 | 0.36 | 0.03 | dc8dda6 |
Flash Attention OFF:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 0 | 10.15 | 1.20 | 0.36 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 0 | 10.21 | 1.15 | 0.39 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 0 | 9.26 | 1.15 | 0.38 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | tiny-q8_0 | 1 | 0 | 9.00 | 1.12 | 0.37 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | base | 1 | 0 | 15.77 | 1.73 | 0.45 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | base-q5_0 | 1 | 0 | 16.90 | 1.63 | 0.44 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | base-q5_1 | 1 | 0 | 16.93 | 1.64 | 0.44 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | base-q8_0 | 1 | 0 | 16.13 | 1.63 | 0.43 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | small | 1 | 0 | 45.15 | 3.45 | 0.92 | 0.05 | dc8dda6 |
M2 ULTRA | METAL | small-q5_0 | 1 | 0 | 50.63 | 3.36 | 0.94 | 0.06 | dc8dda6 |
M2 ULTRA | METAL | small-q5_1 | 1 | 0 | 50.56 | 3.36 | 0.94 | 0.06 | dc8dda6 |
M2 ULTRA | METAL | small-q8_0 | 1 | 0 | 47.52 | 3.20 | 0.92 | 0.05 | dc8dda6 |
M2 ULTRA | METAL | medium | 1 | 0 | 122.55 | 7.38 | 1.95 | 0.12 | dc8dda6 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 0 | 140.61 | 6.73 | 2.02 | 0.14 | dc8dda6 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 0 | 140.48 | 6.76 | 2.04 | 0.14 | dc8dda6 |
M2 ULTRA | METAL | medium-q8_0 | 1 | 0 | 131.00 | 6.57 | 1.96 | 0.13 | dc8dda6 |
M2 ULTRA | METAL | medium-dis | 1 | 0 | 110.85 | 1.00 | 0.24 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | large-v2 | 1 | 0 | 222.28 | 10.96 | 3.03 | 0.21 | dc8dda6 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 0 | 258.64 | 9.79 | 3.04 | 0.25 | dc8dda6 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 0 | 258.32 | 9.87 | 3.05 | 0.24 | dc8dda6 |
M2 ULTRA | METAL | large-v2-q8_0 | 1 | 0 | 236.55 | 9.61 | 2.87 | 0.23 | dc8dda6 |
M2 ULTRA | METAL | large-v2-dis | 1 | 0 | 199.84 | 1.14 | 0.27 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | large-v3-turbo | 1 | 0 | 201.52 | 1.77 | 0.45 | 0.03 | dc8dda6 |
M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 0 | 233.14 | 1.56 | 0.47 | 0.04 | dc8dda6 |
M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 0 | 214.23 | 1.53 | 0.44 | 0.04 | dc8dda6 |
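Reading the encoder columns of the two tables against each other gives a rough sense of the Flash Attention gains on the M2 Ultra. The numbers below are copied from the Enc. columns above:

```python
# Encoder times (Enc. column) from the v1.7.6 M2 Ultra tables above.
enc_fa_on  = {"tiny": 7.72, "base": 13.52, "small": 38.78,
              "medium": 104.48, "large-v2": 189.82}
enc_fa_off = {"tiny": 10.15, "base": 15.77, "small": 45.15,
              "medium": 122.55, "large-v2": 222.28}

# Speedup factor from enabling Flash Attention (higher is better).
speedup = {m: round(enc_fa_off[m] / enc_fa_on[m], 2) for m in enc_fa_on}
```

This works out to roughly a 1.16-1.31x encoder speedup, with the largest relative gain on the smallest model.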
What's Changed
- docs : add xcframework section to README.md [no ci] by @danbev in #2997
- sync : ggml by @ggerganov in #2992
- whisper.wasm : fix unknown language issue by @danbev in #3000
- examples : update server.py to match github pages app [no ci] by @danbev in #3004
- rename : ggerganov -> ggml-org by @ggerganov in #3005
- whisper : fix "bench-all outputs an invalid result on larger models" by @fujimotos in #3002
- tests : add script to benchmark whisper.cpp on LibriSpeech corpus by @fujimotos in #2999
- ruby : Change homepage URI in Ruby gemspec by @KitaitiMakoto in #3007
- fix dead link to models in readme by @gregsadetsky in #3006
- Update uri.rb by @Olli in #3016
- Update ruby_whisper_params.c by @Olli in #3022
- xcf : use check for visionos build version by @danbev in #3021
- Fix README.md by @ekaitz-zarraga in #3024
- docs : document how to use 'WHISPER_FFMPEG' build option by @fujimotos in #3029
- whisper : reduce delta_min from 1000ms to 100ms by @ggerganov in #3028
- support max_context api for addon.node by @buxuku in #3025
- Update README.md to note newer NVIDIA GPUs by @jeffklassen in #3031
- ruby: use CMake in build process by @KitaitiMakoto in #3043
- examples : add FFmpeg v7.0 support to ffmpeg-transcode.cpp by @fujimotos in #3038
- feat: Add no-context option to server by @sachaarbonel in #3045
- ruby : make Ruby bindings installed with build options by @KitaitiMakoto in #3056
- examples : add HEAPU8 to exported runtime methods by @danbev in #3062
- ci : disable freeBSD job in build.yml by @danbev in #3064
- coreml : set convert_to="mlprogram" in convert by @danbev in #3060
- sync : ggml by @ggerganov in #3071
- ci : enable bindings java job by @danbev in #3070
- ruby : add encoder begin callback related methods by @KitaitiMakoto in #3076
- Fix deprecated FFmpeg functions by @Podre-Henrique in #3073
- Add Moore Threads GPU support and update GitHub workflow for MUSA build by @yeahdongcn in #3069
- ci : disable publishing of java binding [no ci] by @danbev in #3086
- talk-llama : sync llama.cpp by @ggerganov in #3084
- whisper : remove empty .gitmodules file [no ci] by @danbev in #3085
- feat: expose language detection probabilities to server example by @sachaarbonel in #3044
- whisper : fix grammar advance stack warning by @danbev in #3087
- ggml : suppress Windows compiler warnings by @danbev in #3075
- make : fix samples glob pattern by @ggerganov in #3100
- ruby : ignore "Downloading" output in test_log_suppress by @danbev in #3106
- server : add --no-gpu option to print usage output by @danbev in #30...
v1.7.5
Overview
This is a relatively big update with various build and CI improvements especially for iOS and WASM. There are also some performance gains, especially for the Metal backend and probably for Arm-based devices.
Big shoutout to @danbev for stepping up and completing the maintenance roadmap for this release!
Mobile examples
All mobile examples have been refreshed. The iOS examples in particular are now much easier to build thanks to the new XCFramework workflow, which should significantly simplify integrating whisper.cpp
into third-party iOS and macOS apps. The CoreML build and conversion instructions have also been updated.
WASM examples
The WASM examples are now automatically updated on each new commit and hosted on GitHub Pages at
https://ggml.ai/whisper.cpp/
Problems with CORS rules should be resolved.
Some performance numbers for this release:
M2 Ultra
Flash Attention ON:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 1 | 7.82 | 1.31 | 0.35 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 1 | 8.32 | 1.28 | 0.37 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 1 | 8.21 | 1.28 | 0.37 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q8_0 | 1 | 1 | 7.97 | 1.23 | 0.36 | 0.01 | ad4e350 |
M2 ULTRA | METAL | base | 1 | 1 | 13.96 | 1.80 | 0.42 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_0 | 1 | 1 | 15.19 | 1.75 | 0.42 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_1 | 1 | 1 | 15.09 | 1.75 | 0.42 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q8_0 | 1 | 1 | 14.45 | 1.70 | 0.41 | 0.02 | ad4e350 |
M2 ULTRA | METAL | small | 1 | 1 | 40.08 | 3.54 | 0.86 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q5_0 | 1 | 1 | 45.07 | 3.51 | 0.88 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q5_1 | 1 | 1 | 45.05 | 3.52 | 0.88 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q8_0 | 1 | 1 | 42.04 | 3.34 | 0.85 | 0.05 | ad4e350 |
M2 ULTRA | METAL | medium | 1 | 1 | 107.20 | 7.28 | 1.79 | 0.11 | ad4e350 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 1 | 125.02 | 6.67 | 1.83 | 0.12 | ad4e350 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 1 | 124.83 | 6.70 | 1.84 | 0.12 | ad4e350 |
M2 ULTRA | METAL | medium-q8_0 | 1 | 1 | 114.56 | 6.53 | 1.79 | 0.11 | ad4e350 |
M2 ULTRA | METAL | medium-dis | 1 | 1 | 95.96 | 1.01 | 0.23 | 0.01 | ad4e350 |
M2 ULTRA | METAL | large-v2 | 1 | 1 | 194.29 | 10.57 | 2.67 | 0.20 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 1 | 230.74 | 9.57 | 2.73 | 0.23 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 1 | 229.97 | 9.69 | 2.74 | 0.23 | ad4e350 |
M2 ULTRA | METAL | large-v2-q8_0 | 1 | 1 | 208.11 | 9.37 | 2.60 | 0.21 | ad4e350 |
M2 ULTRA | METAL | large-v2-dis | 1 | 1 | 172.72 | 1.12 | 0.26 | 0.02 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo | 1 | 1 | 174.46 | 1.74 | 0.42 | 0.03 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 1 | 205.78 | 1.54 | 0.42 | 0.04 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 1 | 186.33 | 1.50 | 0.40 | 0.03 | ad4e350 |
Flash Attention OFF:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 0 | 8.74 | 1.20 | 0.36 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 0 | 10.30 | 1.15 | 0.38 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 0 | 10.71 | 1.13 | 0.38 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q8_0 | 1 | 0 | 9.97 | 1.12 | 0.37 | 0.01 | ad4e350 |
M2 ULTRA | METAL | base | 1 | 0 | 16.77 | 1.71 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_0 | 1 | 0 | 16.92 | 1.63 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_1 | 1 | 0 | 16.84 | 1.63 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q8_0 | 1 | 0 | 16.12 | 1.63 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | small | 1 | 0 | 45.29 | 3.44 | 0.92 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q5_0 | 1 | 0 | 50.43 | 3.34 | 0.94 | 0.06 | ad4e350 |
M2 ULTRA | METAL | small-q5_1 | 1 | 0 | 50.49 | 3.35 | 0.93 | 0.06 | ad4e350 |
M2 ULTRA | METAL | small-q8_0 | 1 | 0 | 47.37 | 3.20 | 0.91 | 0.05 | ad4e350 |
M2 ULTRA | METAL | medium | 1 | 0 | 122.81 | 7.39 | 1.99 | 0.12 | ad4e350 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 0 | 140.62 | 6.73 | 2.03 | 0.14 | ad4e350 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 0 | 140.44 | 6.74 | 2.04 | 0.14 | ad4e350 |
M2 ULTRA | METAL | medium-q8_0 | 1 | 0 | 131.05 | 6.54 | 1.95 | 0.13 | ad4e350 |
M2 ULTRA | METAL | medium-dis | 1 | 0 | 110.95 | 0.99 | 0.24 | 0.02 | ad4e350 |
M2 ULTRA | METAL | large-v2 | 1 | 0 | 222.19 | 10.93 | 3.01 | 0.21 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 0 | 258.47 | 9.75 | 3.01 | 0.25 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 0 | 258.40 | 9.85 | 3.01 | 0.24 | ad4e350 |
M2 ULTRA | METAL | large-v2-q8_0 | 1 | 0 | 236.68 | 9.61 | 2.85 | 0.23 | ad4e350 |
M2 ULTRA | METAL | large-v2-dis | 1 | 0 | 199.28 | 1.12 | 0.27 | 0.02 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo | 1 | 0 | 201.49 | 1.76 | 0.45 | 0.03 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 0 | 233.70 | 1.55 | 0.46 | 0.04 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 0 | 214.20 | 1.51 | 0.44 | 0.04 | ad4e350 |
M4 Max
Flash Attention ON:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M4 Max | METAL | tiny | 1 | 1 | 15.22 | 0.89 | 0.26 | 0.01 | ad4e350 |
M4 Max | METAL | tiny-q8_0 | 1 | 1 | 14.70 | 0.86 | 0.26 | 0.01 | ad4e350 |
M4 Max | METAL | base | 1 | 1 | 25.33 | 1.36 | 0.30 | 0.02 | ad4e350 |
M4 Max | METAL | base-q8_0 | 1 | 1 | 21.27 | 1.31 | 0.30 | 0.02 | ad4e350 |
M4 Max | METAL | small | 1 | 1 | 58.43 | 2.78 | 0.60 | 0.05 | ad4e350 |
M4 Max | METAL | small-q8_0 | 1 | 1 | 60.26 | 2.39 | 0.60 | 0.05 | ad4e350 |
M4 Max | METAL | medium | 1 | 1 | 169.73 | 6.03 | 1.31 | 0.14 | ad4e350 |
M4 Max | METAL | medium-q8_0 | 1 | 1 | 176.61 | 4.99 | 1.31 | 0.14 | ad4e350 |
M4 Max | METAL | large-v2 | 1 | 1 | 316.18 | 9.60 | 2.08 | 0.24 | ad4e350 |
M4 Max | METAL | large-v2-q8_0 | 1 | 1 | 329.59 | 7.55 | 2.08 | 0.25 | ad4e350 |
Flash Attention OFF:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M4 Max | METAL | tiny | 1 | 0 | 13.12 | 0.87 | 0.29 | 0.01 | ad4e350 |
M4 Max | METAL | tiny-q8_0 | 1 | 0 | 15.90 | 0.88 | 0.31 | 0.01 | ad4e350 |
M4 Max | METAL | base | 1 | 0 | 23.10 | 1.42 | 0.34 | 0.02 | ad4e350 |
M4 Max | METAL | base-q8_0 | 1 | 0 | 27.25 | 1.31 | 0.34 | 0.02 | ad4e350 |
M4 Max | METAL | small | 1 | 0 | 71.76 | 3.02 | 0.70 | 0.06 | ad4e350 |
M4 Max | METAL | small-q8_0 | 1 | 0 | 73.88 | 2.60 | 0.71 | 0.06 | ad4e350 |
M4 Max | METAL | medium | 1 | 0 | 208.22 | 6.94 | 1.55 | 0.16 | ad4e350 |
M4 Max | METAL | medium-q8_0 | 1 | 0 | 214.65 | 5.90 | 1.57 | 0.17 | ad4e350 |
M4 Max | METAL | large-v2 | 1 | 0 | 381.72 | 11.28 | 2.51 | 0.29 | ad4e350 |
M4 Max | METAL | large-v2-q8_0 | 1 | 0 | 394.97 | 8.90 | 2.45 | 0.30 | ad4e350 |
V100
Flash Attention ON:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
V100 | AVX2 CUDA | tiny | 8 | 1 | 4.01 | 0.90 | 0.25 | 0.01 | ad4e350 |
V100 | AVX2 CUDA | tiny-q5_1 | 8 | 1 | 4.12 | 0.88 | 0.18 | 0.01 | ad4e350 |
V100 | AVX2 CUDA | base | 8 | 1 | 7.00 | 1.30 | 0.35 | 0.01 | ad4e350 |
V100 | AVX2 CUDA | base-q5_1 | 8 | 1 | 7.22 | 1.21 | 0.26 | 0.02 | ad4e350 |
V100 | AVX2 CUDA | small | 8 | 1 | 18.68 | 2.39 | 0.69 | 0.03 | ad4e350 |
V100 | AVX2 CUDA | small-q5_1 | 8 | 1 | 19.38 | 2.32 | 0.51 | 0.03 | ad4e350 |
V100 | AVX2 CUDA | medium | 8 | 1 | 53.17 | 5.15 | 1.45 | 0.06 | ad4e350 |
V100 | AVX2 CUDA | medium-q5_0 | 8 ... |
b2365
android.java : re-add ggml source updates (#2975)
This commit updates the ggml source to include the new unary and binary operations. I merged https://github.com/ggerganov/whisper.cpp/pull/2958, which seems to have overwritten the changes to the ggml source that were added in https://github.com/ggerganov/whisper.cpp/pull/2972. Sorry about this.
v1.7.4
Overview
Minor release with mostly build fixes.
What's Changed
- whisper : rename binaries + fix install by @ggerganov in #2648
- feat(server): Add option to suppress non-speech tokens by @sachaarbonel in #2649
- whisper : rename suppress_non_speech_tokens to suppress_nst by @ggerganov in #2653
- feat: expose no-speech probability in segment by @sachaarbonel in #2654
- ruby : bug fix on callbacks and no_speech_prob by @KitaitiMakoto in #2656
- Add no_speech_thold to cli by @alubbe in #2663
- Add --suppress_nst support to cli by @alubbe in #2664
- ruby : Fix of C++ header guard name, model URI support, type signature and more by @KitaitiMakoto in #2683
- Enable Windows cublas build by @niksedk in #2676
- docs: replace Core ML with OpenVINO by @konosky in #2686
- rename ggml-cpu-aarch64.c to .cpp by @ego in #2687
- readme : fix real-time audio input example build instructions by @samueldurantes in #2692
- sync : ggml by @ggerganov in #2699
- cli : fix segfault on missing argument by @redzic in #2700
New Contributors
- @sachaarbonel made their first contribution in #2649
- @alubbe made their first contribution in #2663
- @niksedk made their first contribution in #2676
- @konosky made their first contribution in #2686
- @ego made their first contribution in #2687
- @samueldurantes made their first contribution in #2692
- @redzic made their first contribution in #2700
Full Changelog: v1.7.3...v1.7.4
v1.7.3
Overview
- Massive performance improvements for the Metal backend, especially for beams > 1 and for quantized models
- Reduce hallucinations during silence by @jkarthic in #2629
- Implement no_speech_thold by @jkarthic in #2625
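The new no_speech_thold follows the same heuristic as OpenAI's reference Whisper implementation: a decoded segment is discarded as silence when the model's no-speech probability is high and the decoding confidence is low. A minimal sketch of that rule (the default thresholds mirror the reference implementation; the function name is illustrative, not whisper.cpp's API):

```python
def keep_segment(no_speech_prob: float, avg_logprob: float,
                 no_speech_thold: float = 0.6,
                 logprob_thold: float = -1.0) -> bool:
    """Return False when a segment should be treated as silence:
    the model is confident the window contains no speech AND the
    average token log-probability indicates a low-quality decode."""
    return not (no_speech_prob > no_speech_thold
                and avg_logprob < logprob_thold)
```

A confident decode is kept even when the no-speech probability is high, which is what reduces hallucinations during silence without dropping real speech.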
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | Metal | tiny | 1 | 1 | 7.90 | 1.26 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_0 | 1 | 1 | 8.44 | 1.23 | 0.36 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_1 | 1 | 1 | 8.26 | 1.27 | 0.37 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q8_0 | 1 | 1 | 8.03 | 1.21 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | base | 1 | 1 | 13.77 | 1.80 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_0 | 1 | 1 | 15.02 | 1.72 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_1 | 1 | 1 | 14.93 | 1.74 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q8_0 | 1 | 1 | 14.26 | 1.68 | 0.41 | 0.02 | ed733e8 |
M2 Ultra | Metal | small | 1 | 1 | 39.76 | 3.54 | 0.85 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_0 | 1 | 1 | 45.07 | 3.47 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_1 | 1 | 1 | 44.82 | 3.49 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q8_0 | 1 | 1 | 41.79 | 3.30 | 0.84 | 0.05 | ed733e8 |
M2 Ultra | Metal | medium | 1 | 1 | 106.73 | 7.28 | 1.78 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-q5_0 | 1 | 1 | 124.43 | 6.63 | 1.83 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q5_1 | 1 | 1 | 124.19 | 6.70 | 1.84 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q8_0 | 1 | 1 | 113.88 | 6.52 | 1.75 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-dis | 1 | 1 | 94.97 | 0.97 | 0.22 | 0.01 | ed733e8 |
M2 Ultra | Metal | large-v2 | 1 | 1 | 193.33 | 10.53 | 2.65 | 0.20 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_0 | 1 | 1 | 229.22 | 9.52 | 2.72 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_1 | 1 | 1 | 229.40 | 9.62 | 2.73 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q8_0 | 1 | 1 | 207.30 | 9.36 | 2.59 | 0.21 | ed733e8 |
M2 Ultra | Metal | large-v2-dis | 1 | 1 | 171.43 | 1.09 | 0.25 | 0.02 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo | 1 | 1 | 173.45 | 1.73 | 0.41 | 0.03 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q5_0 | 1 | 1 | 205.52 | 1.52 | 0.42 | 0.04 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q8_0 | 1 | 1 | 185.90 | 1.48 | 0.40 | 0.03 | ed733e8 |
What's Changed
- sync : ggml by @ggerganov in #2573
- ruby : Follow source tree change by @KitaitiMakoto in #2580
- Add `q8_0` models to `download-ggml-model.sh` by @mrienstra in #2589
- ruby : Add low-level methods to transcribe by @KitaitiMakoto in #2585
- sync : ggml by @ggerganov in #2608
- ruby : Sync whisper.cpp and model download feature by @KitaitiMakoto in #2617
- Fix typo in `download-ggml-model.sh` by @mrienstra in #2623
- Add Missing Include Directory for ggml-cpu in whisper.android CMakeLists by @Thamster in #2624
- fix: prevent division by zero in soft_max vulkan shader by @gn64 in #2633
- cmake : fix "amd64" processor string by @ggerganov in #2638
- Fix typo in Java Binding README by @crummyh in #2637
- Fix hallucinations during silence by @jkarthic in #2629
- Implement no_speech_thold by @jkarthic in #2625
- Improve consistency in stream example README commands by @crummyh in #2642
- ruby : Add no_speech_thold by @KitaitiMakoto in #2641
- sync : ggml by @ggerganov in #2639
- ci : msys enable SDL2 build by @ggerganov in #2635
New Contributors
- @Thamster made their first contribution in #2624
- @gn64 made their first contribution in #2633
- @crummyh made their first contribution in #2637
- @jkarthic made their first contribution in #2629
Full Changelog: v1.7.2...v1.7.3
v1.7.3-pre
Overview
Massive performance improvements for the Metal backend, especially for beams > 1 and for quantized models.
Marking this as a "pre-release" since there have been major changes to the build system (now using CMake) and I want to gather some feedback about how well the project builds on various platforms. Please leave comments in the discussion to help fix any remaining issues before the official release.
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | Metal | tiny | 1 | 1 | 7.90 | 1.26 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_0 | 1 | 1 | 8.44 | 1.23 | 0.36 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_1 | 1 | 1 | 8.26 | 1.27 | 0.37 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q8_0 | 1 | 1 | 8.03 | 1.21 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | base | 1 | 1 | 13.77 | 1.80 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_0 | 1 | 1 | 15.02 | 1.72 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_1 | 1 | 1 | 14.93 | 1.74 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q8_0 | 1 | 1 | 14.26 | 1.68 | 0.41 | 0.02 | ed733e8 |
M2 Ultra | Metal | small | 1 | 1 | 39.76 | 3.54 | 0.85 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_0 | 1 | 1 | 45.07 | 3.47 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_1 | 1 | 1 | 44.82 | 3.49 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q8_0 | 1 | 1 | 41.79 | 3.30 | 0.84 | 0.05 | ed733e8 |
M2 Ultra | Metal | medium | 1 | 1 | 106.73 | 7.28 | 1.78 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-q5_0 | 1 | 1 | 124.43 | 6.63 | 1.83 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q5_1 | 1 | 1 | 124.19 | 6.70 | 1.84 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q8_0 | 1 | 1 | 113.88 | 6.52 | 1.75 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-dis | 1 | 1 | 94.97 | 0.97 | 0.22 | 0.01 | ed733e8 |
M2 Ultra | Metal | large-v2 | 1 | 1 | 193.33 | 10.53 | 2.65 | 0.20 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_0 | 1 | 1 | 229.22 | 9.52 | 2.72 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_1 | 1 | 1 | 229.40 | 9.62 | 2.73 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q8_0 | 1 | 1 | 207.30 | 9.36 | 2.59 | 0.21 | ed733e8 |
M2 Ultra | Metal | large-v2-dis | 1 | 1 | 171.43 | 1.09 | 0.25 | 0.02 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo | 1 | 1 | 173.45 | 1.73 | 0.41 | 0.03 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q5_0 | 1 | 1 | 205.52 | 1.52 | 0.42 | 0.04 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q8_0 | 1 | 1 | 185.90 | 1.48 | 0.40 | 0.03 | ed733e8 |
What's Changed
- sync : ggml by @ggerganov in #2573
- ruby : Follow source tree change by @KitaitiMakoto in #2580
- Add `q8_0` models to `download-ggml-model.sh` by @mrienstra in #2589
- ruby : Add low-level methods to transcribe by @KitaitiMakoto in #2585
- sync : ggml by @ggerganov in #2608
Full Changelog: v1.7.2...v1.7.3-pre
v1.7.2
Overview
- Various improvements in the Metal backend
- Fix extra memory usage for large samples
- Remove limit for `ggml_context` (i.e. more beams and processors are supported)
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 1 | 9.51 | 1.39 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 1 | 9.57 | 1.41 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 1 | 8.74 | 1.39 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q8_0 | 1 | 1 | 8.36 | 1.33 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | base | 1 | 1 | 14.27 | 1.90 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_0 | 1 | 1 | 15.50 | 1.90 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_1 | 1 | 1 | 15.67 | 1.88 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q8_0 | 1 | 1 | 14.69 | 1.81 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | small | 1 | 1 | 40.85 | 3.77 | 1.43 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_0 | 1 | 1 | 45.99 | 3.90 | 1.52 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_1 | 1 | 1 | 46.19 | 3.83 | 1.50 | 0.06 | 83ac284 |
M2 Ultra | METAL | small-q8_0 | 1 | 1 | 42.90 | 3.65 | 1.46 | 0.05 | 83ac284 |
M2 Ultra | METAL | medium | 1 | 1 | 109.01 | 7.59 | 3.24 | 0.11 | 83ac284 |
M2 Ultra | METAL | medium-q5_0 | 1 | 1 | 126.78 | 7.55 | 3.45 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q5_1 | 1 | 1 | 127.71 | 7.39 | 3.43 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q8_0 | 1 | 1 | 115.97 | 7.21 | 3.35 | 0.12 | 83ac284 |
M2 Ultra | METAL | medium-dis | 1 | 1 | 97.74 | 1.06 | 0.36 | 0.01 | 83ac284 |
M2 Ultra | METAL | large-v2 | 1 | 1 | 196.99 | 11.29 | 5.06 | 0.20 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 1 | 233.88 | 10.83 | 5.56 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 1 | 234.03 | 10.73 | 5.46 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q8_0 | 1 | 1 | 210.83 | 10.29 | 5.23 | 0.22 | 83ac284 |
M2 Ultra | METAL | large-v2-dis | 1 | 1 | 175.37 | 1.18 | 0.42 | 0.02 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo | 1 | 1 | 177.35 | 1.85 | 0.73 | 0.03 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 1 | 209.31 | 1.69 | 0.80 | 0.04 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q8_0 | 1 | 1 | 189.55 | 1.64 | 0.75 | 0.03 | 83ac284 |
What's Changed
- Added OpenVino init on state by @sandrohanea in #2464
- Updating the Quick start by @stsfaroz in #2475
- max_length from max_target_positions by @CrispStrobe in #2477
- Add dtw preset for large-v3-turbo by @rotemdan in #2481
- make : fix GGML_VULKAN=1 build by @ggerganov in #2485
- Add Vulkan notice in README.md by @toboil-features in #2488
- Fix Ruby binding building by @KitaitiMakoto in #2484
- Update of README.md by @toboil-features in #2489
- whisper: fix index overflow by @Josscii in #2505
- ruby : Add Metal support by @KitaitiMakoto in #2516
- ruby: New segment callback by @KitaitiMakoto in #2506
- ruby : add more APIs by @KitaitiMakoto in #2518
- ruby: fix installation test by @KitaitiMakoto in #2519
- When DTW timestamps are enabled, defer new_segment_callback until after DTW compute step by @jettoblack in #2515
- ci : fix openblas build by @ggerganov in #2511
- whisper : reduce ggml_context usage by @ggerganov in #2525
- sync : ggml by @ggerganov in #2528
- passing samples_padded by ref to the threads. by @vinmisra in #2534
- fix ffmpeg v5 build by @stsydow in #2543
- fix: ggml-vulkan logs by @thewh1teagle in #2547
- Fix the instructions on the Ruby binding by @wilsonsilva in #2548
- whisper.swiftui : add model download list & bench methods by @jhen0409 in #2546
- ruby : Add more API by @KitaitiMakoto in #2551
- Fix building workflow for linux/arm64 container by @rai62 in #2555
- sync : ggml by @ggerganov in #2561
- whisper.swiftui : switch Mac dest to Mac (Designed for iPad) by @jhen0409 in #2562
- ci : use local ggml by @ggerganov in #2567
- sycl: fix example build by @stsydow in #2570
New Contributors
- @stsfaroz made their first contribution in #2475
- @CrispStrobe made their first contribution in #2477
- @toboil-features made their first contribution in #2488
- @KitaitiMakoto made their first contribution in #2484
- @Josscii made their first contribution in #2505
- @jettoblack made their first contribution in #2515
- @vinmisra made their first contribution in #2534
- @stsydow made their first contribution in #2543
- @wilsonsilva made their first contribution in #2548
- @rai62 made their first contribution in #2555
Full Changelog: v1.7.1...v1.7.2
v1.7.2-pre
Overview
This is a pre-release since there have been some reports about memory leaks which I haven't had the time to investigate and confirm. If these are resolved in the next few days, I will add the fixes to the official 1.7.2
release next week.
- Various improvements in the Metal backend
- Fix extra memory usage for large samples
- Remove limit for `ggml_context` (i.e. more beams and processors are supported)
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 1 | 9.51 | 1.39 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 1 | 9.57 | 1.41 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 1 | 8.74 | 1.39 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q8_0 | 1 | 1 | 8.36 | 1.33 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | base | 1 | 1 | 14.27 | 1.90 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_0 | 1 | 1 | 15.50 | 1.90 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_1 | 1 | 1 | 15.67 | 1.88 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q8_0 | 1 | 1 | 14.69 | 1.81 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | small | 1 | 1 | 40.85 | 3.77 | 1.43 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_0 | 1 | 1 | 45.99 | 3.90 | 1.52 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_1 | 1 | 1 | 46.19 | 3.83 | 1.50 | 0.06 | 83ac284 |
M2 Ultra | METAL | small-q8_0 | 1 | 1 | 42.90 | 3.65 | 1.46 | 0.05 | 83ac284 |
M2 Ultra | METAL | medium | 1 | 1 | 109.01 | 7.59 | 3.24 | 0.11 | 83ac284 |
M2 Ultra | METAL | medium-q5_0 | 1 | 1 | 126.78 | 7.55 | 3.45 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q5_1 | 1 | 1 | 127.71 | 7.39 | 3.43 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q8_0 | 1 | 1 | 115.97 | 7.21 | 3.35 | 0.12 | 83ac284 |
M2 Ultra | METAL | medium-dis | 1 | 1 | 97.74 | 1.06 | 0.36 | 0.01 | 83ac284 |
M2 Ultra | METAL | large-v2 | 1 | 1 | 196.99 | 11.29 | 5.06 | 0.20 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 1 | 233.88 | 10.83 | 5.56 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 1 | 234.03 | 10.73 | 5.46 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q8_0 | 1 | 1 | 210.83 | 10.29 | 5.23 | 0.22 | 83ac284 |
M2 Ultra | METAL | large-v2-dis | 1 | 1 | 175.37 | 1.18 | 0.42 | 0.02 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo | 1 | 1 | 177.35 | 1.85 | 0.73 | 0.03 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 1 | 209.31 | 1.69 | 0.80 | 0.04 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q8_0 | 1 | 1 | 189.55 | 1.64 | 0.75 | 0.03 | 83ac284 |
What's Changed
- Added OpenVino init on state by @sandrohanea in #2464
- Updating the Quick start by @stsfaroz in #2475
- max_length from max_target_positions by @CrispStrobe in #2477
- Add dtw preset for large-v3-turbo by @rotemdan in #2481
- make : fix GGML_VULKAN=1 build by @ggerganov in #2485
- Add Vulkan notice in README.md by @toboil-features in #2488
- Fix Ruby binding building by @KitaitiMakoto in #2484
- Update of README.md by @toboil-features in #2489
- whisper: fix index overflow by @Josscii in #2505
- ruby : Add Metal support by @KitaitiMakoto in #2516
- ruby: New segment callback by @KitaitiMakoto in #2506
- ruby : add more APIs by @KitaitiMakoto in #2518
- ruby: fix installation test by @KitaitiMakoto in #2519
- When DTW timestamps are enabled, defer new_segment_callback until after DTW compute step by @jettoblack in #2515
- ci : fix openblas build by @ggerganov in #2511
- whisper : reduce ggml_context usage by @ggerganov in #2525
- sync : ggml by @ggerganov in #2528
- passing samples_padded by ref to the threads. by @vinmisra in #2534
- fix ffmpeg v5 build by @stsydow in #2543
- fix: ggml-vulkan logs by @thewh1teagle in #2547
- Fix the instructions on the Ruby binding by @wilsonsilva in #2548
- whisper.swiftui : add model download list & bench methods by @jhen0409 in #2546
- ruby : Add more API by @KitaitiMakoto in #2551
- Fix building workflow for linux/arm64 container by @rai62 in #2555
- sync : ggml by @ggerganov in #2561
- whisper.swiftui : switch Mac dest to Mac (Designed for iPad) by @jhen0409 in #2562
New Contributors
- @stsfaroz made their first contribution in #2475
- @CrispStrobe made their first contribution in #2477
- @toboil-features made their first contribution in #2488
- @KitaitiMakoto made their first contribution in #2484
- @Josscii made their first contribution in #2505
- @jettoblack made their first contribution in #2515
- @vinmisra made their first contribution in #2534
- @stsydow made their first contribution in #2543
- @wilsonsilva made their first contribution in #2548
- @rai62 made their first contribution in #2555
Full Changelog: v1.7.1...v1.7.2-pre
v1.7.1
Overview
- Fix Vulkan crashes
- Performance stats for Vulkan on RTX 2060
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | VULKAN | tiny | 1 | 0 | 30.38 | 1.37 | 1.04 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_0 | 1 | 0 | 20.98 | 1.38 | 0.99 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_1 | 1 | 0 | 20.74 | 1.30 | 0.96 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | base | 1 | 0 | 44.69 | 1.59 | 1.78 | 0.09 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_0 | 1 | 0 | 39.72 | 2.11 | 1.72 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_1 | 1 | 0 | 39.45 | 2.01 | 1.63 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | small | 1 | 0 | 160.02 | 3.53 | 4.64 | 0.23 | 9f346d0 |
RTX 2060 | VULKAN | small-q5_0 | 1 | 0 | 141.52 | 4.54 | 4.44 | 0.20 | 9f346d0 |
RTX 2060 | VULKAN | small-q5_1 | 1 | 0 | 141.03 | 4.63 | 4.18 | 0.20 | 9f346d0 |
RTX 2060 | VULKAN | medium | 1 | 0 | 472.66 | 7.55 | 11.35 | 0.56 | 9f346d0 |
RTX 2060 | VULKAN | medium-q5_0 | 1 | 0 | 395.55 | 9.81 | 10.64 | 0.49 | 9f346d0 |
RTX 2060 | VULKAN | medium-q5_1 | 1 | 0 | 398.85 | 10.16 | 10.15 | 0.50 | 9f346d0 |
RTX 2060 | VULKAN | medium-dis | 1 | 0 | 427.26 | 1.26 | 1.20 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | large-v2 | 1 | 0 | 924.60 | 12.36 | 18.56 | 1.01 | 9f346d0 |
RTX 2060 | VULKAN | large-v2-q5_0 | 1 | 0 | 774.21 | 17.25 | 17.17 | 0.85 | 9f346d0 |
RTX 2060 | VULKAN | large-v2-q5_1 | 1 | 0 | 779.75 | 17.44 | 16.27 | 0.85 | 9f346d0 |
RTX 2060 | VULKAN | large-v2-dis | 1 | 0 | 833.35 | 1.38 | 1.56 | 0.10 | 9f346d0 |
RTX 2060 | VULKAN | large-v3-turbo | 1 | 0 | 839.90 | 2.11 | 2.70 | 0.16 | 9f346d0 |
RTX 2060 | VULKAN | large-v3-turbo-q5_0 | 1 | 0 | 705.49 | 3.22 | 2.53 | 0.14 | 9f346d0 |
What's Changed
- Retry allocation with fallback flags by @SRHMorris in #2451
New Contributors
- @SRHMorris made their first contribution in #2451
Full Changelog: v1.7.0...v1.7.1
Binaries
https://github.com/ggerganov/whisper.cpp/actions/runs/11213279590
v1.7.0
Overview
- Fix crashes with high number of beams
- Reduce overall VRAM usage
- Optimize Encoder performance
Some performance numbers for this release:
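Tables like the ones below are typically produced with the project's bundled bench tool. A minimal sketch (binary name, build flags, and model path are assumptions based on the standard whisper.cpp layout, not part of these notes):

```shell
# Build with a GPU backend enabled (Metal shown here; use -DGGML_CUDA=ON
# or -DGGML_VULKAN=ON for the corresponding backends in the tables).
cmake -B build -DGGML_METAL=ON
cmake --build build -j

# Run the encoder/decoder benchmark for one model.
# -t sets the thread count ("Th" column); flash attention ("FA" column)
# is toggled with -fa where supported.
./build/bin/whisper-bench -m models/ggml-base.en.bin -t 1
```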
M2 Ultra
Flash Attention ON:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 1 | 8.37 | 1.44 | 0.48 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 1 | 9.81 | 1.46 | 0.50 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 1 | 8.80 | 1.47 | 0.50 | 0.01 | 6a94163 |
M2 Ultra | METAL | base | 1 | 1 | 16.11 | 1.96 | 0.74 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_0 | 1 | 1 | 16.38 | 1.99 | 0.78 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_1 | 1 | 1 | 16.72 | 2.00 | 0.77 | 0.02 | 6a94163 |
M2 Ultra | METAL | small | 1 | 1 | 41.26 | 3.88 | 1.66 | 0.05 | 6a94163 |
M2 Ultra | METAL | small-q5_0 | 1 | 1 | 46.91 | 4.02 | 1.76 | 0.06 | 6a94163 |
M2 Ultra | METAL | small-q5_1 | 1 | 1 | 47.05 | 4.00 | 1.73 | 0.06 | 6a94163 |
M2 Ultra | METAL | medium | 1 | 1 | 111.29 | 7.79 | 3.63 | 0.11 | 6a94163 |
M2 Ultra | METAL | medium-q5_0 | 1 | 1 | 129.78 | 7.71 | 3.85 | 0.13 | 6a94163 |
M2 Ultra | METAL | medium-q5_1 | 1 | 1 | 129.29 | 7.71 | 3.87 | 0.13 | 6a94163 |
M2 Ultra | METAL | medium-dis | 1 | 1 | 99.27 | 1.09 | 0.43 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v2 | 1 | 1 | 198.81 | 11.54 | 5.59 | 0.20 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 1 | 236.18 | 11.12 | 6.11 | 0.24 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 1 | 235.88 | 11.14 | 6.01 | 0.24 | 6a94163 |
M2 Ultra | METAL | large-v2-dis | 1 | 1 | 177.41 | 1.21 | 0.48 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo | 1 | 1 | 178.92 | 1.89 | 0.83 | 0.03 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 1 | 211.44 | 1.73 | 0.90 | 0.04 | 6a94163 |
Flash Attention OFF:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 0 | 10.04 | 1.37 | 0.50 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 0 | 10.02 | 1.36 | 0.53 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 0 | 11.08 | 1.37 | 0.53 | 0.01 | 6a94163 |
M2 Ultra | METAL | base | 1 | 0 | 17.84 | 1.93 | 0.77 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_0 | 1 | 0 | 18.57 | 1.92 | 0.81 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_1 | 1 | 0 | 18.66 | 1.93 | 0.82 | 0.02 | 6a94163 |
M2 Ultra | METAL | small | 1 | 0 | 48.26 | 3.95 | 1.73 | 0.05 | 6a94163 |
M2 Ultra | METAL | small-q5_0 | 1 | 0 | 53.68 | 3.99 | 1.85 | 0.06 | 6a94163 |
M2 Ultra | METAL | small-q5_1 | 1 | 0 | 53.86 | 4.00 | 1.82 | 0.06 | 6a94163 |
M2 Ultra | METAL | medium | 1 | 0 | 130.09 | 8.01 | 3.82 | 0.13 | 6a94163 |
M2 Ultra | METAL | medium-q5_0 | 1 | 0 | 148.18 | 7.92 | 4.11 | 0.14 | 6a94163 |
M2 Ultra | METAL | medium-q5_1 | 1 | 0 | 147.95 | 7.94 | 4.11 | 0.14 | 6a94163 |
M2 Ultra | METAL | medium-dis | 1 | 0 | 116.97 | 1.11 | 0.42 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v2 | 1 | 0 | 232.43 | 12.34 | 5.87 | 0.22 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 0 | 269.72 | 11.68 | 6.44 | 0.26 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 0 | 269.71 | 11.82 | 6.36 | 0.26 | 6a94163 |
M2 Ultra | METAL | large-v2-dis | 1 | 0 | 209.25 | 1.25 | 0.48 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo | 1 | 0 | 211.09 | 1.98 | 0.84 | 0.03 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 0 | 244.23 | 1.81 | 0.92 | 0.04 | 6a94163 |
Ryzen 9 5950X + RTX 2060
Flash Attention ON:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | AVX2 CUDA | tiny | 1 | 1 | 7.35 | 0.78 | 0.24 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_0 | 1 | 1 | 6.45 | 0.67 | 0.14 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_1 | 1 | 1 | 6.39 | 0.66 | 0.14 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | base | 1 | 1 | 10.20 | 0.88 | 0.30 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_0 | 1 | 1 | 11.38 | 0.92 | 0.21 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_1 | 1 | 1 | 11.76 | 0.91 | 0.20 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | small | 1 | 1 | 33.06 | 2.00 | 0.56 | 0.03 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_0 | 1 | 1 | 35.84 | 1.84 | 0.43 | 0.04 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_1 | 1 | 1 | 36.89 | 1.82 | 0.42 | 0.04 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium | 1 | 1 | 90.65 | 4.54 | 1.13 | 0.08 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_0 | 1 | 1 | 104.01 | 3.80 | 0.91 | 0.10 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_1 | 1 | 1 | 107.98 | 3.72 | 0.87 | 0.10 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-dis | 1 | 1 | 79.08 | 0.68 | 0.17 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2 | 1 | 1 | 162.00 | 7.52 | 1.92 | 0.14 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 1 | 1 | 184.59 | 5.64 | 1.50 | 0.16 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 1 | 1 | 193.85 | 5.55 | 1.44 | 0.17 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-dis | 1 | 1 | 140.75 | 0.84 | 0.37 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo | 1 | 1 | 143.38 | 1.29 | 0.36 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo-q5_0 | 1 | 1 | 163.30 | 0.93 | 0.28 | 0.03 | 6a94163 |
Flash Attention OFF:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | AVX2 CUDA | tiny | 1 | 0 | 12.49 | 0.87 | 0.23 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_0 | 1 | 0 | 10.65 | 0.78 | 0.19 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_1 | 1 | 0 | 10.82 | 0.77 | 0.19 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base | 1 | 0 | 18.97 | 1.04 | 0.34 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_0 | 1 | 0 | 20.22 | 1.09 | 0.27 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_1 | 1 | 0 | 20.48 | 1.07 | 0.27 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | small | 1 | 0 | 59.52 | 2.37 | 0.70 | 0.05 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_0 | 1 | 0 | 62.98 | 2.23 | 0.60 | 0.06 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_1 | 1 | 0 | 63.64 | 2.21 | 0.59 | 0.06 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium | 1 | 0 | 161.53 | 5.36 | 1.53 | 0.13 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_0 | 1 | 0 | 174.96 | 4.64 | 1.32 | 0.15 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_1 | 1 | 0 | 178.42 | 4.57 | 1.29 | 0.15 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-dis | 1 | 0 | 149.65 | 0.75 | 0.20 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2 | 1 | 0 | 280.55 | 8.74 | 2.51 | 0.23 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 1 | 0 | 306.87 | 6.92 | 2.08 | 0.25 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 1 | 0 | 314.25 | 6.82 | 2.02 | 0.26 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-dis | 1 | 0 | 259.39 | 0.91 | 0.37 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo | 1 | 0 | 261.83 | 1.44 | 0.41 | 0.04 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo-q5_0 | 1 | 0 | 282.99 | 1.09 | 0.33 | 0.04 | 6a94163 |
Vulkan:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | VULKAN | tiny | 1 | 0 | 30.38 | 1.37 | 1.04 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_0 | 1 | 0 | 20.98 | 1.38 | 0.99 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_1 | 1 | 0 | 20.74 | 1.30 | 0.96 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | base | 1 | 0 | 44.69 | 1.59 | 1.78 | 0.09 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_0 | 1 | 0 | 39.72 | 2.11 | 1.72 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_1 | 1 | 0 | 39.45 | 2.01 | 1.63 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | small | 1 | 0 | 160.02 | 3.53 | 4.64 | 0.23 | 9f346d0 |
RTX 2060 | VULKAN | small-q5_0 | 1 | 0 | 141.52 | 4.54 | 4.44 | 0.20 | 9f346d0 |
RTX 2060 | VULKA... |