Releases: ggml-org/whisper.cpp
v1.7.6
Overview
- Add initial VAD support - feedback welcome and appreciated
- Metal Flash Attention (FA) improvements
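The idea behind VAD (voice activity detection) is to transcribe only the regions that actually contain speech; this release's implementation is based on a Silero VAD model. The sketch below only illustrates the general gating step — the frame probabilities, frame size, and function name are illustrative, not whisper.cpp's actual API:

```python
def speech_segments(probs, frame_ms=30, thold=0.5):
    """Merge consecutive frames whose speech probability (from a VAD
    model) meets `thold` into (start_ms, end_ms) segments."""
    segs = []
    start = None
    for i, p in enumerate(probs):
        if p >= thold and start is None:
            start = i                                  # speech begins
        elif p < thold and start is not None:
            segs.append((start * frame_ms, i * frame_ms))  # speech ends
            start = None
    if start is not None:                              # trailing speech
        segs.append((start * frame_ms, len(probs) * frame_ms))
    return segs
```

Only the returned segments would then be fed to the transcriber, skipping silence entirely.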
M2 Ultra
Flash Attention ON:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 1 | 7.72 | 1.05 | 0.32 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 1 | 8.20 | 0.98 | 0.31 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 1 | 8.13 | 0.99 | 0.31 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | tiny-q8_0 | 1 | 1 | 7.96 | 0.93 | 0.30 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | base | 1 | 1 | 13.52 | 1.39 | 0.35 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | base-q5_0 | 1 | 1 | 14.88 | 1.31 | 0.34 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | base-q5_1 | 1 | 1 | 14.76 | 1.33 | 0.34 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | base-q8_0 | 1 | 1 | 14.04 | 1.28 | 0.34 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | small | 1 | 1 | 38.78 | 2.72 | 0.67 | 0.04 | dc8dda6 |
M2 ULTRA | METAL | small-q5_0 | 1 | 1 | 44.01 | 2.64 | 0.69 | 0.05 | dc8dda6 |
M2 ULTRA | METAL | small-q5_1 | 1 | 1 | 44.02 | 2.66 | 0.69 | 0.05 | dc8dda6 |
M2 ULTRA | METAL | small-q8_0 | 1 | 1 | 40.79 | 2.49 | 0.67 | 0.05 | dc8dda6 |
M2 ULTRA | METAL | medium | 1 | 1 | 104.48 | 5.57 | 1.61 | 0.10 | dc8dda6 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 1 | 122.24 | 5.00 | 1.58 | 0.12 | dc8dda6 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 1 | 121.99 | 5.02 | 1.59 | 0.12 | dc8dda6 |
M2 ULTRA | METAL | medium-q8_0 | 1 | 1 | 111.68 | 4.99 | 1.52 | 0.11 | dc8dda6 |
M2 ULTRA | METAL | medium-dis | 1 | 1 | 93.23 | 0.87 | 0.21 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | large-v2 | 1 | 1 | 189.82 | 8.36 | 2.35 | 0.19 | dc8dda6 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 1 | 225.73 | 7.34 | 2.40 | 0.22 | dc8dda6 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 1 | 225.88 | 7.60 | 2.40 | 0.22 | dc8dda6 |
M2 ULTRA | METAL | large-v2-q8_0 | 1 | 1 | 203.55 | 7.32 | 2.26 | 0.20 | dc8dda6 |
M2 ULTRA | METAL | large-v2-dis | 1 | 1 | 168.20 | 0.98 | 0.24 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | large-v3-turbo | 1 | 1 | 170.22 | 1.46 | 0.37 | 0.03 | dc8dda6 |
M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 1 | 201.88 | 1.27 | 0.38 | 0.04 | dc8dda6 |
M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 1 | 182.37 | 1.24 | 0.36 | 0.03 | dc8dda6 |
Flash Attention OFF:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 0 | 10.15 | 1.20 | 0.36 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 0 | 10.21 | 1.15 | 0.39 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 0 | 9.26 | 1.15 | 0.38 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | tiny-q8_0 | 1 | 0 | 9.00 | 1.12 | 0.37 | 0.01 | dc8dda6 |
M2 ULTRA | METAL | base | 1 | 0 | 15.77 | 1.73 | 0.45 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | base-q5_0 | 1 | 0 | 16.90 | 1.63 | 0.44 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | base-q5_1 | 1 | 0 | 16.93 | 1.64 | 0.44 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | base-q8_0 | 1 | 0 | 16.13 | 1.63 | 0.43 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | small | 1 | 0 | 45.15 | 3.45 | 0.92 | 0.05 | dc8dda6 |
M2 ULTRA | METAL | small-q5_0 | 1 | 0 | 50.63 | 3.36 | 0.94 | 0.06 | dc8dda6 |
M2 ULTRA | METAL | small-q5_1 | 1 | 0 | 50.56 | 3.36 | 0.94 | 0.06 | dc8dda6 |
M2 ULTRA | METAL | small-q8_0 | 1 | 0 | 47.52 | 3.20 | 0.92 | 0.05 | dc8dda6 |
M2 ULTRA | METAL | medium | 1 | 0 | 122.55 | 7.38 | 1.95 | 0.12 | dc8dda6 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 0 | 140.61 | 6.73 | 2.02 | 0.14 | dc8dda6 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 0 | 140.48 | 6.76 | 2.04 | 0.14 | dc8dda6 |
M2 ULTRA | METAL | medium-q8_0 | 1 | 0 | 131.00 | 6.57 | 1.96 | 0.13 | dc8dda6 |
M2 ULTRA | METAL | medium-dis | 1 | 0 | 110.85 | 1.00 | 0.24 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | large-v2 | 1 | 0 | 222.28 | 10.96 | 3.03 | 0.21 | dc8dda6 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 0 | 258.64 | 9.79 | 3.04 | 0.25 | dc8dda6 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 0 | 258.32 | 9.87 | 3.05 | 0.24 | dc8dda6 |
M2 ULTRA | METAL | large-v2-q8_0 | 1 | 0 | 236.55 | 9.61 | 2.87 | 0.23 | dc8dda6 |
M2 ULTRA | METAL | large-v2-dis | 1 | 0 | 199.84 | 1.14 | 0.27 | 0.02 | dc8dda6 |
M2 ULTRA | METAL | large-v3-turbo | 1 | 0 | 201.52 | 1.77 | 0.45 | 0.03 | dc8dda6 |
M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 0 | 233.14 | 1.56 | 0.47 | 0.04 | dc8dda6 |
M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 0 | 214.23 | 1.53 | 0.44 | 0.04 | dc8dda6 |
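Reading the encoder columns of the two tables against each other gives a rough sense of the Flash Attention gains on the M2 Ultra. The numbers below are copied from the Enc. columns above:

```python
# Encoder times (Enc. column) from the v1.7.6 M2 Ultra tables above.
enc_fa_on  = {"tiny": 7.72, "base": 13.52, "small": 38.78,
              "medium": 104.48, "large-v2": 189.82}
enc_fa_off = {"tiny": 10.15, "base": 15.77, "small": 45.15,
              "medium": 122.55, "large-v2": 222.28}

# Speedup factor from enabling Flash Attention (higher is better).
speedup = {m: round(enc_fa_off[m] / enc_fa_on[m], 2) for m in enc_fa_on}
```

This works out to roughly a 1.16-1.31x encoder speedup, with the largest relative gain on the smallest model.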
What's Changed
- docs : add xcframework section to README.md [no ci] by @danbev in #2997
- sync : ggml by @ggerganov in #2992
- whisper.wasm : fix unknown language issue by @danbev in #3000
- examples : update server.py to match github pages app [no ci] by @danbev in #3004
- rename : ggerganov -> ggml-org by @ggerganov in #3005
- whisper : fix "bench-all outputs an invalid result on larger models" by @fujimotos in #3002
- tests : add script to benchmark whisper.cpp on LibriSpeech corpus by @fujimotos in #2999
- ruby : Change homepage URI in Ruby gemspec by @KitaitiMakoto in #3007
- fix dead link to models in readme by @gregsadetsky in #3006
- Update uri.rb by @Olli in #3016
- Update ruby_whisper_params.c by @Olli in #3022
- xcf : use check for visionos build version by @danbev in #3021
- Fix README.md by @ekaitz-zarraga in #3024
- docs : document how to use 'WHISPER_FFMPEG' build option by @fujimotos in #3029
- whisper : reduce delta_min from 1000ms to 100ms by @ggerganov in #3028
- support max_context api for addon.node by @buxuku in #3025
- Update README.md to note newer NVIDIA GPUs by @jeffklassen in #3031
- ruby: use CMake in build process by @KitaitiMakoto in #3043
- examples : add FFmpeg v7.0 support to ffmpeg-transcode.cpp by @fujimotos in #3038
- feat: Add no-context option to server by @sachaarbonel in #3045
- ruby : make Ruby bindings installed with build options by @KitaitiMakoto in #3056
- examples : add HEAPU8 to exported runtime methods by @danbev in #3062
- ci : disable freeBSD job in build.yml by @danbev in #3064
- coreml : set convert_to="mlprogram" in convert by @danbev in #3060
- sync : ggml by @ggerganov in #3071
- ci : enable bindings java job by @danbev in #3070
- ruby : add encoder begin callback related methods by @KitaitiMakoto in #3076
- Fix deprecated FFmpeg functions by @Podre-Henrique in #3073
- Add Moore Threads GPU support and update GitHub workflow for MUSA build by @yeahdongcn in #3069
- ci : disable publishing of java binding [no ci] by @danbev in #3086
- talk-llama : sync llama.cpp by @ggerganov in #3084
- whisper : remove empty .gitmodules file [no ci] by @danbev in #3085
- feat: expose language detection probabilities to server example by @sachaarbonel in #3044
- whisper : fix grammar advance stack warning by @danbev in #3087
- ggml : suppress Windows compiler warnings by @danbev in #3075
- make : fix samples glob pattern by @ggerganov in #3100
- ruby : ignore "Downloading" output in test_log_suppress by @danbev in #3106
- server : add --no-gpu option to print usage output by @danbev in #30...
v1.7.5
Overview
This is a relatively big update with various build and CI improvements especially for iOS and WASM. There are also some performance gains, especially for the Metal backend and probably for Arm-based devices.
Big shoutout to @danbev for stepping up and completing the maintenance roadmap for this release!
Mobile examples
All mobile examples have been refreshed. The iOS examples in particular are now much easier to build thanks to the new XCFramework workflow, which should significantly simplify integrating whisper.cpp
into third-party iOS and macOS apps. The CoreML build and conversion instructions have also been updated.
WASM examples
The WASM examples are now automatically updated on each new commit and hosted on GitHub Pages at
https://ggml.ai/whisper.cpp/
Problems with CORS rules should be resolved.
Some performance numbers for this release:
M2 Ultra
Flash Attention ON:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 1 | 7.82 | 1.31 | 0.35 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 1 | 8.32 | 1.28 | 0.37 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 1 | 8.21 | 1.28 | 0.37 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q8_0 | 1 | 1 | 7.97 | 1.23 | 0.36 | 0.01 | ad4e350 |
M2 ULTRA | METAL | base | 1 | 1 | 13.96 | 1.80 | 0.42 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_0 | 1 | 1 | 15.19 | 1.75 | 0.42 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_1 | 1 | 1 | 15.09 | 1.75 | 0.42 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q8_0 | 1 | 1 | 14.45 | 1.70 | 0.41 | 0.02 | ad4e350 |
M2 ULTRA | METAL | small | 1 | 1 | 40.08 | 3.54 | 0.86 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q5_0 | 1 | 1 | 45.07 | 3.51 | 0.88 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q5_1 | 1 | 1 | 45.05 | 3.52 | 0.88 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q8_0 | 1 | 1 | 42.04 | 3.34 | 0.85 | 0.05 | ad4e350 |
M2 ULTRA | METAL | medium | 1 | 1 | 107.20 | 7.28 | 1.79 | 0.11 | ad4e350 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 1 | 125.02 | 6.67 | 1.83 | 0.12 | ad4e350 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 1 | 124.83 | 6.70 | 1.84 | 0.12 | ad4e350 |
M2 ULTRA | METAL | medium-q8_0 | 1 | 1 | 114.56 | 6.53 | 1.79 | 0.11 | ad4e350 |
M2 ULTRA | METAL | medium-dis | 1 | 1 | 95.96 | 1.01 | 0.23 | 0.01 | ad4e350 |
M2 ULTRA | METAL | large-v2 | 1 | 1 | 194.29 | 10.57 | 2.67 | 0.20 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 1 | 230.74 | 9.57 | 2.73 | 0.23 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 1 | 229.97 | 9.69 | 2.74 | 0.23 | ad4e350 |
M2 ULTRA | METAL | large-v2-q8_0 | 1 | 1 | 208.11 | 9.37 | 2.60 | 0.21 | ad4e350 |
M2 ULTRA | METAL | large-v2-dis | 1 | 1 | 172.72 | 1.12 | 0.26 | 0.02 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo | 1 | 1 | 174.46 | 1.74 | 0.42 | 0.03 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 1 | 205.78 | 1.54 | 0.42 | 0.04 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 1 | 186.33 | 1.50 | 0.40 | 0.03 | ad4e350 |
Flash Attention OFF:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 0 | 8.74 | 1.20 | 0.36 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 0 | 10.30 | 1.15 | 0.38 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 0 | 10.71 | 1.13 | 0.38 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q8_0 | 1 | 0 | 9.97 | 1.12 | 0.37 | 0.01 | ad4e350 |
M2 ULTRA | METAL | base | 1 | 0 | 16.77 | 1.71 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_0 | 1 | 0 | 16.92 | 1.63 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_1 | 1 | 0 | 16.84 | 1.63 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q8_0 | 1 | 0 | 16.12 | 1.63 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | small | 1 | 0 | 45.29 | 3.44 | 0.92 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q5_0 | 1 | 0 | 50.43 | 3.34 | 0.94 | 0.06 | ad4e350 |
M2 ULTRA | METAL | small-q5_1 | 1 | 0 | 50.49 | 3.35 | 0.93 | 0.06 | ad4e350 |
M2 ULTRA | METAL | small-q8_0 | 1 | 0 | 47.37 | 3.20 | 0.91 | 0.05 | ad4e350 |
M2 ULTRA | METAL | medium | 1 | 0 | 122.81 | 7.39 | 1.99 | 0.12 | ad4e350 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 0 | 140.62 | 6.73 | 2.03 | 0.14 | ad4e350 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 0 | 140.44 | 6.74 | 2.04 | 0.14 | ad4e350 |
M2 ULTRA | METAL | medium-q8_0 | 1 | 0 | 131.05 | 6.54 | 1.95 | 0.13 | ad4e350 |
M2 ULTRA | METAL | medium-dis | 1 | 0 | 110.95 | 0.99 | 0.24 | 0.02 | ad4e350 |
M2 ULTRA | METAL | large-v2 | 1 | 0 | 222.19 | 10.93 | 3.01 | 0.21 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 0 | 258.47 | 9.75 | 3.01 | 0.25 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 0 | 258.40 | 9.85 | 3.01 | 0.24 | ad4e350 |
M2 ULTRA | METAL | large-v2-q8_0 | 1 | 0 | 236.68 | 9.61 | 2.85 | 0.23 | ad4e350 |
M2 ULTRA | METAL | large-v2-dis | 1 | 0 | 199.28 | 1.12 | 0.27 | 0.02 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo | 1 | 0 | 201.49 | 1.76 | 0.45 | 0.03 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 0 | 233.70 | 1.55 | 0.46 | 0.04 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 0 | 214.20 | 1.51 | 0.44 | 0.04 | ad4e350 |
M4 Max
Flash Attention ON:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M4 Max | METAL | tiny | 1 | 1 | 15.22 | 0.89 | 0.26 | 0.01 | ad4e350 |
M4 Max | METAL | tiny-q8_0 | 1 | 1 | 14.70 | 0.86 | 0.26 | 0.01 | ad4e350 |
M4 Max | METAL | base | 1 | 1 | 25.33 | 1.36 | 0.30 | 0.02 | ad4e350 |
M4 Max | METAL | base-q8_0 | 1 | 1 | 21.27 | 1.31 | 0.30 | 0.02 | ad4e350 |
M4 Max | METAL | small | 1 | 1 | 58.43 | 2.78 | 0.60 | 0.05 | ad4e350 |
M4 Max | METAL | small-q8_0 | 1 | 1 | 60.26 | 2.39 | 0.60 | 0.05 | ad4e350 |
M4 Max | METAL | medium | 1 | 1 | 169.73 | 6.03 | 1.31 | 0.14 | ad4e350 |
M4 Max | METAL | medium-q8_0 | 1 | 1 | 176.61 | 4.99 | 1.31 | 0.14 | ad4e350 |
M4 Max | METAL | large-v2 | 1 | 1 | 316.18 | 9.60 | 2.08 | 0.24 | ad4e350 |
M4 Max | METAL | large-v2-q8_0 | 1 | 1 | 329.59 | 7.55 | 2.08 | 0.25 | ad4e350 |
Flash Attention OFF:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M4 Max | METAL | tiny | 1 | 0 | 13.12 | 0.87 | 0.29 | 0.01 | ad4e350 |
M4 Max | METAL | tiny-q8_0 | 1 | 0 | 15.90 | 0.88 | 0.31 | 0.01 | ad4e350 |
M4 Max | METAL | base | 1 | 0 | 23.10 | 1.42 | 0.34 | 0.02 | ad4e350 |
M4 Max | METAL | base-q8_0 | 1 | 0 | 27.25 | 1.31 | 0.34 | 0.02 | ad4e350 |
M4 Max | METAL | small | 1 | 0 | 71.76 | 3.02 | 0.70 | 0.06 | ad4e350 |
M4 Max | METAL | small-q8_0 | 1 | 0 | 73.88 | 2.60 | 0.71 | 0.06 | ad4e350 |
M4 Max | METAL | medium | 1 | 0 | 208.22 | 6.94 | 1.55 | 0.16 | ad4e350 |
M4 Max | METAL | medium-q8_0 | 1 | 0 | 214.65 | 5.90 | 1.57 | 0.17 | ad4e350 |
M4 Max | METAL | large-v2 | 1 | 0 | 381.72 | 11.28 | 2.51 | 0.29 | ad4e350 |
M4 Max | METAL | large-v2-q8_0 | 1 | 0 | 394.97 | 8.90 | 2.45 | 0.30 | ad4e350 |
V100
Flash Attention ON:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
V100 | AVX2 CUDA | tiny | 8 | 1 | 4.01 | 0.90 | 0.25 | 0.01 | ad4e350 |
V100 | AVX2 CUDA | tiny-q5_1 | 8 | 1 | 4.12 | 0.88 | 0.18 | 0.01 | ad4e350 |
V100 | AVX2 CUDA | base | 8 | 1 | 7.00 | 1.30 | 0.35 | 0.01 | ad4e350 |
V100 | AVX2 CUDA | base-q5_1 | 8 | 1 | 7.22 | 1.21 | 0.26 | 0.02 | ad4e350 |
V100 | AVX2 CUDA | small | 8 | 1 | 18.68 | 2.39 | 0.69 | 0.03 | ad4e350 |
V100 | AVX2 CUDA | small-q5_1 | 8 | 1 | 19.38 | 2.32 | 0.51 | 0.03 | ad4e350 |
V100 | AVX2 CUDA | medium | 8 | 1 | 53.17 | 5.15 | 1.45 | 0.06 | ad4e350 |
V100 | AVX2 CUDA | medium-q5_0 | 8 ... |
b2365
android.java : re-add ggml source updates (#2975)
This commit updates the ggml source to include the new unary and binary operations. I merged https://github.com/ggerganov/whisper.cpp/pull/2958, which seems to have overwritten the changes to the ggml source that were added in https://github.com/ggerganov/whisper.cpp/pull/2972. Sorry about this.
v1.7.4
Overview
Minor release with mostly build fixes.
What's Changed
- whisper : rename binaries + fix install by @ggerganov in #2648
- feat(server): Add option to suppress non-speech tokens by @sachaarbonel in #2649
- whisper : rename suppress_non_speech_tokens to suppress_nst by @ggerganov in #2653
- feat: expose no-speech probability in segment by @sachaarbonel in #2654
- ruby : bug fix on callbacks and no_speech_prob by @KitaitiMakoto in #2656
- Add no_speech_thold to cli by @alubbe in #2663
- Add --suppress_nst support to cli by @alubbe in #2664
- ruby : Fix of C++ header guard name, model URI support, type signature and more by @KitaitiMakoto in #2683
- Enable Windows cublas build by @niksedk in #2676
- docs: replace Core ML with OpenVINO by @konosky in #2686
- rename ggml-cpu-aarch64.c to .cpp by @ego in #2687
- readme : fix real-time audio input example build instructions by @samueldurantes in #2692
- sync : ggml by @ggerganov in #2699
- cli : fix segfault on missing argument by @redzic in #2700
New Contributors
- @sachaarbonel made their first contribution in #2649
- @alubbe made their first contribution in #2663
- @niksedk made their first contribution in #2676
- @konosky made their first contribution in #2686
- @ego made their first contribution in #2687
- @samueldurantes made their first contribution in #2692
- @redzic made their first contribution in #2700
Full Changelog: v1.7.3...v1.7.4
v1.7.3
Overview
- Massive performance improvements for the Metal backend, especially for beams > 1 and for quantized models
- Reduce hallucinations during silence by @jkarthic in #2629
- Implement no_speech_thold by @jkarthic in #2625
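The new no_speech_thold follows the same heuristic as OpenAI's reference Whisper implementation: a decoded segment is discarded as silence when the model's no-speech probability is high and the decoding confidence is low. A minimal sketch of that rule (the default thresholds mirror the reference implementation; the function name is illustrative, not whisper.cpp's API):

```python
def keep_segment(no_speech_prob: float, avg_logprob: float,
                 no_speech_thold: float = 0.6,
                 logprob_thold: float = -1.0) -> bool:
    """Return False when a segment should be treated as silence:
    the model is confident the window contains no speech AND the
    average token log-probability indicates a low-quality decode."""
    return not (no_speech_prob > no_speech_thold
                and avg_logprob < logprob_thold)
```

A confident decode is kept even when the no-speech probability is high, which is what reduces hallucinations during silence without dropping real speech.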
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | Metal | tiny | 1 | 1 | 7.90 | 1.26 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_0 | 1 | 1 | 8.44 | 1.23 | 0.36 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_1 | 1 | 1 | 8.26 | 1.27 | 0.37 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q8_0 | 1 | 1 | 8.03 | 1.21 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | base | 1 | 1 | 13.77 | 1.80 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_0 | 1 | 1 | 15.02 | 1.72 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_1 | 1 | 1 | 14.93 | 1.74 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q8_0 | 1 | 1 | 14.26 | 1.68 | 0.41 | 0.02 | ed733e8 |
M2 Ultra | Metal | small | 1 | 1 | 39.76 | 3.54 | 0.85 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_0 | 1 | 1 | 45.07 | 3.47 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_1 | 1 | 1 | 44.82 | 3.49 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q8_0 | 1 | 1 | 41.79 | 3.30 | 0.84 | 0.05 | ed733e8 |
M2 Ultra | Metal | medium | 1 | 1 | 106.73 | 7.28 | 1.78 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-q5_0 | 1 | 1 | 124.43 | 6.63 | 1.83 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q5_1 | 1 | 1 | 124.19 | 6.70 | 1.84 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q8_0 | 1 | 1 | 113.88 | 6.52 | 1.75 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-dis | 1 | 1 | 94.97 | 0.97 | 0.22 | 0.01 | ed733e8 |
M2 Ultra | Metal | large-v2 | 1 | 1 | 193.33 | 10.53 | 2.65 | 0.20 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_0 | 1 | 1 | 229.22 | 9.52 | 2.72 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_1 | 1 | 1 | 229.40 | 9.62 | 2.73 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q8_0 | 1 | 1 | 207.30 | 9.36 | 2.59 | 0.21 | ed733e8 |
M2 Ultra | Metal | large-v2-dis | 1 | 1 | 171.43 | 1.09 | 0.25 | 0.02 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo | 1 | 1 | 173.45 | 1.73 | 0.41 | 0.03 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q5_0 | 1 | 1 | 205.52 | 1.52 | 0.42 | 0.04 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q8_0 | 1 | 1 | 185.90 | 1.48 | 0.40 | 0.03 | ed733e8 |
What's Changed
- sync : ggml by @ggerganov in #2573
- ruby : Follow source tree change by @KitaitiMakoto in #2580
- Add `q8_0` models to `download-ggml-model.sh` by @mrienstra in #2589
- ruby : Add low-level methods to transcribe by @KitaitiMakoto in #2585
- sync : ggml by @ggerganov in #2608
- ruby : Sync whisper.cpp and model download feature by @KitaitiMakoto in #2617
- Fix typo in `download-ggml-model.sh` by @mrienstra in #2623
- Add Missing Include Directory for ggml-cpu in whisper.android CMakeLists by @Thamster in #2624
- fix: prevent division by zero in soft_max vulkan shader by @gn64 in #2633
- cmake : fix "amd64" processor string by @ggerganov in #2638
- Fix typo in Java Binding README by @crummyh in #2637
- Fix hallucinations during silence by @jkarthic in #2629
- Implement no_speech_thold by @jkarthic in #2625
- Improve consistency in stream example README commands by @crummyh in #2642
- ruby : Add no_speech_thold by @KitaitiMakoto in #2641
- sync : ggml by @ggerganov in #2639
- ci : msys enable SDL2 build by @ggerganov in #2635
New Contributors
- @Thamster made their first contribution in #2624
- @gn64 made their first contribution in #2633
- @crummyh made their first contribution in #2637
- @jkarthic made their first contribution in #2629
Full Changelog: v1.7.2...v1.7.3
v1.7.3-pre
Overview
Massive performance improvements for the Metal backend, especially for beams > 1 and for quantized models.
Marking this as a "pre-release" since there have been major changes to the build system (now using CMake) and I want to gather some feedback about how well the project builds on various platforms. Please leave comments in the discussion to help fix any remaining issues before the official release.
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | Metal | tiny | 1 | 1 | 7.90 | 1.26 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_0 | 1 | 1 | 8.44 | 1.23 | 0.36 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_1 | 1 | 1 | 8.26 | 1.27 | 0.37 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q8_0 | 1 | 1 | 8.03 | 1.21 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | base | 1 | 1 | 13.77 | 1.80 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_0 | 1 | 1 | 15.02 | 1.72 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_1 | 1 | 1 | 14.93 | 1.74 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q8_0 | 1 | 1 | 14.26 | 1.68 | 0.41 | 0.02 | ed733e8 |
M2 Ultra | Metal | small | 1 | 1 | 39.76 | 3.54 | 0.85 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_0 | 1 | 1 | 45.07 | 3.47 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_1 | 1 | 1 | 44.82 | 3.49 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q8_0 | 1 | 1 | 41.79 | 3.30 | 0.84 | 0.05 | ed733e8 |
M2 Ultra | Metal | medium | 1 | 1 | 106.73 | 7.28 | 1.78 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-q5_0 | 1 | 1 | 124.43 | 6.63 | 1.83 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q5_1 | 1 | 1 | 124.19 | 6.70 | 1.84 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q8_0 | 1 | 1 | 113.88 | 6.52 | 1.75 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-dis | 1 | 1 | 94.97 | 0.97 | 0.22 | 0.01 | ed733e8 |
M2 Ultra | Metal | large-v2 | 1 | 1 | 193.33 | 10.53 | 2.65 | 0.20 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_0 | 1 | 1 | 229.22 | 9.52 | 2.72 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_1 | 1 | 1 | 229.40 | 9.62 | 2.73 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q8_0 | 1 | 1 | 207.30 | 9.36 | 2.59 | 0.21 | ed733e8 |
M2 Ultra | Metal | large-v2-dis | 1 | 1 | 171.43 | 1.09 | 0.25 | 0.02 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo | 1 | 1 | 173.45 | 1.73 | 0.41 | 0.03 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q5_0 | 1 | 1 | 205.52 | 1.52 | 0.42 | 0.04 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q8_0 | 1 | 1 | 185.90 | 1.48 | 0.40 | 0.03 | ed733e8 |
What's Changed
- sync : ggml by @ggerganov in #2573
- ruby : Follow source tree change by @KitaitiMakoto in #2580
- Add `q8_0` models to `download-ggml-model.sh` by @mrienstra in #2589
- ruby : Add low-level methods to transcribe by @KitaitiMakoto in #2585
- sync : ggml by @ggerganov in #2608
Full Changelog: v1.7.2...v1.7.3-pre
v1.7.2
Overview
- Various improvements in the Metal backend
- Fix extra memory usage for large samples
- Remove limit for `ggml_context` (i.e. more beams and processors are supported)
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 1 | 9.51 | 1.39 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 1 | 9.57 | 1.41 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 1 | 8.74 | 1.39 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q8_0 | 1 | 1 | 8.36 | 1.33 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | base | 1 | 1 | 14.27 | 1.90 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_0 | 1 | 1 | 15.50 | 1.90 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_1 | 1 | 1 | 15.67 | 1.88 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q8_0 | 1 | 1 | 14.69 | 1.81 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | small | 1 | 1 | 40.85 | 3.77 | 1.43 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_0 | 1 | 1 | 45.99 | 3.90 | 1.52 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_1 | 1 | 1 | 46.19 | 3.83 | 1.50 | 0.06 | 83ac284 |
M2 Ultra | METAL | small-q8_0 | 1 | 1 | 42.90 | 3.65 | 1.46 | 0.05 | 83ac284 |
M2 Ultra | METAL | medium | 1 | 1 | 109.01 | 7.59 | 3.24 | 0.11 | 83ac284 |
M2 Ultra | METAL | medium-q5_0 | 1 | 1 | 126.78 | 7.55 | 3.45 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q5_1 | 1 | 1 | 127.71 | 7.39 | 3.43 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q8_0 | 1 | 1 | 115.97 | 7.21 | 3.35 | 0.12 | 83ac284 |
M2 Ultra | METAL | medium-dis | 1 | 1 | 97.74 | 1.06 | 0.36 | 0.01 | 83ac284 |
M2 Ultra | METAL | large-v2 | 1 | 1 | 196.99 | 11.29 | 5.06 | 0.20 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 1 | 233.88 | 10.83 | 5.56 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 1 | 234.03 | 10.73 | 5.46 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q8_0 | 1 | 1 | 210.83 | 10.29 | 5.23 | 0.22 | 83ac284 |
M2 Ultra | METAL | large-v2-dis | 1 | 1 | 175.37 | 1.18 | 0.42 | 0.02 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo | 1 | 1 | 177.35 | 1.85 | 0.73 | 0.03 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 1 | 209.31 | 1.69 | 0.80 | 0.04 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q8_0 | 1 | 1 | 189.55 | 1.64 | 0.75 | 0.03 | 83ac284 |
What's Changed
- Added OpenVino init on state by @sandrohanea in #2464
- Updating the Quick start by @stsfaroz in #2475
- max_length from max_target_positions by @CrispStrobe in #2477
- Add dtw preset for large-v3-turbo by @rotemdan in #2481
- make : fix GGML_VULKAN=1 build by @ggerganov in #2485
- Add Vulkan notice in README.md by @toboil-features in #2488
- Fix Ruby binding building by @KitaitiMakoto in #2484
- Update of README.md by @toboil-features in #2489
- whisper: fix index overflow by @Josscii in #2505
- ruby : Add Metal support by @KitaitiMakoto in #2516
- ruby: New segment callback by @KitaitiMakoto in #2506
- ruby : add more APIs by @KitaitiMakoto in #2518
- ruby: fix installation test by @KitaitiMakoto in #2519
- When DTW timestamps are enabled, defer new_segment_callback until after DTW compute step by @jettoblack in #2515
- ci : fix openblas build by @ggerganov in #2511
- whisper : reduce ggml_context usage by @ggerganov in #2525
- sync : ggml by @ggerganov in #2528
- passing samples_padded by ref to the threads. by @vinmisra in #2534
- fix ffmpeg v5 build by @stsydow in #2543
- fix: ggml-vulkan logs by @thewh1teagle in #2547
- Fix the instructions on the Ruby binding by @wilsonsilva in #2548
- whisper.swiftui : add model download list & bench methods by @jhen0409 in #2546
- ruby : Add more API by @KitaitiMakoto in #2551
- Fix building workflow for linux/arm64 container by @rai62 in #2555
- sync : ggml by @ggerganov in #2561
- whisper.swiftui : switch Mac dest to Mac (Designed for iPad) by @jhen0409 in #2562
- ci : use local ggml by @ggerganov in #2567
- sycl: fix example build by @stsydow in #2570
New Contributors
- @stsfaroz made their first contribution in #2475
- @CrispStrobe made their first contribution in #2477
- @toboil-features made their first contribution in #2488
- @KitaitiMakoto made their first contribution in #2484
- @Josscii made their first contribution in #2505
- @jettoblack made their first contribution in #2515
- @vinmisra made their first contribution in #2534
- @stsydow made their first contribution in #2543
- @wilsonsilva made their first contribution in #2548
- @rai62 made their first contribution in #2555
Full Changelog: v1.7.1...v1.7.2
v1.7.2-pre
Overview
This is a pre-release since there have been some reports about memory leaks which I haven't had the time to investigate and confirm. If these are resolved in the next few days, I will add the fixes to the official 1.7.2
release next week.
- Various improvements in the Metal backend
- Fix extra memory usage for large samples
- Remove limit for `ggml_context` (i.e. more beams and processors are supported)
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 1 | 9.51 | 1.39 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 1 | 9.57 | 1.41 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 1 | 8.74 | 1.39 | 0.42 | 0.01 | 83ac284 |
M2 Ultra | METAL | tiny-q8_0 | 1 | 1 | 8.36 | 1.33 | 0.41 | 0.01 | 83ac284 |
M2 Ultra | METAL | base | 1 | 1 | 14.27 | 1.90 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_0 | 1 | 1 | 15.50 | 1.90 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q5_1 | 1 | 1 | 15.67 | 1.88 | 0.65 | 0.02 | 83ac284 |
M2 Ultra | METAL | base-q8_0 | 1 | 1 | 14.69 | 1.81 | 0.63 | 0.02 | 83ac284 |
M2 Ultra | METAL | small | 1 | 1 | 40.85 | 3.77 | 1.43 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_0 | 1 | 1 | 45.99 | 3.90 | 1.52 | 0.05 | 83ac284 |
M2 Ultra | METAL | small-q5_1 | 1 | 1 | 46.19 | 3.83 | 1.50 | 0.06 | 83ac284 |
M2 Ultra | METAL | small-q8_0 | 1 | 1 | 42.90 | 3.65 | 1.46 | 0.05 | 83ac284 |
M2 Ultra | METAL | medium | 1 | 1 | 109.01 | 7.59 | 3.24 | 0.11 | 83ac284 |
M2 Ultra | METAL | medium-q5_0 | 1 | 1 | 126.78 | 7.55 | 3.45 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q5_1 | 1 | 1 | 127.71 | 7.39 | 3.43 | 0.13 | 83ac284 |
M2 Ultra | METAL | medium-q8_0 | 1 | 1 | 115.97 | 7.21 | 3.35 | 0.12 | 83ac284 |
M2 Ultra | METAL | medium-dis | 1 | 1 | 97.74 | 1.06 | 0.36 | 0.01 | 83ac284 |
M2 Ultra | METAL | large-v2 | 1 | 1 | 196.99 | 11.29 | 5.06 | 0.20 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 1 | 233.88 | 10.83 | 5.56 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 1 | 234.03 | 10.73 | 5.46 | 0.24 | 83ac284 |
M2 Ultra | METAL | large-v2-q8_0 | 1 | 1 | 210.83 | 10.29 | 5.23 | 0.22 | 83ac284 |
M2 Ultra | METAL | large-v2-dis | 1 | 1 | 175.37 | 1.18 | 0.42 | 0.02 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo | 1 | 1 | 177.35 | 1.85 | 0.73 | 0.03 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 1 | 209.31 | 1.69 | 0.80 | 0.04 | 83ac284 |
M2 Ultra | METAL | large-v3-turbo-q8_0 | 1 | 1 | 189.55 | 1.64 | 0.75 | 0.03 | 83ac284 |
What's Changed
- Added OpenVino init on state by @sandrohanea in #2464
- Updating the Quick start by @stsfaroz in #2475
- max_length from max_target_positions by @CrispStrobe in #2477
- Add dtw preset for large-v3-turbo by @rotemdan in #2481
- make : fix GGML_VULKAN=1 build by @ggerganov in #2485
- Add Vulkan notice in README.md by @toboil-features in #2488
- Fix Ruby binding building by @KitaitiMakoto in #2484
- Update of README.md by @toboil-features in #2489
- whisper: fix index overflow by @Josscii in #2505
- ruby : Add Metal support by @KitaitiMakoto in #2516
- ruby: New segment callback by @KitaitiMakoto in #2506
- ruby : add more APIs by @KitaitiMakoto in #2518
- ruby: fix installation test by @KitaitiMakoto in #2519
- When DTW timestamps are enabled, defer new_segment_callback until after DTW compute step by @jettoblack in #2515
- ci : fix openblas build by @ggerganov in #2511
- whisper : reduce ggml_context usage by @ggerganov in #2525
- sync : ggml by @ggerganov in #2528
- passing samples_padded by ref to the threads. by @vinmisra in #2534
- fix ffmpeg v5 build by @stsydow in #2543
- fix: ggml-vulkan logs by @thewh1teagle in #2547
- Fix the instructions on the Ruby binding by @wilsonsilva in #2548
- whisper.swiftui : add model download list & bench methods by @jhen0409 in #2546
- ruby : Add more API by @KitaitiMakoto in #2551
- Fix building workflow for linux/arm64 container by @rai62 in #2555
- sync : ggml by @ggerganov in #2561
- whisper.swiftui : switch Mac dest to Mac (Designed for iPad) by @jhen0409 in #2562
New Contributors
- @stsfaroz made their first contribution in #2475
- @CrispStrobe made their first contribution in #2477
- @toboil-features made their first contribution in #2488
- @KitaitiMakoto made their first contribution in #2484
- @Josscii made their first contribution in #2505
- @jettoblack made their first contribution in #2515
- @vinmisra made their first contribution in #2534
- @stsydow made their first contribution in #2543
- @wilsonsilva made their first contribution in #2548
- @rai62 made their first contribution in #2555
Full Changelog: v1.7.1...v1.7.2-pre
v1.7.1
Overview
- Fix Vulkan crashes
- Performance stats for Vulkan on RTX 2060
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | VULKAN | tiny | 1 | 0 | 30.38 | 1.37 | 1.04 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_0 | 1 | 0 | 20.98 | 1.38 | 0.99 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_1 | 1 | 0 | 20.74 | 1.30 | 0.96 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | base | 1 | 0 | 44.69 | 1.59 | 1.78 | 0.09 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_0 | 1 | 0 | 39.72 | 2.11 | 1.72 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_1 | 1 | 0 | 39.45 | 2.01 | 1.63 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | small | 1 | 0 | 160.02 | 3.53 | 4.64 | 0.23 | 9f346d0 |
RTX 2060 | VULKAN | small-q5_0 | 1 | 0 | 141.52 | 4.54 | 4.44 | 0.20 | 9f346d0 |
RTX 2060 | VULKAN | small-q5_1 | 1 | 0 | 141.03 | 4.63 | 4.18 | 0.20 | 9f346d0 |
RTX 2060 | VULKAN | medium | 1 | 0 | 472.66 | 7.55 | 11.35 | 0.56 | 9f346d0 |
RTX 2060 | VULKAN | medium-q5_0 | 1 | 0 | 395.55 | 9.81 | 10.64 | 0.49 | 9f346d0 |
RTX 2060 | VULKAN | medium-q5_1 | 1 | 0 | 398.85 | 10.16 | 10.15 | 0.50 | 9f346d0 |
RTX 2060 | VULKAN | medium-dis | 1 | 0 | 427.26 | 1.26 | 1.20 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | large-v2 | 1 | 0 | 924.60 | 12.36 | 18.56 | 1.01 | 9f346d0 |
RTX 2060 | VULKAN | large-v2-q5_0 | 1 | 0 | 774.21 | 17.25 | 17.17 | 0.85 | 9f346d0 |
RTX 2060 | VULKAN | large-v2-q5_1 | 1 | 0 | 779.75 | 17.44 | 16.27 | 0.85 | 9f346d0 |
RTX 2060 | VULKAN | large-v2-dis | 1 | 0 | 833.35 | 1.38 | 1.56 | 0.10 | 9f346d0 |
RTX 2060 | VULKAN | large-v3-turbo | 1 | 0 | 839.90 | 2.11 | 2.70 | 0.16 | 9f346d0 |
RTX 2060 | VULKAN | large-v3-turbo-q5_0 | 1 | 0 | 705.49 | 3.22 | 2.53 | 0.14 | 9f346d0 |
What's Changed
- Retry allocation with fallback flags by @SRHMorris in #2451
New Contributors
- @SRHMorris made their first contribution in #2451
Full Changelog: v1.7.0...v1.7.1
Binaries
https://github.com/ggerganov/whisper.cpp/actions/runs/11213279590
v1.7.0
Overview
- Fix crashes with high number of beams
- Reduce overall VRAM usage
- Optimize Encoder performance
Some performance numbers for this release:
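Tables like the ones below are typically produced with the project's bundled bench tool. A minimal sketch (binary name, build flags, and model path are assumptions based on the standard whisper.cpp layout, not part of these notes):

```shell
# Build with a GPU backend enabled (Metal shown here; use -DGGML_CUDA=ON
# or -DGGML_VULKAN=ON for the corresponding backends in the tables).
cmake -B build -DGGML_METAL=ON
cmake --build build -j

# Run the encoder/decoder benchmark for one model.
# -t sets the thread count ("Th" column); flash attention ("FA" column)
# is toggled with -fa where supported.
./build/bin/whisper-bench -m models/ggml-base.en.bin -t 1
```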
M2 Ultra
Flash Attention ON:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 1 | 8.37 | 1.44 | 0.48 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 1 | 9.81 | 1.46 | 0.50 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 1 | 8.80 | 1.47 | 0.50 | 0.01 | 6a94163 |
M2 Ultra | METAL | base | 1 | 1 | 16.11 | 1.96 | 0.74 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_0 | 1 | 1 | 16.38 | 1.99 | 0.78 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_1 | 1 | 1 | 16.72 | 2.00 | 0.77 | 0.02 | 6a94163 |
M2 Ultra | METAL | small | 1 | 1 | 41.26 | 3.88 | 1.66 | 0.05 | 6a94163 |
M2 Ultra | METAL | small-q5_0 | 1 | 1 | 46.91 | 4.02 | 1.76 | 0.06 | 6a94163 |
M2 Ultra | METAL | small-q5_1 | 1 | 1 | 47.05 | 4.00 | 1.73 | 0.06 | 6a94163 |
M2 Ultra | METAL | medium | 1 | 1 | 111.29 | 7.79 | 3.63 | 0.11 | 6a94163 |
M2 Ultra | METAL | medium-q5_0 | 1 | 1 | 129.78 | 7.71 | 3.85 | 0.13 | 6a94163 |
M2 Ultra | METAL | medium-q5_1 | 1 | 1 | 129.29 | 7.71 | 3.87 | 0.13 | 6a94163 |
M2 Ultra | METAL | medium-dis | 1 | 1 | 99.27 | 1.09 | 0.43 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v2 | 1 | 1 | 198.81 | 11.54 | 5.59 | 0.20 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 1 | 236.18 | 11.12 | 6.11 | 0.24 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 1 | 235.88 | 11.14 | 6.01 | 0.24 | 6a94163 |
M2 Ultra | METAL | large-v2-dis | 1 | 1 | 177.41 | 1.21 | 0.48 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo | 1 | 1 | 178.92 | 1.89 | 0.83 | 0.03 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 1 | 211.44 | 1.73 | 0.90 | 0.04 | 6a94163 |
Flash Attention OFF:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 0 | 10.04 | 1.37 | 0.50 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 0 | 10.02 | 1.36 | 0.53 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 0 | 11.08 | 1.37 | 0.53 | 0.01 | 6a94163 |
M2 Ultra | METAL | base | 1 | 0 | 17.84 | 1.93 | 0.77 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_0 | 1 | 0 | 18.57 | 1.92 | 0.81 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_1 | 1 | 0 | 18.66 | 1.93 | 0.82 | 0.02 | 6a94163 |
M2 Ultra | METAL | small | 1 | 0 | 48.26 | 3.95 | 1.73 | 0.05 | 6a94163 |
M2 Ultra | METAL | small-q5_0 | 1 | 0 | 53.68 | 3.99 | 1.85 | 0.06 | 6a94163 |
M2 Ultra | METAL | small-q5_1 | 1 | 0 | 53.86 | 4.00 | 1.82 | 0.06 | 6a94163 |
M2 Ultra | METAL | medium | 1 | 0 | 130.09 | 8.01 | 3.82 | 0.13 | 6a94163 |
M2 Ultra | METAL | medium-q5_0 | 1 | 0 | 148.18 | 7.92 | 4.11 | 0.14 | 6a94163 |
M2 Ultra | METAL | medium-q5_1 | 1 | 0 | 147.95 | 7.94 | 4.11 | 0.14 | 6a94163 |
M2 Ultra | METAL | medium-dis | 1 | 0 | 116.97 | 1.11 | 0.42 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v2 | 1 | 0 | 232.43 | 12.34 | 5.87 | 0.22 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 0 | 269.72 | 11.68 | 6.44 | 0.26 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 0 | 269.71 | 11.82 | 6.36 | 0.26 | 6a94163 |
M2 Ultra | METAL | large-v2-dis | 1 | 0 | 209.25 | 1.25 | 0.48 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo | 1 | 0 | 211.09 | 1.98 | 0.84 | 0.03 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 0 | 244.23 | 1.81 | 0.92 | 0.04 | 6a94163 |
Ryzen 9 5950X + RTX 2060
Flash Attention ON:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | AVX2 CUDA | tiny | 1 | 1 | 7.35 | 0.78 | 0.24 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_0 | 1 | 1 | 6.45 | 0.67 | 0.14 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_1 | 1 | 1 | 6.39 | 0.66 | 0.14 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | base | 1 | 1 | 10.20 | 0.88 | 0.30 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_0 | 1 | 1 | 11.38 | 0.92 | 0.21 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_1 | 1 | 1 | 11.76 | 0.91 | 0.20 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | small | 1 | 1 | 33.06 | 2.00 | 0.56 | 0.03 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_0 | 1 | 1 | 35.84 | 1.84 | 0.43 | 0.04 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_1 | 1 | 1 | 36.89 | 1.82 | 0.42 | 0.04 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium | 1 | 1 | 90.65 | 4.54 | 1.13 | 0.08 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_0 | 1 | 1 | 104.01 | 3.80 | 0.91 | 0.10 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_1 | 1 | 1 | 107.98 | 3.72 | 0.87 | 0.10 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-dis | 1 | 1 | 79.08 | 0.68 | 0.17 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2 | 1 | 1 | 162.00 | 7.52 | 1.92 | 0.14 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 1 | 1 | 184.59 | 5.64 | 1.50 | 0.16 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 1 | 1 | 193.85 | 5.55 | 1.44 | 0.17 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-dis | 1 | 1 | 140.75 | 0.84 | 0.37 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo | 1 | 1 | 143.38 | 1.29 | 0.36 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo-q5_0 | 1 | 1 | 163.30 | 0.93 | 0.28 | 0.03 | 6a94163 |
Flash Attention OFF:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | AVX2 CUDA | tiny | 1 | 0 | 12.49 | 0.87 | 0.23 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_0 | 1 | 0 | 10.65 | 0.78 | 0.19 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_1 | 1 | 0 | 10.82 | 0.77 | 0.19 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base | 1 | 0 | 18.97 | 1.04 | 0.34 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_0 | 1 | 0 | 20.22 | 1.09 | 0.27 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_1 | 1 | 0 | 20.48 | 1.07 | 0.27 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | small | 1 | 0 | 59.52 | 2.37 | 0.70 | 0.05 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_0 | 1 | 0 | 62.98 | 2.23 | 0.60 | 0.06 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_1 | 1 | 0 | 63.64 | 2.21 | 0.59 | 0.06 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium | 1 | 0 | 161.53 | 5.36 | 1.53 | 0.13 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_0 | 1 | 0 | 174.96 | 4.64 | 1.32 | 0.15 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_1 | 1 | 0 | 178.42 | 4.57 | 1.29 | 0.15 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-dis | 1 | 0 | 149.65 | 0.75 | 0.20 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2 | 1 | 0 | 280.55 | 8.74 | 2.51 | 0.23 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 1 | 0 | 306.87 | 6.92 | 2.08 | 0.25 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 1 | 0 | 314.25 | 6.82 | 2.02 | 0.26 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-dis | 1 | 0 | 259.39 | 0.91 | 0.37 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo | 1 | 0 | 261.83 | 1.44 | 0.41 | 0.04 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo-q5_0 | 1 | 0 | 282.99 | 1.09 | 0.33 | 0.04 | 6a94163 |
Vulkan:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | VULKAN | tiny | 1 | 0 | 30.38 | 1.37 | 1.04 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_0 | 1 | 0 | 20.98 | 1.38 | 0.99 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_1 | 1 | 0 | 20.74 | 1.30 | 0.96 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | base | 1 | 0 | 44.69 | 1.59 | 1.78 | 0.09 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_0 | 1 | 0 | 39.72 | 2.11 | 1.72 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_1 | 1 | 0 | 39.45 | 2.01 | 1.63 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | small | 1 | 0 | 160.02 | 3.53 | 4.64 | 0.23 | 9f346d0 |
RTX 2060 | VULKAN | small-q5_0 | 1 | 0 | 141.52 | 4.54 | 4.44 | 0.20 | 9f346d0 |
RTX 2060 | VULKA... |