Golang bindings are 45× slower than the original C++ binary #421


Closed
ilyazub opened this issue Jan 17, 2023 · 4 comments


ilyazub commented Jan 17, 2023

The Golang bindings are 45 times slower than the C++ binary when transcribing samples/jfk.wav with the ggml-tiny.en.bin model.

| C++ binary | Golang bindings |
| ---------- | --------------- |
| 1.428s     | 63.919s         |

The C++ and Golang examples were compiled following the README. I haven't profiled the Golang bindings yet.

Raw results

C++ binary

time ./main -m ./models/ggml-tiny.en.bin -f ./samples/jfk.wav
whisper_init_from_file: loading model from './models/ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 1
whisper_model_load: mem required  =  387.00 MB (+    3.00 MB per decoder)
whisper_model_load: kv self size  =    2.62 MB
whisper_model_load: kv cross size =    8.79 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =   73.58 MB
whisper_model_load: model size    =   73.54 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 

main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:07.740]   And so my fellow Americans ask not what your country can do for you
[00:00:07.740 --> 00:00:10.740]   ask what you can do for your country


whisper_print_timings:     load time =   186.77 ms
whisper_print_timings:      mel time =    83.82 ms
whisper_print_timings:   sample time =    16.98 ms
whisper_print_timings:   encode time =   791.23 ms / 197.81 ms per layer
whisper_print_timings:   decode time =   289.46 ms / 72.36 ms per layer
whisper_print_timings:    total time =  1374.19 ms

real    0m1.428s
user    0m4.208s
sys 0m0.280s

Golang bindings

time ./build/go-whisper -model ./models/ggml-tiny.en.bin samples/jfk.wav
whisper_init_from_file: loading model from './models/ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 1
whisper_model_load: mem required  =  387.00 MB (+    3.00 MB per decoder)
whisper_model_load: kv self size  =    2.62 MB
whisper_model_load: kv cross size =    8.79 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =   73.58 MB
whisper_model_load: model size    =   73.54 MB
Loading "./samples/jfk.wav"
  ...processing "./samples/jfk.wav"
[    0s-> 7.74s]  And so my fellow Americans ask not what your country can do for you
[ 7.74s->10.74s]  ask what you can do for your country

real    1m3.919s
user    4m7.851s
sys 0m6.771s

glaslos commented Jan 23, 2023

That sounds unrealistic to me. I only did one run; the runtimes are so similar that averaging over repeated runs seemed unnecessary.

This is what I get if I run the C++ vs Go example against each other:

C++ example

whisper.cpp$ ./main -m ./bindings/go/models/ggml-small.en.bin samples/jfk.wav 
whisper_init_from_file: loading model from './bindings/go/models/ggml-small.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 3
whisper_model_load: adding 1607 extra tokens
whisper_model_load: mem_required  = 1044.00 MB
whisper_model_load: ggml ctx size =  464.56 MB
whisper_model_load: memory size   =   68.48 MB
whisper_model_load: model size    =  464.44 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:08.000]   And so, my fellow Americans, ask not what your country can do for you.
[00:00:08.000 --> 00:00:11.000]   Ask what you can do for your country.

whisper_print_timings:     load time =   186.77 ms
whisper_print_timings:      mel time =    41.27 ms
whisper_print_timings:   sample time =     1.99 ms
whisper_print_timings:   encode time =  2142.41 ms / 178.53 ms per layer
whisper_print_timings:   decode time =   308.09 ms / 25.67 ms per layer
whisper_print_timings:    total time =  2681.03 ms

Go bindings (I built the example ahead of time so Go build time isn't included):

whisper.cpp/bindings/go$ time ./examples/go-whisper/main -model=/home/glaslos/workspace/whisper.cpp/bindings/go/models/ggml-small.en.bin samples/jfk.wav 
whisper_init_from_file: loading model from '/home/glaslos/workspace/whisper.cpp/bindings/go/models/ggml-small.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 3
whisper_model_load: adding 1607 extra tokens
whisper_model_load: mem_required  = 1044.00 MB
whisper_model_load: ggml ctx size =  464.56 MB
whisper_model_load: memory size   =   68.48 MB
whisper_model_load: model size    =  464.44 MB
Loading "samples/jfk.wav"
  ...processing "samples/jfk.wav"
[    0s->    8s]  And so, my fellow Americans, ask not what your country can do for you.
[    8s->   11s]  Ask what you can do for your country.

real    0m2.165s
user    0m22.835s
sys     0m0.343s
glaslos@desktop-home:~/workspace/whisper.cpp/bindings/go$ rm examples/go-whisper/main
glaslos@desktop-home:~/workspace/whisper.cpp/bindings/go$ go build -o examples/go-whisper/main examples/go-whisper/*.go
glaslos@desktop-home:~/workspace/whisper.cpp/bindings/go$ time ./examples/go-whisper/main -model=/home/glaslos/workspace/whisper.cpp/bindings/go/models/ggml-small.en.bin samples/jfk.wav 
whisper_init_from_file: loading model from '/home/glaslos/workspace/whisper.cpp/bindings/go/models/ggml-small.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 3
whisper_model_load: adding 1607 extra tokens
whisper_model_load: mem_required  = 1044.00 MB
whisper_model_load: ggml ctx size =  464.56 MB
whisper_model_load: memory size   =   68.48 MB
whisper_model_load: model size    =  464.44 MB
Loading "samples/jfk.wav"
  ...processing "samples/jfk.wav"

[    0s->    8s]  And so, my fellow Americans, ask not what your country can do for you.
[    8s->   11s]  Ask what you can do for your country.

whisper_print_timings:     load time =   184.85 ms
whisper_print_timings:      mel time =    15.65 ms
whisper_print_timings:   sample time =     2.28 ms
whisper_print_timings:   encode time =  1251.56 ms / 104.30 ms per layer
whisper_print_timings:   decode time =   987.36 ms / 82.28 ms per layer
whisper_print_timings:    total time =  2472.28 ms

real    0m2.488s
user    0m27.309s
sys     0m0.264s

And for completeness, the tiny model:

C++

whisper_print_timings:     load time =    71.49 ms
whisper_print_timings:      mel time =    40.43 ms
whisper_print_timings:   sample time =     1.93 ms
whisper_print_timings:   encode time =   263.67 ms / 65.92 ms per layer
whisper_print_timings:   decode time =    52.08 ms / 13.02 ms per layer
whisper_print_timings:    total time =   430.05 ms

real    0m0.442s
user    0m1.206s
sys     0m0.033s

Go:

whisper_print_timings:     load time =    71.97 ms
whisper_print_timings:      mel time =    16.79 ms
whisper_print_timings:   sample time =     1.90 ms
whisper_print_timings:   encode time =   194.56 ms / 48.64 ms per layer
whisper_print_timings:   decode time =   103.43 ms / 25.86 ms per layer
whisper_print_timings:    total time =   420.56 ms

real    0m0.436s
user    0m3.148s
sys     0m0.107s


jaybinks commented Jan 23, 2023 via email


glaslos commented Jan 28, 2023

@ilyazub can you run it again with #456 ?

@ggerganov
Member

Reopen if issue persists
