Golang bindings are 45× slower than the original C++ binary #421


Closed
ilyazub opened this issue Jan 17, 2023 · 4 comments


ilyazub commented Jan 17, 2023

The Golang bindings are 45 times slower than the C++ binary when transcribing samples/jfk.wav with the ggml-tiny.en.bin model.

| C++ binary | Golang bindings |
| ---------- | --------------- |
| 1.428s     | 63.919s         |

The C++ and Golang examples were compiled following the README. I haven't profiled the Golang bindings yet.

Raw results

C++ binary

time ./main -m ./models/ggml-tiny.en.bin -f ./samples/jfk.wav
whisper_init_from_file: loading model from './models/ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 1
whisper_model_load: mem required  =  387.00 MB (+    3.00 MB per decoder)
whisper_model_load: kv self size  =    2.62 MB
whisper_model_load: kv cross size =    8.79 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =   73.58 MB
whisper_model_load: model size    =   73.54 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 

main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:07.740]   And so my fellow Americans ask not what your country can do for you
[00:00:07.740 --> 00:00:10.740]   ask what you can do for your country


whisper_print_timings:     load time =   186.77 ms
whisper_print_timings:      mel time =    83.82 ms
whisper_print_timings:   sample time =    16.98 ms
whisper_print_timings:   encode time =   791.23 ms / 197.81 ms per layer
whisper_print_timings:   decode time =   289.46 ms / 72.36 ms per layer
whisper_print_timings:    total time =  1374.19 ms

real    0m1.428s
user    0m4.208s
sys 0m0.280s

Golang bindings

time ./build/go-whisper -model ./models/ggml-tiny.en.bin samples/jfk.wav
whisper_init_from_file: loading model from './models/ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 1
whisper_model_load: mem required  =  387.00 MB (+    3.00 MB per decoder)
whisper_model_load: kv self size  =    2.62 MB
whisper_model_load: kv cross size =    8.79 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =   73.58 MB
whisper_model_load: model size    =   73.54 MB
Loading "./samples/jfk.wav"
  ...processing "./samples/jfk.wav"
[    0s-> 7.74s]  And so my fellow Americans ask not what your country can do for you
[ 7.74s->10.74s]  ask what you can do for your country

real    1m3.919s
user    4m7.851s
sys 0m6.771s

glaslos commented Jan 23, 2023

That sounds unrealistic to me. I only did one run; the runtimes are so similar that averaging over repeated runs seemed unnecessary.

This is what I get if I run the C++ vs Go example against each other:

C++ example

whisper.cpp$ ./main -m ./bindings/go/models/ggml-small.en.bin samples/jfk.wav 
whisper_init_from_file: loading model from './bindings/go/models/ggml-small.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 3
whisper_model_load: adding 1607 extra tokens
whisper_model_load: mem_required  = 1044.00 MB
whisper_model_load: ggml ctx size =  464.56 MB
whisper_model_load: memory size   =   68.48 MB
whisper_model_load: model size    =  464.44 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:08.000]   And so, my fellow Americans, ask not what your country can do for you.
[00:00:08.000 --> 00:00:11.000]   Ask what you can do for your country.

whisper_print_timings:     load time =   186.77 ms
whisper_print_timings:      mel time =    41.27 ms
whisper_print_timings:   sample time =     1.99 ms
whisper_print_timings:   encode time =  2142.41 ms / 178.53 ms per layer
whisper_print_timings:   decode time =   308.09 ms / 25.67 ms per layer
whisper_print_timings:    total time =  2681.03 ms

Go bindings (I built the example ahead of time so Go build time isn't included):

whisper.cpp/bindings/go$ time ./examples/go-whisper/main -model=/home/glaslos/workspace/whisper.cpp/bindings/go/models/ggml-small.en.bin samples/jfk.wav 
whisper_init_from_file: loading model from '/home/glaslos/workspace/whisper.cpp/bindings/go/models/ggml-small.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 3
whisper_model_load: adding 1607 extra tokens
whisper_model_load: mem_required  = 1044.00 MB
whisper_model_load: ggml ctx size =  464.56 MB
whisper_model_load: memory size   =   68.48 MB
whisper_model_load: model size    =  464.44 MB
Loading "samples/jfk.wav"
  ...processing "samples/jfk.wav"
[    0s->    8s]  And so, my fellow Americans, ask not what your country can do for you.
[    8s->   11s]  Ask what you can do for your country.

real    0m2.165s
user    0m22.835s
sys     0m0.343s
glaslos@desktop-home:~/workspace/whisper.cpp/bindings/go$ rm examples/go-whisper/main
glaslos@desktop-home:~/workspace/whisper.cpp/bindings/go$ go build -o examples/go-whisper/main examples/go-whisper/*.go
glaslos@desktop-home:~/workspace/whisper.cpp/bindings/go$ time ./examples/go-whisper/main -model=/home/glaslos/workspace/whisper.cpp/bindings/go/models/ggml-small.en.bin samples/jfk.wav 
whisper_init_from_file: loading model from '/home/glaslos/workspace/whisper.cpp/bindings/go/models/ggml-small.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 3
whisper_model_load: adding 1607 extra tokens
whisper_model_load: mem_required  = 1044.00 MB
whisper_model_load: ggml ctx size =  464.56 MB
whisper_model_load: memory size   =   68.48 MB
whisper_model_load: model size    =  464.44 MB
Loading "samples/jfk.wav"
  ...processing "samples/jfk.wav"

[    0s->    8s]  And so, my fellow Americans, ask not what your country can do for you.
[    8s->   11s]  Ask what you can do for your country.

whisper_print_timings:     load time =   184.85 ms
whisper_print_timings:      mel time =    15.65 ms
whisper_print_timings:   sample time =     2.28 ms
whisper_print_timings:   encode time =  1251.56 ms / 104.30 ms per layer
whisper_print_timings:   decode time =   987.36 ms / 82.28 ms per layer
whisper_print_timings:    total time =  2472.28 ms

real    0m2.488s
user    0m27.309s
sys     0m0.264s

And for completeness, the tiny model:

C++

whisper_print_timings:     load time =    71.49 ms
whisper_print_timings:      mel time =    40.43 ms
whisper_print_timings:   sample time =     1.93 ms
whisper_print_timings:   encode time =   263.67 ms / 65.92 ms per layer
whisper_print_timings:   decode time =    52.08 ms / 13.02 ms per layer
whisper_print_timings:    total time =   430.05 ms

real    0m0.442s
user    0m1.206s
sys     0m0.033s

Go:

whisper_print_timings:     load time =    71.97 ms
whisper_print_timings:      mel time =    16.79 ms
whisper_print_timings:   sample time =     1.90 ms
whisper_print_timings:   encode time =   194.56 ms / 48.64 ms per layer
whisper_print_timings:   decode time =   103.43 ms / 25.86 ms per layer
whisper_print_timings:    total time =   420.56 ms

real    0m0.436s
user    0m3.148s
sys     0m0.107s


jaybinks commented Jan 23, 2023 via email


glaslos commented Jan 28, 2023

@ilyazub can you run it again with #456 ?

@ggerganov
Member

Reopen if issue persists
