Eval bug: Segmentation fault when using llama-quantize #13380

Closed
lifelongeeek opened this issue May 8, 2025 · 0 comments · Fixed by #13539

Name and Version

$./llama-cli --version
load_backend: loaded CPU backend from /app/libggml-cpu-icelake.so
version: 5280 (27aa259)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CPU

Hardware

8xRTX3090

Models

Meta Llama-3.2-1B-Instruct-F16.gguf

Problem description & steps to reproduce

I tried to quantize the F16 GGUF of Llama-3.2-1B-Instruct to Q4_K_M, but llama-quantize crashes with a segmentation fault right after the model metadata is loaded. Could you advise how to fix this?

$:/app# ./llama-quantize models/Llama-3.2-1B-Instruct/Llama-3.2-1B-Instruct-F16.gguf models/Llama-3.2-1B-Instruct/gguf_quantized/ggml-model-Q4_K_M.gguf Q4_K_M
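
When reproducing this, it may help to confirm independently that the input file matches what the loader reports in the log (GGUF V3, 147 tensors, 31 key-value pairs). The following is a minimal sketch, not part of llama.cpp: `gguf_header` is a hypothetical helper name, and it assumes a little-endian host and the GGUF v2/v3 header layout (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata key-value count).

```shell
# Sketch: print the fixed-size GGUF header fields of a file.
# Assumptions (not taken from the report): little-endian host, GGUF v2/v3 layout:
#   bytes 0-3    magic "GGUF"
#   bytes 4-7    version (uint32)
#   bytes 8-15   tensor count (uint64)
#   bytes 16-23  metadata key-value count (uint64)
gguf_header() {
    file="$1"
    magic=$(head -c 4 "$file")
    if [ "$magic" != "GGUF" ]; then
        echo "not a GGUF file: $file" >&2
        return 1
    fi
    # od prints the integers with padding; strip spaces and newlines.
    version=$(od -An -tu4 -j4 -N4 "$file" | tr -d ' \n')
    tensors=$(od -An -tu8 -j8 -N8 "$file" | tr -d ' \n')
    kvs=$(od -An -tu8 -j16 -N8 "$file" | tr -d ' \n')
    echo "version=$version tensors=$tensors kv=$kvs"
}
```

Calling `gguf_header` on the F16 file from the command above should print values that agree with the `llama_model_loader` line in the log. Agreement only shows that the header is readable; it does not rule out problems later in the file.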

First Bad Commit

No response

Relevant log output

main: build = 5280 (27aa2595)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing 'model/Llama-3.2-1B-Instruct/Llama-3.2-1B-Instruct-F16.gguf' to 'model/Llama-3.2-1B-Instruct/gguf_quantized/ggml-model-Q4_K_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 31 key-value pairs and 147 tensors from model/Llama-3.2-1B-Instruct/Llama-3.2-1B-Instruct-F16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
llama_model_loader: - kv   5:                         general.size_label str              = 1B
llama_model_loader: - kv   6:                            general.license str              = llama3.2
llama_model_loader: - kv   7:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   8:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   9:                          llama.block_count u32              = 16
llama_model_loader: - kv  10:                       llama.context_length u32              = 131072
llama_model_loader: - kv  11:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv  12:                  llama.feed_forward_length u32              = 8192
llama_model_loader: - kv  13:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  14:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  15:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  16:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  17:                 llama.attention.key_length u32              = 64
llama_model_loader: - kv  18:               llama.attention.value_length u32              = 64
llama_model_loader: - kv  19:                          general.file_type u32              = 1
llama_model_loader: - kv  20:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  21:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv  22:               general.quantization_version u32              = 2
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  28:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  30:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - type  f32:   34 tensors
llama_model_loader: - type  f16:  113 tensors
Segmentation fault (core dumped)