Name and Version
Please note that llama-quantize.exe has been failing since version b5298 (perhaps earlier) on Windows 11 systems.
I have also tested b5342, b5361, and b5371.
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-quantize
Command line
./llama-quantize E:/main-du.gguf i:/llm/David_AU/testfiles/MN-Dark-Universe-MOE-4X12B-Reasoning-Q2_K.gguf Q2_K 8
Problem description & steps to reproduce
ISSUE:
Example (run in PowerShell):
./llama-quantize E:/main-du.gguf i:/llm/David_AU/testfiles/MN-Dark-Universe-MOE-4X12B-Reasoning-Q2_K.gguf Q2_K 8
Generates:
main: build = 5371 (e5c834f)
main: built with MSVC 19.29.30159.0 for Windows AMD64
main: quantizing 'E:/main-du.gguf' to 'i:/llm/David_AU/testfiles/MN-Dark-Universe-MOE-4X12B-Reasoning2-Q2_K.gguf' as Q2_K using 8 threads
llama_model_loader: loaded meta data with 34 key-value pairs and 403 tensors from E:/main-du.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = MN Dark Universe MOE 4X12B Reasoning
llama_model_loader: - kv 3: general.finetune str = Reasoning
llama_model_loader: - kv 4: general.basename str = MN-Dark-Universe-MOE
llama_model_loader: - kv 5: general.size_label str = 4X12B
llama_model_loader: - kv 6: llama.block_count u32 = 40
llama_model_loader: - kv 7: llama.context_length u32 = 1024000
llama_model_loader: - kv 8: llama.embedding_length u32 = 5120
llama_model_loader: - kv 9: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 10: llama.attention.head_count u32 = 32
llama_model_loader: - kv 11: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 12: llama.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 13: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 14: llama.expert_count u32 = 4
llama_model_loader: - kv 15: llama.expert_used_count u32 = 4
llama_model_loader: - kv 16: llama.attention.key_length u32 = 128
llama_model_loader: - kv 17: llama.attention.value_length u32 = 128
llama_model_loader: - kv 18: general.file_type u32 = 32
llama_model_loader: - kv 19: llama.vocab_size u32 = 131072
llama_model_loader: - kv 20: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - kv 22: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 23: tokenizer.ggml.pre str = tekken
llama_model_loader: - kv 24: tokenizer.ggml.tokens arr[str,131072] = ["", "", "", "<|im_start|...
llama_model_loader: - kv 25: tokenizer.ggml.token_type arr[i32,131072] = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,269443] = ["─á ─á", "─á t", "e r", "i n", "─á ─...
llama_model_loader: - kv 27: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 4
llama_model_loader: - kv 29: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 30: tokenizer.ggml.padding_token_id u32 = 1
llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 32: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 33: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - type f32: 121 tensors
llama_model_loader: - type bf16: 282 tensors
-> DIES here, Windows error:
An unhandled win32 exception occurred in [5900] llama-quantize.exe
Additional details:
I am downloading the pre-made "exes" from releases.
I tried CUDA 12.4, CUDA 11.7, and x64 CPU builds; all give the same result.
The issue seems to be with loading the source files themselves.
Versions b5215 and earlier have no issues.
Also: ./llama-quantize --help works with no issue.
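Since the crash happens while the model is being loaded, one way to help narrow it down is to sanity-check the source file's GGUF header independently of llama.cpp. Below is a minimal sketch based on the GGUF v3 file layout (little-endian: 4-byte magic "GGUF", uint32 version, uint64 tensor count, uint64 key-value count); the temp-file demo at the bottom is purely illustrative, using the counts reported in the log above.

```python
import struct

def read_gguf_header(path):
    """Read the fixed-size GGUF header and return its fields."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        version, = struct.unpack("<I", f.read(4))    # uint32 format version
        n_tensors, = struct.unpack("<Q", f.read(8))  # uint64 tensor count
        n_kv, = struct.unpack("<Q", f.read(8))       # uint64 metadata KV count
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}

if __name__ == "__main__":
    # Illustrative only: write a fake header matching the log above
    # (GGUF v3, 403 tensors, 34 key-value pairs) and read it back.
    import os
    import tempfile
    fd, path = tempfile.mkstemp(suffix=".gguf")
    with os.fdopen(fd, "wb") as f:
        f.write(b"GGUF" + struct.pack("<IQQ", 3, 403, 34))
    print(read_gguf_header(path))  # {'version': 3, 'n_tensors': 403, 'n_kv': 34}
    os.remove(path)
```

If the header parses cleanly here (as the llama_model_loader output above suggests it does), the crash is more likely to be in the metadata/tensor processing after the header, not in reading the file itself.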
First Bad Commit
b5298 (perhaps earlier)
Relevant log output