
Misc. bug: llama-quantize.exe broken on Win11 since b5298, but works on b5215 and earlier #13518


Closed
David-AU-github opened this issue May 14, 2025 · 0 comments · Fixed by #13539

Comments

@David-AU-github

Name and Version

Please note that llama-quantize.exe has been failing since version b5298 (perhaps earlier) on Windows 11 systems.
I have also tested: b5342, b5361, b5371.

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-quantize

Command line

./llama-quantize E:/main-du.gguf i:/llm/David_AU/testfiles/MN-Dark-Universe-MOE-4X12B-Reasoning-Q2_K.gguf Q2_K 8
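
For reference, a minimal annotated sketch of the same invocation, assuming the usual positional-argument form of llama-quantize (input model, output path, quantization type, thread count):

# llama-quantize <input.gguf> <output.gguf> <type> [nthreads]
#   E:/main-du.gguf           -> source model (bf16 per the log below)
#   ...-Reasoning-Q2_K.gguf   -> destination file
#   Q2_K                      -> target quantization type
#   8                         -> number of threads
./llama-quantize E:/main-du.gguf i:/llm/David_AU/testfiles/MN-Dark-Universe-MOE-4X12B-Reasoning-Q2_K.gguf Q2_K 8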

Problem description & steps to reproduce

ISSUE:

Example:
./llama-quantize E:/main-du.gguf i:/llm/David_AU/testfiles/MN-Dark-Universe-MOE-4X12B-Reasoning-Q2_K.gguf Q2_K 8

(run in PowerShell)

Generates:

main: build = 5371 (e5c834f)
main: built with MSVC 19.29.30159.0 for Windows AMD64
main: quantizing 'E:/main-du.gguf' to 'i:/llm/David_AU/testfiles/MN-Dark-Universe-MOE-4X12B-Reasoning2-Q2_K.gguf' as Q2_K using 8 threads
llama_model_loader: loaded meta data with 34 key-value pairs and 403 tensors from E:/main-du.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = MN Dark Universe MOE 4X12B Reasoning
llama_model_loader: - kv 3: general.finetune str = Reasoning
llama_model_loader: - kv 4: general.basename str = MN-Dark-Universe-MOE
llama_model_loader: - kv 5: general.size_label str = 4X12B
llama_model_loader: - kv 6: llama.block_count u32 = 40
llama_model_loader: - kv 7: llama.context_length u32 = 1024000
llama_model_loader: - kv 8: llama.embedding_length u32 = 5120
llama_model_loader: - kv 9: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 10: llama.attention.head_count u32 = 32
llama_model_loader: - kv 11: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 12: llama.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 13: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 14: llama.expert_count u32 = 4
llama_model_loader: - kv 15: llama.expert_used_count u32 = 4
llama_model_loader: - kv 16: llama.attention.key_length u32 = 128
llama_model_loader: - kv 17: llama.attention.value_length u32 = 128
llama_model_loader: - kv 18: general.file_type u32 = 32
llama_model_loader: - kv 19: llama.vocab_size u32 = 131072
llama_model_loader: - kv 20: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - kv 22: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 23: tokenizer.ggml.pre str = tekken
llama_model_loader: - kv 24: tokenizer.ggml.tokens arr[str,131072] = ["", "", "", "<|im_start|...
llama_model_loader: - kv 25: tokenizer.ggml.token_type arr[i32,131072] = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,269443] = ["Ġ Ġ", "Ġ t", "e r", "i n", "Ġ ...
llama_model_loader: - kv 27: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 4
llama_model_loader: - kv 29: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 30: tokenizer.ggml.padding_token_id u32 = 1
llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 32: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 33: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - type f32: 121 tensors
llama_model_loader: - type bf16: 282 tensors

-> Dies here with a Windows error:
An unhandled win32 exception occurred in [5900] llama-quantize.exe

Additional details:
I am downloading the pre-built executables from the Releases page.
I tried the CUDA 12.4, CUDA 11.7, and x64 CPU builds.
All give the same result.

The issue appears to be with loading the source files themselves.
Versions b5215 and earlier show no issues.

Also:
./llama-quantize --help
works - no issue.
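
For a side-by-side sanity check (a sketch only; the b5215 install path and output filename below are hypothetical), the identical command can be run with a b5215 release build, which per the above still handles the same source file without crashing:

# Hypothetical PowerShell check: same input, older (working) b5215 build, separate output path
& "C:\llama-b5215\llama-quantize.exe" E:/main-du.gguf i:/llm/David_AU/testfiles/out-b5215-Q2_K.gguf Q2_K 8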

First Bad Commit

b5298 (perhaps earlier)

Relevant log output
