
Question regarding the quantization dimension of weights in formats such as Q4_K #13377


Closed
TheTinyTeddy opened this issue May 8, 2025 · 3 comments


@TheTinyTeddy

From the description of Q4_K:
“4-bit quantization (q). Super-blocks with 8 blocks, each block has 32 weights. Weight formula: w = q * block_scale(6-bit) + block_min(6-bit), resulting in 4.5 bits-per-weight.”
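
For reference, here is my rough understanding of where the 4.5 bits-per-weight figure comes from, written as a small self-contained C sketch. It paraphrases the block_q4_K super-block layout described above; the struct name and the uint16_t stand-ins for fp16 values are mine, not ggml's.

```c
/* Rough paraphrase of the Q4_K super-block layout (an assumption, not copied
 * from ggml); fp16 fields are stood in for by uint16_t. */
#include <stdint.h>
#include <stdio.h>

#define QK_K 256  /* weights per super-block: 8 blocks x 32 weights */

typedef struct {
    uint16_t d;            /* fp16 super-block scale for the 6-bit block scales */
    uint16_t dmin;         /* fp16 super-block scale for the 6-bit block mins   */
    uint8_t  scales[12];   /* 8 x 6-bit scales + 8 x 6-bit mins, packed         */
    uint8_t  qs[QK_K / 2]; /* 256 x 4-bit quants, two per byte                  */
} block_q4_K_sketch;

int main(void) {
    /* 2 + 2 + 12 + 128 = 144 bytes for 256 weights -> 144 * 8 / 256 = 4.5 bpw */
    printf("%zu bytes per super-block, %.2f bits per weight\n",
           sizeof(block_q4_K_sketch),
           8.0 * (double) sizeof(block_q4_K_sketch) / QK_K);
    return 0;
}
```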

My question is: along which dimension is the quantization applied, and at what granularity?

For example, say we have an activation X of shape [m, k] and a weight W of shape [k, n], where k is the GEMM accumulation dimension for X*W:

  1. Do the block scales for the weight W have shape [k//32, n], [k, n//32], or [k//32, n//32]?
  2. Does block_min have the same shape as block_scale?

@ggerganov I was wondering if you could comment on this?

Many thanks

@ggerganov
Member

The quantization is done along the k dimension.
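
Concretely, for W of shape [k, n] this means each column is split along k into blocks of 32 weights, each with its own scale and min, so both block_scale and block_min conceptually have shape [k/32, n]. Below is a simplified C sketch of the dequantization implied by that layout; it is illustrative only, ignores Q4_K's 6-bit packing and super-block structure, and none of the names are taken from ggml.

```c
#include <stdint.h>

/* Per-block dequantization along k for a weight matrix W of shape [k, n],
 * stored row-major. q holds one 4-bit code per byte (unpacked for clarity);
 * scale and min are indexed as [k/32][n]. */
void dequantize_blocks_along_k(const uint8_t *q,     /* [k][n]    4-bit codes      */
                               const float   *scale, /* [k/32][n] per-block scales */
                               const float   *min,   /* [k/32][n] per-block mins   */
                               float *w, int k, int n) {
    for (int row = 0; row < k; ++row) {
        const int b = row / 32;  /* block index along the k dimension */
        for (int col = 0; col < n; ++col) {
            /* w = q * block_scale + block_min, as in the format description above */
            w[row * n + col] = (float) q[row * n + col] * scale[b * n + col]
                             + min[b * n + col];
        }
    }
}
```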

@TheTinyTeddy
Author

The quantization is done along the k dimension.

Thank you for the clarification!

Since the weight is quantized as
W = Q * block_scale + block_min,
when it is dequantized during matrix multiplication and accumulation (MMA), shouldn't the MMA result (y = x * Q) pick up a bias term, i.e.
y_deq = y * block_scale + x * block_min?
However, the code in the dequantize_row_q4_K function in ggml-quants.c,
for (int l = 0; l < 32; ++l) *y++ = d1 * (q[l] & 0xF) - m1;
does not seem to contain the x * block_min term it should according to the above. Could you help explain the MMA dequantization process implemented in llama.cpp?

Many thanks!
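
For context on where the min term goes: the quoted dequantize_row_q4_K loop only reconstructs the weights themselves (no activations are involved at that point), using w = d1 * q - m1 per block. Expanding that over one block's contribution to a dot product gives sum(x*w) = d1 * sum(x*q) - m1 * sum(x), which is exactly the activation-dependent bias term asked about above, just with the minus-sign convention used in the quoted code. A pure-algebra C sketch, not the actual llama.cpp kernel:

```c
#include <stdint.h>

/* Within one 32-weight block where w[l] = d * q[l] - m, the block's
 * contribution to a dot product can be computed by dequantizing first
 * or by factoring out d and m -- the two results are identical. */

float block_dot_dequantize_first(const float *x, const uint8_t *q, float d, float m) {
    float acc = 0.0f;
    for (int l = 0; l < 32; ++l) {
        acc += x[l] * (d * (float)(q[l] & 0xF) - m);  /* reconstruct w, then x*w */
    }
    return acc;
}

float block_dot_factored(const float *x, const uint8_t *q, float d, float m) {
    float xq = 0.0f, xsum = 0.0f;
    for (int l = 0; l < 32; ++l) {
        xq   += x[l] * (float)(q[l] & 0xF);  /* integer dot product x . q */
        xsum += x[l];                        /* sum of activations        */
    }
    return d * xq - m * xsum;                /* d*(x.q) - m*sum(x)        */
}
```

The factored form shows that, once dequantization is folded into the multiply-accumulate, accounting for the mins only requires the per-block sum of the activations in addition to the integer dot product.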

@TheTinyTeddy
Author

Discussion continues here: #13507
