From the description of Q4_K:
“4-bit quantization (q). Super-blocks with 8 blocks, each block has 32 weights. Weight formula: w = q * block_scale(6-bit) + block_min(6-bit), resulting in 4.5 bits-per-weight.”
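For concreteness, here is a minimal sketch of how I read that formula, dequantizing one super-block of 8 × 32 weights (function names, argument names, and the nibble packing are my own illustration, not the actual ggml `block_q4_K` layout):

```c
#include <stdint.h>

// Rough sketch of my understanding only (illustrative names and packing,
// not the real ggml block_q4_K storage): dequantize one super-block of
// 8 blocks x 32 weights using the quoted formula w = q * block_scale + block_min.
static void dequantize_superblock_sketch(
        const uint8_t *q4,     // 128 bytes = 256 packed 4-bit quants
        const float   *scale,  // 8 per-block scales (already decoded from 6-bit)
        const float   *bmin,   // 8 per-block mins   (already decoded from 6-bit)
        float         *w)      // 256 output weights
{
    for (int b = 0; b < 8; ++b) {          // 8 blocks per super-block
        for (int l = 0; l < 32; ++l) {     // 32 weights per block
            const int i = b*32 + l;
            // assume two quants per byte, low nibble first (illustrative packing)
            const int q = (i & 1) ? (q4[i/2] >> 4) : (q4[i/2] & 0xF);
            w[i] = q * scale[b] + bmin[b];
        }
    }
}
```

If I am counting correctly, the 4.5 bits-per-weight then comes from 256 × 4 bits of quants, 8 × (6 + 6) bits of packed block scales/mins, plus (as far as I can tell) two fp16 super-block factors: (1024 + 96 + 32) bits / 256 weights = 4.5 bits per weight.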
My question is: along which dimension of the weight is the quantization applied, and at what granularity?
For example, say we have an activation X of shape [m, k] and a weight W of shape [k, n], where k is the GEMM accumulation dimension for X*W.
Do the block scales for W then have shape [k//32, n], [k, n//32], or [k//32, n//32]?
And does block_min have the same shape as block_scale?
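To make the first option concrete (this is just how I picture it, not a claim about what llama.cpp actually does): if the blocks of 32 run along the accumulation dimension k, then block_scale and block_min would both have shape [k//32, n], and a single dequantized element would be:

```c
#include <stdint.h>

// Sketch of the [k//32, n] option only (illustrative, one unpacked quant per byte,
// not the real storage layout): scale and bmin are both (k/32) x n.
static float dequant_w(const uint8_t *W_q,   // k x n quants, row-major
                       const float   *scale, // (k/32) x n block scales
                       const float   *bmin,  // (k/32) x n block mins
                       int n, int i, int j)  // element (i, j) of W, 0 <= i < k
{
    return W_q[i*n + j] * scale[(i/32)*n + j] + bmin[(i/32)*n + j];
}
```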
@ggerganov I was wondering if you could comment on this?
Many thanks
Since the weight is quantized as W = Q * block_scale + block_min, when it is dequantized during matrix multiplication and accumulation (MMA), shouldn't the MMA result (y = x * Q) also pick up a bias term, i.e. y_deq = y * block_scale + sum(x) * block_min?
However, the code in the `dequantize_row_q4_K` function in ggml-quants.c, `for (int l = 0; l < 32; ++l) *y++ = d1 * (q[l] & 0xF) - m1;`,
does not seem to contain the x-dependent block_min term described above. Could you help explain the MMA dequantization process implemented in llama.cpp?
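To spell out the identity I expect (using the minus convention from the snippet above, w = d1 * q - m1): for one block of 32, the dot product should regroup as dot(x, w) = d1 * dot(x, q) - m1 * sum(x), so the min only ever appears multiplied by the sum of the activations. A small sketch of that, with made-up names (not llama.cpp code):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

// Sketch of the regrouping: dequantize-then-dot versus the factored form
// d1 * dot(x, q) - m1 * sum(x), for one block of 32 weights.
static void dot_two_ways(const float *x, const uint8_t *q, float d1, float m1,
                         float *out_direct, float *out_factored)
{
    float direct = 0.0f, xq = 0.0f, xsum = 0.0f;
    for (int l = 0; l < 32; ++l) {
        direct += x[l] * (d1 * (q[l] & 0xF) - m1);  // dequantize each weight, then dot
        xq     += x[l] * (q[l] & 0xF);              // dot with raw quants
        xsum   += x[l];                             // running sum of activations
    }
    *out_direct   = direct;
    *out_factored = d1 * xq - m1 * xsum;            // same value, min folded into sum(x)
    assert(fabsf(*out_direct - *out_factored) < 1e-3f);
}
```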