From the description of Q4_K:
“4-bit quantization (q). Super-blocks with 8 blocks, each block has 32 weights. Weight formula: w = q * block_scale(6-bit) + block_min(6-bit), resulting in 4.5 bits-per-weight.”
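For concreteness, here is a minimal sketch of how I read that formula, dequantizing one super-block of 8 × 32 weights (function names, argument names, and the nibble packing are my own illustration, not the actual ggml `block_q4_K` layout):

```c
#include <stdint.h>

// Rough sketch of my understanding only (illustrative names and packing,
// not the real ggml block_q4_K storage): dequantize one super-block of
// 8 blocks x 32 weights using the quoted formula w = q * block_scale + block_min.
static void dequantize_superblock_sketch(
        const uint8_t *q4,     // 128 bytes = 256 packed 4-bit quants
        const float   *scale,  // 8 per-block scales (already decoded from 6-bit)
        const float   *bmin,   // 8 per-block mins   (already decoded from 6-bit)
        float         *w)      // 256 output weights
{
    for (int b = 0; b < 8; ++b) {          // 8 blocks per super-block
        for (int l = 0; l < 32; ++l) {     // 32 weights per block
            const int i = b*32 + l;
            // assume two quants per byte, low nibble first (illustrative packing)
            const int q = (i & 1) ? (q4[i/2] >> 4) : (q4[i/2] & 0xF);
            w[i] = q * scale[b] + bmin[b];
        }
    }
}
```

If I am counting correctly, the 4.5 bits-per-weight then comes from 256 × 4 bits of quants, 8 × (6 + 6) bits of packed block scales/mins, plus (as far as I can tell) two fp16 super-block factors: (1024 + 96 + 32) bits / 256 weights = 4.5 bits per weight.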
My question is: along which dimension of the weight is the quantization applied, and at what granularity?
For example, say we have an activation X of shape [m, k] and a weight W of shape [k, n], where k is the GEMM accumulation dimension for X*W.
Do the block scales for W then have shape [k//32, n], [k, n//32], or [k//32, n//32]?
And does block_min have the same shape as block_scale?
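To make the first option concrete (this is just how I picture it, not a claim about what llama.cpp actually does): if the blocks of 32 run along the accumulation dimension k, then block_scale and block_min would both have shape [k//32, n], and a single dequantized element would be:

```c
#include <stdint.h>

// Sketch of the [k//32, n] option only (illustrative, one unpacked quant per byte,
// not the real storage layout): scale and bmin are both (k/32) x n.
static float dequant_w(const uint8_t *W_q,   // k x n quants, row-major
                       const float   *scale, // (k/32) x n block scales
                       const float   *bmin,  // (k/32) x n block mins
                       int n, int i, int j)  // element (i, j) of W, 0 <= i < k
{
    return W_q[i*n + j] * scale[(i/32)*n + j] + bmin[(i/32)*n + j];
}
```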
@ggerganov I was wondering if you could comment on this?
Many thanks
Since the weight is quantized as W = Q * block_scale + block_min, when it is dequantized during matrix multiplication and accumulation (MMA), shouldn't the MMA result (y = x * Q) also pick up a bias term, i.e. y_deq = y * block_scale + sum(x) * block_min?
However, the code in the `dequantize_row_q4_K` function in ggml-quants.c, `for (int l = 0; l < 32; ++l) *y++ = d1 * (q[l] & 0xF) - m1;`,
does not seem to contain the x-dependent block_min term described above. Could you help explain the MMA dequantization process implemented in llama.cpp?
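To spell out the identity I expect (using the minus convention from the snippet above, w = d1 * q - m1): for one block of 32, the dot product should regroup as dot(x, w) = d1 * dot(x, q) - m1 * sum(x), so the min only ever appears multiplied by the sum of the activations. A small sketch of that, with made-up names (not llama.cpp code):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

// Sketch of the regrouping: dequantize-then-dot versus the factored form
// d1 * dot(x, q) - m1 * sum(x), for one block of 32 weights.
static void dot_two_ways(const float *x, const uint8_t *q, float d1, float m1,
                         float *out_direct, float *out_factored)
{
    float direct = 0.0f, xq = 0.0f, xsum = 0.0f;
    for (int l = 0; l < 32; ++l) {
        direct += x[l] * (d1 * (q[l] & 0xF) - m1);  // dequantize each weight, then dot
        xq     += x[l] * (q[l] & 0xF);              // dot with raw quants
        xsum   += x[l];                             // running sum of activations
    }
    *out_direct   = direct;
    *out_factored = d1 * xq - m1 * xsum;            // same value, min folded into sum(x)
    assert(fabsf(*out_direct - *out_factored) < 1e-3f);
}
```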