Question About use_adreno_kernels Threshold for Q4 MatMul on Adreno 750 #17733

forforever73 · 2025-12-03T11:56:46Z

@lhez Sorry to take up your time. I'm running a new model on an Adreno 750 GPU and noticed that, for Q4 weights, the optimized kernel CL_mul_mat_Ab_Bi_8x4 appears to be used only when use_adreno_kernels() returns true. However, my model has several matmul shapes like:
A: [256, 1280, 1, 1]
B: [256, 512, 1, 1]
→ Output: [1280, 512, 1, 1]
For these shapes use_adreno_kernels() returns false, so the matmul falls back to kernel_mul_mat_q4_0_f32_1d_8x_flat, which is about 10× slower on Adreno 750. I experimented by lowering the internal threshold to:
int64_t threshold_ne0 = 256;
After lowering the threshold, the Adreno kernels are used, performance improves dramatically, and the model’s PPL shows no meaningful change.
What was the original reasoning behind the use_adreno_kernels() threshold? If I reduce it to 256, are there any risks I should be aware of?
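
To make the shape issue concrete, here is a minimal sketch of the kind of threshold gate described above. It is not the actual ggml code: `Shape`, `passes_shape_gate`, and `kWeightA` are hypothetical names, the check is assumed to be applied to the weight tensor A, and the value 512 is only an assumed stand-in for the unmodified threshold (the post implies it is larger than 256 but does not state it).

```cpp
#include <cstdint>

// Hypothetical stand-in for the four dimensions of a tensor (ne[0..3]).
struct Shape {
    int64_t ne[4];
};

// Sketch of a shape gate: the first two dimensions must reach their
// thresholds and the tensor must be 2D (ne[2] == ne[3] == 1).
// The real use_adreno_kernels() may check additional conditions.
constexpr bool passes_shape_gate(const Shape & s,
                                 int64_t threshold_ne0,
                                 int64_t threshold_ne1) {
    return s.ne[0] >= threshold_ne0 && s.ne[1] >= threshold_ne1 &&
           s.ne[2] == 1 && s.ne[3] == 1;
}

// Weight shape A from the post.
constexpr Shape kWeightA = {{256, 1280, 1, 1}};

// ASSUMPTION: 512 stands in for the unmodified thresholds.
// With it, ne[0] == 256 is too small, so the gate rejects A.
static_assert(!passes_shape_gate(kWeightA, 512, 512),
              "A should be rejected when threshold_ne0 is 512");

// With threshold_ne0 lowered to 256 (the change described in the post),
// the same shape passes.
static_assert(passes_shape_gate(kWeightA, 256, 512),
              "A should pass when threshold_ne0 is 256");
```

Under these assumptions, only the ne[0] comparison decides the outcome for this shape, which is why changing threshold_ne0 alone is enough to switch kernels.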

Replies: 0 comments

Select a reply

Uh oh!

Question About use_adreno_kernels Threshold for Q4 MatMul on Adreno 750 #17733

Uh oh!

forforever73 Dec 3, 2025

Replies: 0 comments

forforever73
Dec 3, 2025