@SavicStefan
Contributor

This adds the ACC_TYPE_VEC2 implementation for q2_K in mul_mmq, following the approach in PR #16203.

Before (master) NVIDIA GeForce RTX 4060 Ti:
MUL_MAT(type_a=q2_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1): 190 runs - 5292.84 us/run - 60.13 GFLOP/run - 11.36 TFLOPS
After (PR) NVIDIA GeForce RTX 4060 Ti:
MUL_MAT(type_a=q2_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1): 258 runs - 3887.86 us/run - 60.13 GFLOP/run - 15.47 TFLOPS

That is roughly a 26.5% reduction in us/run (5292.84 to 3887.86), or about 36% higher throughput (11.36 to 15.47 TFLOPS).
I still need to try this for the other types.
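
For anyone who wants to check the comparison, here is a minimal sketch of the arithmetic behind it. It only uses the us/run and GFLOP/run figures quoted in the two benchmark lines above; the variable and function names are illustrative and are not part of the benchmark tool.

```python
# Figures copied from the benchmark output quoted above.
before_us = 5292.84      # master, microseconds per run
after_us = 3887.86       # this PR, microseconds per run
gflop_per_run = 60.13    # same workload in both runs


def tflops(gflop: float, us: float) -> float:
    """Convert GFLOP per run and microseconds per run into TFLOPS."""
    flop = gflop * 1e9
    seconds = us * 1e-6
    return flop / seconds / 1e12


# Fraction of the original time that was saved.
time_reduction = (before_us - after_us) / before_us

# How many times faster the new kernel is (also the throughput ratio).
speedup = before_us / after_us

print(f"time reduction:  {time_reduction:.1%}")
print(f"throughput gain: {speedup - 1:.1%}")
print(f"TFLOPS before:   {tflops(gflop_per_run, before_us):.2f}")
print(f"TFLOPS after:    {tflops(gflop_per_run, after_us):.2f}")
```

The two percentages differ because one is measured against the old time and the other against the new time: saving about a quarter of the time per run means about a third more work done per second.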

@SavicStefan SavicStefan requested a review from 0cc4m as a code owner November 10, 2025 14:33
@SavicStefan SavicStefan changed the title from "Vulkan: add q2_K implementation in mul_mmq with ACC_TYPE_VEC2" to "vulkan: add q2_K implementation in mul_mmq with ACC_TYPE_VEC2" Nov 10, 2025
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Nov 10, 2025