sycl: addressing non-contiguous src1 mul_mats (nc and batched) #13343
+138
−125
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
#13308 disabled the mul_mats for non-contiguous matrices since we were producing wrong results.
This PR fixes the non-contiguous mulmats based on #13155.
I had to disable the path for
ggml_sycl_mul_mat_vec_nc
since that doesn't deal with non-contiguous src1.27aa259 broken mul_mat performance (Before disabling non-contiguous src1)
5215b91 the current performance
e317791 this PR
We have lost a little bit of performance due to disabling
ggml_sycl_mul_mat_vec_nc
, but I think it's better to rethink the whole mul_mat dispatch and refactor matrix multiplications.I'd prefer to address non-critical concerns in a different PR due to the massive performance regression we have (assuming I didn't mess up something along the way).
I am not entirely sure if the performance loss in qwen2 tg128 is noise or caused by this PR