ggml : add mrope kernel for metal #13457

ngxson · 2025-05-11T20:33:54Z

This gives about a 1.5x text-generation speedup (tg128) for Qwen VL models, with a smaller gain in prompt processing (pp512). Tested on a MacBook M3 Max.

master branch:

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2vl 7B Q8_0 | 7.54 GiB | 7.62 B | Metal,BLAS | 10 | pp512 | 526.06 ± 6.63 |
| qwen2vl 7B Q8_0 | 7.54 GiB | 7.62 B | Metal,BLAS | 10 | tg128 | 22.83 ± 1.36 |

This PR:

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2vl 7B Q8_0 | 7.54 GiB | 7.62 B | Metal,BLAS | 10 | pp512 | 604.64 ± 0.44 |
| qwen2vl 7B Q8_0 | 7.54 GiB | 7.62 B | Metal,BLAS | 10 | tg128 | 33.54 ± 0.04 |
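
For reference, the speedup per test can be worked out from the mean t/s values in the two benchmark tables above. This is only a small arithmetic sketch: the dictionary names are illustrative, and the ± spreads are ignored.

```python
# Mean throughput in tokens/s, copied from the master and PR benchmark tables.
master = {"pp512": 526.06, "tg128": 22.83}
pr = {"pp512": 604.64, "tg128": 33.54}

# Speedup = PR throughput / master throughput for the same test.
speedup = {test: pr[test] / master[test] for test in master}

for test, ratio in speedup.items():
    print(f"{test}: {ratio:.2f}x")
# pp512: 1.15x
# tg128: 1.47x
```

So the quoted 1.5x figure refers to token generation; prompt processing improves by about 15%.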

Tested with `test-backend-ops -o ROPE`; the result is OK.

ggml : add mrope kernel for metal

82e156d

ngxson requested a review from ggerganov May 11, 2025 20:33

github-actions bot added the ggml and Apple Metal labels May 11, 2025

ggerganov approved these changes May 12, 2025

ngxson merged commit df84919 into ggml-org:master May 12, 2025
44 checks passed
