metal : make the FA extra sizes consistent #17143

ggerganov · 2025-11-10T12:19:50Z

While looking into #17033 (comment) found that the warmup in llama-batched-bench trashes the worst-case graph allocation using the Metal backend, causing extra graph allocations later on. The reason is because the extra FA fleeting memory for the small warmup batch ends up being larger than the memory for the worst case estimate.

Fix by making the extra size more correlated with the input shapes.

make -j && ./bin/llama-batched-bench -m ../models/qwen2.5-3b-coder/ggml-model-q8_0.gguf -c 150792 -npp 8192 -ntg 32 -npl 1,2,4,8,16 -kvu -tgs --no-mmap

master

main: n_kv_max = 151040, n_batch = 2048, n_ubatch = 512, flash_attn = -1, is_pp_shared = 0, is_tg_separate = 1, n_gpu_layers = -1, n_threads = 16, n_threads_batch = 16

PP	TG	B	N_KV	T_PP s	S_PP t/s	T_TG s	S_TG t/s	T s	S t/s
8192	32	1	8224	3.406	2405.30	0.289	110.58	3.695	2225.58
8192	32	2	16448	6.843	2394.42	0.590	108.46	7.433	2212.94
8192	32	4	32896	14.106	2322.90	1.238	103.37	15.345	2143.80
8192	32	8	65792	29.769	2201.46	2.767	92.53	32.536	2022.12
8192	32	16	131584	65.512	2000.74	6.663	76.84	72.175	1823.13

PR

main: n_kv_max = 151040, n_batch = 2048, n_ubatch = 512, flash_attn = -1, is_pp_shared = 0, is_tg_separate = 1, n_gpu_layers = -1, n_threads = 16, n_threads_batch = 16

PP	TG	B	N_KV	T_PP s	S_PP t/s	T_TG s	S_TG t/s	T s	S t/s
8192	32	1	8224	3.287	2491.90	0.289	110.62	3.577	2299.30
8192	32	2	16448	6.646	2465.21	0.589	108.59	7.235	2273.25
8192	32	4	32896	13.581	2412.81	1.240	103.19	14.821	2219.51
8192	32	8	65792	28.191	2324.72	2.751	93.04	30.942	2126.28
8192	32	16	131584	60.293	2173.91	6.633	77.19	66.927	1966.09

metal : make the FA extra sizes consistent

b8aa540

github-actions bot added ggml changes relating to the ggml tensor library for machine learning Apple Metal https://en.wikipedia.org/wiki/Metal_(API) labels Nov 10, 2025

DajanaV mentioned this pull request Nov 10, 2025

UPSTREAM PR #17143: metal : make the FA extra sizes consistent auroralabs-loci/llama.cpp#157

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

metal : make the FA extra sizes consistent #17143

metal : make the FA extra sizes consistent #17143

ggerganov commented Nov 10, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

metal : make the FA extra sizes consistent #17143

Are you sure you want to change the base?

metal : make the FA extra sizes consistent #17143

Conversation

ggerganov commented Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

master

PR

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ggerganov commented Nov 10, 2025 •

edited

Loading