metal : reuse graphs #14570

Draft
wants to merge 1 commit into
base: gg/llama-reuse-graphs

ggerganov
Member

PoC for #14482 (comment)

This is a hacky implementation of the idea in the comment. It works, but it does not lead to any measurable improvement. The reason is that we already overlap CPU and GPU work: we submit the first 128 graph nodes and, while they are computing, we prepare and submit the rest of the graph. This is already enough to completely mask the CPU overhead of constructing the Metal graph, so there is no point in adding logic to reuse a previous Metal graph.
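
For readers unfamiliar with the overlap described above, here is a minimal sketch of the general pattern, not the actual ggml Metal backend code. The names `encode`, `gpu_execute`, and `compute_graph` are hypothetical stand-ins, and a single-worker thread pool plays the role of a serial GPU command queue. Only the 128-node first chunk comes from the description above.

```python
from concurrent.futures import ThreadPoolExecutor

N_FIRST = 128  # size of the first chunk, as quoted in the description


def encode(nodes):
    """Hypothetical stand-in for CPU-side encoding of graph nodes."""
    return [("cmd", n) for n in nodes]


def gpu_execute(command_buffer):
    """Hypothetical stand-in for GPU execution; returns the node ids it ran."""
    return [n for _, n in command_buffer]


def compute_graph(nodes, n_first=N_FIRST):
    # One worker models a serial command queue: buffers run in submission order.
    with ThreadPoolExecutor(max_workers=1) as gpu_queue:
        # Encode and submit the first chunk immediately.
        first = gpu_queue.submit(gpu_execute, encode(nodes[:n_first]))
        # While the first chunk is in flight, the CPU encodes the remainder.
        rest_commands = encode(nodes[n_first:])
        rest = gpu_queue.submit(gpu_execute, rest_commands)
        # Wait for both buffers; execution order matches the graph order.
        return first.result() + rest.result()


if __name__ == "__main__":
    executed = compute_graph(list(range(300)))
    print(len(executed), executed[:3], executed[-1])
```

The point of the sketch is only the ordering: once the first chunk is submitted, the cost of encoding the rest is hidden behind work that is already running, which is why caching a previously built graph would have little left to save. Python threads do not give true parallelism for CPU-bound work, so this illustrates the structure rather than the timing.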

@ggerganov added the "demo" label (Demonstrate some concept or idea, not intended to be merged) on Jul 7, 2025
@github-actions (bot) added the "ggml" (changes relating to the ggml tensor library for machine learning) and "Apple Metal" (https://en.wikipedia.org/wiki/Metal_(API)) labels on Jul 7, 2025
@ggerganov mentioned this pull request on Jul 7, 2025 (15 tasks)