[RFC] ggml: new backend for API Remoting #17072
Open
+7,423
−0
Hello, I would like to discuss whether this work could be integrated into the llama.cpp codebase.

The API Remoting backend/frontend allows escaping the VM isolation with the help of the `virt-gpu` paravirtualization (and the `virglrenderer` library on the host side):

- the `ggml-remoting` **frontend** is a GGML API implementation, which intercepts the GGML API calls and forwards them to the `virt-gpu` virtual device;
- the `ggml-remoting` **backend** is a library loaded by `virglrenderer` (a PR will be opened soon for discussion), which opens a GGML library and forwards the calls received from `virglrenderer`.

The code is currently a POC; I will refine it after the first round of feedback.
This is similar to `ggml-rpc`: the overall idea is the same, but the transport layer is virtualization-aware, which helps limit the buffer copies.

The `supports_op` method is implemented in a hacky way: I've copied the `ggml-metal` definition into the frontend library, and I expose the few properties required to compute it from the `ggml-metal` backend. IIRC, this was only needed for the micro-benchmark to work correctly (`ggml-rpc` simply returns `true` to avoid this bottleneck).

Here is the context behind this PR: