[Executorch][llm] Add ring buffer based kv cache and mask calculation to MHA #10609

Merged
merged 7 commits into from
May 13, 2025
Update on "[Executorch][llm] Add ring buffer based kv cache and mask calculation to MHA"

Building on previous work, MHA can now use a ring buffer KV cache. When the ring buffer cache is used,
we query the mask from the KV cache and pass that to SDPA instead of the precalculated mask.

In the process we had to adjust the ring buffer implementation so that it keeps the context of
the full sliding window. See the code comments for details.
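
For illustration only, here is a minimal pure-Python sketch of the idea described above: a ring buffer records which absolute token position each cache slot holds, and the attention mask is derived from those positions at query time rather than precalculated. The names `RingBufferPositions`, `ring_buffer_mask`, `capacity`, and `window` are hypothetical and are not taken from this PR. The sketch also shows one plausible reason a ring buffer may need more slots than the sliding window: when several tokens are written in one step, later tokens in the chunk can overwrite keys that earlier queries in the same chunk still need.

```python
class RingBufferPositions:
    """Hypothetical helper: tracks the absolute token position stored in each cache slot."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slot_pos = [-1] * capacity  # -1 marks a slot that was never written

    def update(self, start_pos, seq_len):
        """Record seq_len new tokens starting at start_pos; return the slots written."""
        slots = []
        for i in range(seq_len):
            pos = start_pos + i
            slot = pos % self.capacity  # wrap around once the buffer is full
            self.slot_pos[slot] = pos
            slots.append(slot)
        return slots


def ring_buffer_mask(slot_pos, start_pos, seq_len, window):
    """Boolean mask of shape [seq_len][capacity]; True means the query may attend to that slot.

    A query at absolute position q sees a slot holding position p when the slot
    is filled, p is not in the future (p <= q), and p lies inside the sliding
    window (q - p < window).
    """
    mask = []
    for i in range(seq_len):
        q = start_pos + i
        mask.append([0 <= p <= q and q - p < window for p in slot_pos])
    return mask


# Assumed setup: window of 4 tokens, buffer with twice as many slots.
window = 4
roomy = RingBufferPositions(capacity=2 * window)
roomy.update(0, 4)  # prefill positions 0..3
roomy.update(4, 2)  # a chunk of two tokens: positions 4 and 5
roomy_mask = ring_buffer_mask(roomy.slot_pos, start_pos=4, seq_len=2, window=window)

# Same writes with a buffer exactly as large as the window.
tight = RingBufferPositions(capacity=window)
tight.update(0, 4)
tight.update(4, 2)  # position 5 overwrites position 1, which query 4 still needs
tight_mask = ring_buffer_mask(tight.slot_pos, start_pos=4, seq_len=2, window=window)
```

In this sketch the query at position 4 sees all four positions of its window (1 through 4) with the larger buffer, but only three with the buffer sized exactly to the window. How the actual implementation sizes its buffer is documented in the code comments the commit message refers to.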

Differential Revision: [D73891425](https://our.internmc.facebook.com/intern/diff/D73891425/)

[ghstack-poisoned]
kimishpatel committed May 9, 2025
commit 38b2261bf91ee1b819d1d3f0a13059401804dbd7