kv-cache : improve defrag logic #13497

ggerganov · 2025-05-13T08:09:25Z

Following the optimization in #13493, I realized that the defragmentation can be improved considerably, which would in turn make the Flash Attention masking more effective.

Currently we defrag the following cache like this:

# before defrag
00000000...11111.......2222222....2010212012012....

# after defrag
000000001111122222222010212012012..................

I.e. we only "fill" the holes, while the cells of each sequence remain scattered. We can do better by also grouping the cells by sequence:

# new defrag
000000000000111111111222222222222..................

By doing so, the FA-vec masking logic will remain effective even after many generations.
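
To make the difference concrete, here is a minimal Python sketch of the two strategies on the toy layout above. It is a simplified model, not the actual llama.cpp code: each cell is one character (`.` for empty, a digit for the sequence id), a cell belongs to exactly one sequence, and the function names `defrag_fill_holes`, `defrag_group_by_seq` and `seq_spans` are made up for this illustration. The "span" metric (distance from a sequence's first cell to its last) is only an assumed proxy for how much of the cache the masking has to consider per sequence.

```python
EMPTY = "."  # marker for an unused cell in this toy model


def defrag_fill_holes(cells: str) -> str:
    """Current behavior (toy version): pack used cells to the front, keep their order."""
    used = [c for c in cells if c != EMPTY]
    return "".join(used) + EMPTY * (len(cells) - len(used))


def defrag_group_by_seq(cells: str) -> str:
    """Proposed behavior (toy version): pack used cells and group them by sequence id."""
    # Sorting single-character ids is enough here; a real cache would sort
    # by a numeric sequence id and keep positions ordered within a sequence.
    used = sorted(c for c in cells if c != EMPTY)
    return "".join(used) + EMPTY * (len(cells) - len(used))


def seq_spans(cells: str) -> dict:
    """For each sequence id, the inclusive distance from its first to its last cell."""
    spans = {}
    for seq in sorted(set(cells) - {EMPTY}):
        spans[seq] = cells.rindex(seq) - cells.index(seq) + 1
    return spans


before = "00000000...11111.......2222222....2010212012012...."
filled = defrag_fill_holes(before)
grouped = defrag_group_by_seq(before)

print(filled)              # holes filled, sequences still interleaved at the end
print(grouped)             # holes filled and sequences contiguous
print(seq_spans(filled))   # spans are wider than the number of cells per sequence
print(seq_spans(grouped))  # spans equal the number of cells per sequence
```

In this toy run, grouping shrinks every sequence's span to exactly its number of cells, whereas hole-filling alone leaves each sequence spread over a wider range because of the interleaved tail.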

ggerganov added the enhancement, performance, and roadmap labels on May 13, 2025
