
Conversation

@ggerganov Member

Change llama_pos from int32_t to float

This change might seem unnecessary at first, since we are used to thinking of token positions as integers, but technically nothing prevents them from being floats. I also have some ideas for KV cache compression / context extension tricks where float positions could turn out to be useful.

Still contemplating whether we should merge this, so for now this is just a draft.
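For reference, a minimal sketch of what the type change amounts to; the declarations below are illustrative, not the exact llama.h contents:

```cpp
#include <cstdint>

// Before this draft, positions were plain integers:
//   typedef int32_t llama_pos;
// With this change they become floating point:
typedef float llama_pos;

// Any API that carries positions picks up the new type, e.g. a
// batch-like struct (illustrative, not the exact llama.h declaration):
struct llama_batch_sketch {
    int32_t     n_tokens;
    llama_pos * pos; // one (now possibly fractional) position per token
};
```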

@ngxson Collaborator commented Feb 23, 2024


+1 for this. I'm wondering if it would help simplify the code for group attention (self-extend).

@ggerganov Member Author


Not sure if it will become simpler, but one of the things I want to investigate is applying floating-point division in llama_kv_cache_seq_div() instead of the current integer division. Intuitively, I expect it to improve recall quality.
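A minimal standalone sketch of the point about division, not the actual llama.cpp internals: when positions are scaled by 1/d (as a llama_kv_cache_seq_div-style operation does for self-extend), integer division collapses whole groups onto the same position, while float division keeps the ordering within each group.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    const int d = 4; // hypothetical group size

    std::vector<int32_t> pos_int   = {0, 1, 2, 3, 4, 5, 6, 7};
    std::vector<float>   pos_float = {0, 1, 2, 3, 4, 5, 6, 7};

    // Integer division: 0 0 0 0 1 1 1 1 -- positions within a group collapse.
    for (auto & p : pos_int)   p /= d;
    // Float division: 0.00 0.25 0.50 ... -- relative order inside a group is preserved.
    for (auto & p : pos_float) p /= (float) d;

    for (auto p : pos_int)   printf("%d ", p);
    printf("\n");
    for (auto p : pos_float) printf("%.2f ", p);
    printf("\n");
    return 0;
}
```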

The other idea I want to explore is merging KV cells into one another by averaging both the positions and the KV values. I'm wondering whether this could be used to compress the KV cache data into fewer cells.
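As a sketch of that merging idea, assuming a hypothetical per-cell layout (not the real llama.cpp KV cache structures): two cells are collapsed into one by averaging their positions and their K/V data, which is where a float position avoids rounding the averaged position back to an integer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cell layout for illustration only.
struct kv_cell_sketch {
    float              pos; // float position keeps the averaged value exact
    std::vector<float> k;   // key vector stored in this cell
    std::vector<float> v;   // value vector stored in this cell
};

// Merge two cells into one by averaging position, K and V
// (assumes both cells have the same K/V dimensions).
static kv_cell_sketch merge_cells(const kv_cell_sketch & a, const kv_cell_sketch & b) {
    kv_cell_sketch out;
    out.pos = 0.5f * (a.pos + b.pos);
    out.k.resize(a.k.size());
    out.v.resize(a.v.size());
    for (size_t i = 0; i < a.k.size(); ++i) out.k[i] = 0.5f * (a.k[i] + b.k[i]);
    for (size_t i = 0; i < a.v.size(); ++i) out.v[i] = 0.5f * (a.v[i] + b.v[i]);
    return out;
}
```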

@mofosyne added the refactoring and Review Complexity : High (generally requires in-depth knowledge of LLMs or GPUs) labels on May 10, 2024
@ggerganov added the demo (demonstrates some concept or idea, not intended to be merged) label on Jul 22, 2024