Popular repositories
MInference (Public, forked from microsoft/MInference)
[NeurIPS'24 Spotlight] Speeds up long-context LLMs' inference with approximate, dynamic sparse attention computation, which reduces inference latency by up to 10x for pre-filling on an A100 while maintaining accuracy.
Python
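The description above captures the core idea: during pre-filling, skip most of the attention computation by attending only to dynamically chosen portions of the key/value sequence. As an illustration only, below is a minimal NumPy sketch of dynamic (content-dependent) sparse attention. It is not MInference's actual algorithm (which uses head-specific sparse patterns such as A-shape, vertical-slash, and block-sparse); the function name, the block-scoring heuristic, and all parameters are hypothetical.

```python
import numpy as np

def dynamic_sparse_attention(q, k, v, block_size=64, keep_blocks=4):
    """Toy dynamic sparse attention (hypothetical helper, not MInference's API).

    For each query, cheaply estimate which key/value blocks matter,
    then run exact softmax attention over only those blocks.
    """
    n, d = k.shape
    n_blocks = (n + block_size - 1) // block_size
    # Cheap relevance estimate: summarize each key block by its mean vector.
    block_means = np.stack([
        k[i * block_size:(i + 1) * block_size].mean(axis=0)
        for i in range(n_blocks)
    ])
    out = np.empty_like(q)
    for qi, query in enumerate(q):
        # Dynamic sparsity: pick the top-scoring blocks for this query.
        est = block_means @ query
        top = np.argsort(est)[-keep_blocks:]
        idx = np.concatenate([
            np.arange(b * block_size, min((b + 1) * block_size, n))
            for b in sorted(top)
        ])
        # Exact scaled-dot-product attention, restricted to selected keys.
        scores = (k[idx] @ query) / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[qi] = weights @ v[idx]
    return out

# Usage: 1024 tokens, 64-dim head; each query attends to 4 of 16 key blocks,
# so only ~25% of the score matrix is ever materialized.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((1024, 64)) for _ in range(3))
print(dynamic_sparse_attention(q, k, v).shape)  # (1024, 64)
```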