Tags: xlite-dev/LeetCUDA

v3.0.13

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
fix flash-attn comments (#354)

* Update README.md

* Update flash_attn_mma_share_kv.cu

* fix flash-attn comments

v3.0.12

add out_f32x4_shared_bcf_merge_write_row2col(2d) (#339)

v3.0.11

Bugfix: fix a compilation error (#336)

< < < should be <<<

v3.0.10

Update FUNDING.yml

v3.0.9

add triton merge_attn_states zhihu blog (#320)

v3.0.8

Update README.md (#311)

v3.0.7

feat: update pre-commit max-length=80 (#307)

* feat: update pre-commit length=120

* feat: update pre-commit max-length=80

v3.0.6

Update README.md (#299)

v3.0.5

feat: optimize merge_attn_states thread block dispatch (#279)

* [Kernel] opt cuda merge_attn_states kernel, part-1

* kernel: optimize merge_attn_states cuda kernel

v3.0.4

[Docs] Add vLLM + DeepSeek-R1 671B deploy blog