Tags: xlite-dev/LeetCUDA
v3.0.13
fix flash-attn comments (#354)
* Update README.md
* Update flash_attn_mma_share_kv.cu
* fix flash-attn comments
v3.0.12
add out_f32x4_shared_bcf_merge_write_row2col(2d) (#339)
v3.0.11
Bugfix: fix a compilation error (#336)
`< < <` should be `<<<` (the CUDA kernel-launch syntax must be written without spaces)
v3.0.10
v3.0.9
add triton merge_attn_states zhihu blog (#320)
v3.0.8
v3.0.7
feat: update pre-commit max-length=80 (#307)
* feat: update pre-commit length=120
* feat: update pre-commit max-length=80
v3.0.6
v3.0.5
feat: optimize merge_attn_states thread block dispatch (#279)
* [Kernel] opt cuda merge_attn_states kernel, part-1
* kernel: optimize merge_attn_states cuda kernel
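For readers unfamiliar with what a merge_attn_states kernel computes, here is a minimal pure-Python sketch of the usual math, assuming the common setup where attention for one query is split over two disjoint key/value chunks, each producing a partial output and the log-sum-exp (LSE) of its scores. This is not the repository's CUDA kernel; the function names `merge_attn_states` and `partial_attention` here are hypothetical illustrations, and the 1/sqrt(d) score scaling is omitted for brevity.

```python
import math


def partial_attention(q, keys, values):
    """Softmax attention of one query over one chunk of keys/values.

    Returns (output_vector, lse), where lse = log(sum(exp(scores))).
    Hypothetical helper; score scaling by 1/sqrt(d) is omitted.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    out = [sum(e * v[d] for e, v in zip(exps, values)) / total
           for d in range(dim)]
    return out, m + math.log(total)


def merge_attn_states(o_a, lse_a, o_b, lse_b):
    """Merge two partial attention states (sketch, not the repo's API).

    Each partial output is reweighted by its share of the total softmax
    mass, exp(lse_i) / (exp(lse_a) + exp(lse_b)), computed stably.
    Returns (merged_output, merged_lse).
    """
    m = max(lse_a, lse_b)
    w_a = math.exp(lse_a - m)
    w_b = math.exp(lse_b - m)
    s = w_a + w_b
    merged = [(w_a * x + w_b * y) / s for x, y in zip(o_a, o_b)]
    return merged, m + math.log(s)


# Usage: attention over 4 keys, split into two chunks of 2 and merged.
q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

o_a, lse_a = partial_attention(q, keys[:2], values[:2])
o_b, lse_b = partial_attention(q, keys[2:], values[2:])
merged, merged_lse = merge_attn_states(o_a, lse_a, o_b, lse_b)

# Reference: attention over all keys at once.
full, full_lse = partial_attention(q, keys, values)
```

Because the merged state carries its own LSE, the same merge can be applied repeatedly to combine any number of chunks, which is why the operation is typically exposed as a standalone kernel.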
v3.0.4
[Docs] Add vLLM + DeepSeek-R1 671B deploy blog