Popular repositories
-
CPM.cu (Public, forked from OpenBMB/CPM.cu)
CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and qua…
Cuda
-
cuda_hgemm (Public, forked from Bruce-Lee-LY/cuda_hgemm)
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
Cuda
-
LeetCUDA (Public, forked from xlite-dev/LeetCUDA)
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Cuda
-
sgl-flash-attn (Public, forked from sgl-project/sgl-flash-attn)
Fast and memory-efficient exact attention
Python
