Weilun-Hub

Popular repositories

CPM.cu Public

Forked from OpenBMB/CPM.cu

CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and qua…

Cuda

mlx Public

Forked from ml-explore/mlx

MLX: An array framework for Apple silicon

C++

mlx-lm Public

Forked from ml-explore/mlx-lm

Run LLMs with MLX

Python

cuda_hgemm Public

Forked from Bruce-Lee-LY/cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

Cuda

LeetCUDA Public

Forked from xlite-dev/LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda

sgl-flash-attn Public

Forked from sgl-project/sgl-flash-attn

Fast and memory-efficient exact attention

Python