wzpsgit

2 followers · 6 following

Lists (10)

Stars

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 1,387 111 Updated Jul 10, 2025

thu-ml / SageAttention

Quantized Attention achieves a speedup of 2-5x and 3-11x compared to FlashAttention and xformers, without losing end-to-end metrics across language, image, and video models.

Cuda 1,963 150 Updated Jul 8, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes/codes for ML SYS.

Python 2,841 177 Updated Jul 9, 2025

MetaX-MACA / FlashMLA

Forked from deepseek-ai/FlashMLA

Fast and efficient attention method exploration and implementation.

C++ 21 4 Updated Mar 25, 2025

PacktPublishing / LLM-Engineers-Handbook

The LLM's practical guide: From the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices

Python 3,651 805 Updated Mar 8, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 8,265 843 Updated Jul 10, 2025

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,858 280 Updated May 15, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient MLA decoding kernels

Cuda 11,641 876 Updated Apr 29, 2025

kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 14,531 1,035 Updated Jul 1, 2025

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda 3,332 372 Updated Jul 9, 2025

FlagOpen / FlagGems

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 620 109 Updated Jul 10, 2025

FlagOpen / FlagScale

FlagScale is a large model toolkit based on open-sourced projects.

Python 322 85 Updated Jul 10, 2025

unslothai / unsloth

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 41,771 3,330 Updated Jul 9, 2025

deepseek-ai / DeepSeek-V3

Python 98,154 15,977 Updated Jun 27, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 15,869 2,306 Updated Jul 10, 2025

NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Jupyter Notebook 14,385 3,353 Updated Aug 12, 2024