sglang Public
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Python · Apache License 2.0 · Updated Nov 26, 2025

FastDeploy Public
Forked from PaddlePaddle/FastDeploy
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
Python · Apache License 2.0 · Updated Nov 13, 2025

vllm Public
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python · Apache License 2.0 · Updated Oct 14, 2025

PaddleNLP Public
Forked from PaddlePaddle/PaddleNLP
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search…
Python · Apache License 2.0 · Updated Oct 14, 2025

Paddle Public
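Forked from PaddlePaddle/Paddle
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
C++ · Apache License 2.0 · Updated Sep 18, 2025
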
cutlass Public
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
C++ · Other license · Updated Jul 16, 2025

flashinfer Public
Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Cuda · Apache License 2.0 · Updated Jul 16, 2025

DeepEP Public
Forked from deepseek-ai/DeepEP
DeepEP: an efficient expert-parallel communication library
Cuda · MIT License · Updated Jul 3, 2025

ERNIE Public
Forked from PaddlePaddle/ERNIE
The official repository for ERNIE 4.5 and ERNIEKit, its industrial-grade development toolkit based on PaddlePaddle.
Python · Apache License 2.0 · Updated Jun 30, 2025

DeepGEMM Public
Forked from deepseek-ai/DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Python · MIT License · Updated May 29, 2025

FlashMLA Public
Forked from deepseek-ai/FlashMLA
FlashMLA: Efficient MLA decoding kernels
Cuda · MIT License · Updated Apr 29, 2025

pplx-kernels Public
Forked from perplexityai/pplx-kernels
Perplexity GPU Kernels
C++ · MIT License · Updated Apr 28, 2025

QQQ Public
Forked from HandH1998/QQQ
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.
Python · Updated Apr 7, 2025

DualPipe Public
Forked from deepseek-ai/DualPipe
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek-V3/R1 training.
Python · MIT License · Updated Feb 28, 2025

vattention Public
Forked from microsoft/vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
C · MIT License · Updated Dec 6, 2024

tiny-flash-attention Public
Forked from 66RING/tiny-flash-attention
FlashAttention tutorial written in Python, Triton, CUDA, and CUTLASS
Cuda · Updated Nov 18, 2024

flash-attention Public
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
Python · BSD 3-Clause "New" or "Revised" License · Updated Oct 28, 2024

TensorRT-LLM Public
Forked from NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
C++ · Apache License 2.0 · Updated Sep 26, 2024

mlc-llm Public
Forked from mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
Python · Apache License 2.0 · Updated Sep 23, 2024

marlin Public
Forked from IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Python · Apache License 2.0 · Updated Sep 4, 2024

Nanoflow Public
Forked from efeslab/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
Cuda · Apache License 2.0 · Updated Sep 2, 2024

fast-hadamard-transform Public
Forked from Dao-AILab/fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
C · BSD 3-Clause "New" or "Revised" License · Updated May 24, 2024

Paddle-Inference-Demo Public
Forked from PaddlePaddle/Paddle-Inference-Demo
C++ · Apache License 2.0 · Updated Mar 14, 2024

how-to-optim-algorithm-in-cuda Public
Forked from BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
Cuda · Updated Jan 27, 2024

stable-diffusion-webui Public
Forked from AUTOMATIC1111/stable-diffusion-webui
Stable Diffusion web UI
Python · GNU Affero General Public License v3.0 · Updated Dec 19, 2023



