carryyu

lzy carryyu

16621258538

6 followers · 7 following

sglang Public
Forked from sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.

Python Apache License 2.0 Updated Nov 26, 2025
FastDeploy Public
Forked from PaddlePaddle/FastDeploy

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

Python Apache License 2.0 Updated Nov 13, 2025
vllm Public
Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python Apache License 2.0 Updated Oct 14, 2025
PaddleNLP Public
Forked from PaddlePaddle/PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search…

Python Apache License 2.0 Updated Oct 14, 2025
Paddle Public
Forked from PaddlePaddle/Paddle

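PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)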

C++ Apache License 2.0 Updated Sep 18, 2025
cutlass Public
Forked from NVIDIA/cutlass

CUDA Templates for Linear Algebra Subroutines

C++ Other Updated Jul 16, 2025
flashinfer Public
Forked from flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda Apache License 2.0 Updated Jul 16, 2025
AI-4K Public

Python 1 Updated Jul 8, 2025
images Public

Updated Jul 8, 2025
DeepEP Public
Forked from deepseek-ai/DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda MIT License Updated Jul 3, 2025
ERNIE Public
Forked from PaddlePaddle/ERNIE

The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.

Python Apache License 2.0 Updated Jun 30, 2025
CUDA-PPT Public
Forked from MARD1NO/CUDA-PPT

Apache License 2.0 Updated May 29, 2025
DeepGEMM Public
Forked from deepseek-ai/DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python MIT License Updated May 29, 2025
FlashMLA Public
Forked from deepseek-ai/FlashMLA

FlashMLA: Efficient MLA decoding kernels

Cuda MIT License Updated Apr 29, 2025
pplx-kernels Public
Forked from perplexityai/pplx-kernels

Perplexity GPU Kernels

C++ MIT License Updated Apr 28, 2025
QQQ Public
Forked from HandH1998/QQQ

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

Python Updated Apr 7, 2025
DualPipe Public
Forked from deepseek-ai/DualPipe

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python MIT License Updated Feb 28, 2025
libfabric-efa-demo Public
Forked from abcdabcd987/libfabric-efa-demo

C++ 1 Updated Jan 5, 2025
vattention Public
Forked from microsoft/vattention

Dynamic Memory Management for Serving LLMs without PagedAttention

C MIT License Updated Dec 6, 2024
tiny-flash-attention Public
Forked from 66RING/tiny-flash-attention

flash attention tutorial written in python, triton, cuda, cutlass

Cuda Updated Nov 18, 2024
flash-attention Public
Forked from Dao-AILab/flash-attention

Fast and memory-efficient exact attention

Python BSD 3-Clause "New" or "Revised" License Updated Oct 28, 2024
TensorRT-LLM Public
Forked from NVIDIA/TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ Apache License 2.0 Updated Sep 26, 2024
mlc-llm Public
Forked from mlc-ai/mlc-llm

Universal LLM Deployment Engine with ML Compilation

Python Apache License 2.0 Updated Sep 23, 2024
marlin Public
Forked from IST-DASLab/marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Python Apache License 2.0 Updated Sep 4, 2024
Nanoflow Public
Forked from efeslab/Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Cuda Apache License 2.0 Updated Sep 2, 2024
fast-hadamard-transform Public
Forked from Dao-AILab/fast-hadamard-transform

Fast Hadamard transform in CUDA, with a PyTorch interface

C BSD 3-Clause "New" or "Revised" License Updated May 24, 2024
Paddle-Inference-Demo Public
Forked from PaddlePaddle/Paddle-Inference-Demo

C++ Apache License 2.0 Updated Mar 14, 2024
cute-gemm Public
Forked from reed-lau/cute-gemm

C++ Updated Feb 29, 2024
how-to-optim-algorithm-in-cuda Public
Forked from BBuf/how-to-optim-algorithm-in-cuda

how to optimize some algorithm in cuda.

Cuda Updated Jan 27, 2024
stable-diffusion-webui Public
Forked from AUTOMATIC1111/stable-diffusion-webui

Stable Diffusion web UI

Python GNU Affero General Public License v3.0 Updated Dec 19, 2023
