Stars
Exocompilation for productive programming of hardware accelerators
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Hackable and optimized Transformers building blocks, supporting composable construction.
A Python package that extends the official PyTorch to easily unlock additional performance on Intel platforms
Development repository for the Triton language and compiler (see the kernel sketch after this list)
SGLang is a fast serving framework for large language models and vision language models.
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Official inference framework for 1-bit LLMs
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
A high-throughput and memory-efficient inference and serving engine for LLMs (see the offline-inference sketch after this list)
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Example models using DeepSpeed
High-speed Large Language Model Serving for Local Deployment
Running large language models on a single GPU for throughput-oriented scenarios.
Disaggregated serving system for Large Language Models (LLMs).
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
amd / blis
Forked from flame/blis
BLAS-like Library Instantiation Software Framework
Basic linear algebra subroutines for embedded optimization
[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of lists of statically-shaped tensors, referred to as a FractalTensor.
TBLIS is a library and framework for performing tensor operations, especially tensor contraction, using efficient native algorithms.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
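For the Triton entry above, a minimal sketch of what writing a kernel looks like, modeled on Triton's vector-add tutorial; the block size and tensor sizes here are arbitrary choices for illustration:

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one contiguous block of elements.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard the tail block
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    x = torch.rand(4096, device="cuda")
    y = torch.rand(4096, device="cuda")
    out = torch.empty_like(x)
    grid = (triton.cdiv(x.numel(), 1024),)
    add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)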
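For the vLLM entry, a sketch of offline batched inference using vLLM's LLM and SamplingParams API; the model id and sampling settings are examples, not recommendations:

    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # any Hugging Face model id
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
    outputs = llm.generate(["Hello, my name is"], params)
    for output in outputs:
        print(output.outputs[0].text)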
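And for 🤗 Transformers, a one-liner that makes the tagline concrete, using the library's pipeline API; the default sentiment model is downloaded on first use:

    from transformers import pipeline

    # pipeline() picks a default pretrained model for the task.
    classifier = pipeline("sentiment-analysis")
    print(classifier("Triton kernels compile surprisingly fast."))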