Junwen-Zhang
  • Chongqing University
  • Shaoxing, Zhejiang, China

Exocompilation for productive programming of hardware accelerators

Python 679 49 Updated Nov 1, 2025

High-efficiency floating-point neural network inference operators for mobile, server, and Web

C 2,160 449 Updated Nov 10, 2025

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,087 733 Updated Oct 31, 2025

Performance monitoring and benchmarking suite

C 1,847 251 Updated Nov 10, 2025

A Python package that extends official PyTorch to easily obtain performance on Intel platforms

Python 1,988 307 Updated Nov 7, 2025

Development repository for the Triton language and compiler

MLIR 17,521 2,378 Updated Nov 10, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 20,057 3,325 Updated Nov 11, 2025

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Python 2,074 187 Updated Jun 30, 2025

Inference Llama 2 in one file of pure C

C 18,924 2,402 Updated Aug 6, 2024

Official inference framework for 1-bit LLMs

Python 24,389 1,891 Updated Jun 3, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 40,652 4,616 Updated Nov 8, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 62,680 11,165 Updated Nov 11, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,245 422 Updated Nov 10, 2025

Example models using DeepSpeed

Python 6,713 1,108 Updated Oct 15, 2025

High-speed Large Language Model Serving for Local Deployment

C++ 8,383 450 Updated Aug 2, 2025

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,377 583 Updated Oct 28, 2024

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 723 81 Updated Apr 6, 2025

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C 8,418 2,176 Updated Sep 5, 2025

BLAS-like Library Instantiation Software Framework

C 155 46 Updated Oct 27, 2025

Basic linear algebra subroutines for embedded optimization

Assembly 384 96 Updated Sep 24, 2025

[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection

Python 145 9 Updated Feb 20, 2025

C++ 39 1 Updated Mar 14, 2024

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 15,558 1,125 Updated Nov 10, 2025

Papers and their code for AI systems

334 23 Updated Aug 15, 2025

Mamba SSM architecture

Python 16,382 1,485 Updated Oct 10, 2025

FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of lists of statically-shaped tensors, referred to as a Fractal…

Python 29 4 Updated Dec 21, 2024

TBLIS is a library and framework for performing tensor operations, especially tensor contraction, using efficient native algorithms.

C++ 134 36 Updated Oct 1, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 152,364 31,107 Updated Nov 10, 2025