Repositories listed on tuanhe's GitHub profile


The simplest, fastest repository for training/finetuning small-sized VLMs.

Python · 3,626 stars · 326 forks · Updated Jun 30, 2025

Nano vLLM

Python · 4,769 stars · 551 forks · Updated Jun 27, 2025

LangGraph solution template for MCP

Python · 512 stars · 96 forks · Updated Feb 25, 2025

This is example code for a LangGraph solution that uses a custom-made toolkit.

Python · 6 stars · Updated Mar 1, 2024

MCP Tools LangGraph Integration

Python · 44 stars · 3 forks · Updated Mar 29, 2025

Code Transformer neural network components piece by piece

Jupyter Notebook · 353 stars · 179 forks · Updated May 1, 2023

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (GPUs in the future; PRs welcome).

Python · 351 stars · 46 forks · Updated Jun 10, 2025

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ · 897 stars · 104 forks · Updated Jun 26, 2025

Lightweight, standalone C++ inference engine for Google's Gemma models.

C++ · 6,487 stars · 553 forks · Updated Jun 23, 2025

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook · 635 stars · 66 forks · Updated Apr 6, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python · 15,646 stars · 2,241 forks · Updated Jul 2, 2025

Model Compression Toolbox for Large Language Models and Diffusion Models

Python · 516 stars · 38 forks · Updated Mar 27, 2025

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook · 57,867 stars · 8,043 forks · Updated Jun 30, 2025

Video+code lecture on building nanoGPT from scratch

Python · 4,187 stars · 641 forks · Updated Aug 13, 2024

DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.

C · 256 stars · 27 forks · Updated May 30, 2025

Deep learning inference nodes for ROS / ROS2 with support for NVIDIA Jetson and TensorRT

C++ · 940 stars · 261 forks · Updated Jul 13, 2024

Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.

Python · 9,816 stars · 853 forks · Updated Jun 18, 2025

LLM training in simple, raw C/CUDA

Cuda · 27,028 stars · 3,108 forks · Updated Jun 26, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 5 stars · 1 fork · Updated Mar 5, 2024

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python · 23 stars · Updated Mar 15, 2024

TinyChatEngine: On-Device LLM Inference Library

C++ · 871 stars · 89 forks · Updated Jul 4, 2024

Design pattern demo code

C++ · 1,102 stars · 271 forks · Updated Apr 17, 2024

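This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and real-world deployment of large-model applications).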

HTML · 19,013 stars · 2,264 forks · Updated Jun 22, 2025

Inference Llama 2 in one file of pure C

C · 18,510 stars · 2,290 forks · Updated Aug 6, 2024

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …

Python · 8,146 stars · 702 forks · Updated Jul 2, 2025

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python · 3,118 stars · 259 forks · Updated Jun 12, 2025

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Python · 4,884 stars · 521 forks · Updated Apr 11, 2025

📚 A curated list of Awesome LLM/VLM Inference Papers with Code: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉

Python · 4,188 stars · 289 forks · Updated Jun 30, 2025