Stars
The simplest, fastest repository for training/finetuning small-sized VLMs.
Example code for a LangGraph solution that uses a custom-made toolkit.
Code Transformer neural network components piece by piece
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome).
A highly optimized LLM inference acceleration engine for Llama and its variants.
Lightweight, standalone C++ inference engine for Google's Gemma models.
Disaggregated serving system for Large Language Models (LLMs).
SGLang is a fast serving framework for large language models and vision language models.
Model Compression Toolbox for Large Language Models and Diffusion Models
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Video+code lecture on building nanoGPT from scratch
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.
Deep learning inference nodes for ROS / ROS2 with support for NVIDIA Jetson and TensorRT
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
haileyschoelkopf / vllm
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs
Adlik / smoothquantplus
Forked from mit-han-lab/smoothquant. [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
TinyChatEngine: On-Device LLM Inference Library
This project aims to share the technical principles behind large language models along with hands-on experience (LLM engineering and deploying LLM applications in production).
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉