Stars
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
Search-R1: an efficient, scalable RL training framework for LLMs that interleave reasoning and search-engine calling, built on veRL.
A free and open alternative to OpenAI Deep Research.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
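As a rough illustration of that Python API, the sketch below uses the high-level LLM entry point found in recent TensorRT-LLM releases; the model checkpoint and sampling settings are assumptions chosen for the example, not part of the project's description.

    # Hedged sketch of TensorRT-LLM's high-level Python API (recent releases).
    # The model checkpoint and sampling settings below are illustrative assumptions.
    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # hypothetical model choice
    sampling = SamplingParams(max_tokens=32, temperature=0.8)

    # generate() takes a list of prompts and returns one output object per prompt.
    for output in llm.generate(["What does TensorRT-LLM do?"], sampling):
        print(output.outputs[0].text)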
Transformer-related optimizations, including BERT and GPT.
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
FlashMLA: Efficient MLA decoding kernels
DeepEP: an efficient expert-parallel communication library
Fast and memory-efficient exact attention
Fully open reproduction of DeepSeek-R1
Mixture-of-Experts for Large Vision-Language Models
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Similarities: a toolkit for similarity calculation, matching, and semantic search. Supports text-to-text, text-to-image, and image-to-image search over hundred-million-scale data; written in Python 3 and usable out of the box.
CLIP as a service: embed images and sentences, with support for object recognition, visual reasoning, image classification, and reverse image search.
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
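To give a sense of those APIs, here is a minimal quantization sketch following the package's usual from_pretrained / quantize / save_quantized flow; the model id and the single calibration sentence are placeholders chosen purely for illustration.

    # Hedged sketch of a GPTQ quantization flow with AutoGPTQ; the model id and
    # the one-example calibration set are placeholders, not a recommended setup.
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    model_id = "facebook/opt-125m"  # small model used here only for illustration
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

    # Calibration data: tokenized text samples (input_ids + attention_mask).
    examples = [tokenizer("AutoGPTQ is an easy-to-use LLM quantization package.")]

    quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
    model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
    model.quantize(examples)                    # run GPTQ over the calibration data
    model.save_quantized("opt-125m-4bit-gptq")  # write the quantized weights to disk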
Large Language Model Text Generation Inference
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023).
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
GLM-4 series: Open Multilingual Multimodal Chat LMs.
V2rayU: a macOS client built on the v2ray core for bypassing network censorship. Written in Swift; supports trojan, vmess, shadowsocks, socks5, and other protocols, along with subscriptions, QR-code and clipboard import, manual configuration, and QR-code sharing.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Downstream task fine-tuning of ChatGLM-6B, ChatGLM2-6B, and ChatGLM3-6B, covering Freeze, LoRA, P-tuning, and full-parameter fine-tuning.
SuperCLUE: a comprehensive benchmark for general-purpose Chinese foundation models.