Stars
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
Search-R1: an efficient, scalable RL training framework for LLMs that interleave reasoning and search-engine calling, built on veRL.
A free and open alternative to OpenAI Deep Research.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
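As a rough illustration of that Python API, the sketch below uses the high-level LLM entry point found in recent TensorRT-LLM releases; the model checkpoint and sampling settings are assumptions chosen for the example, not part of the project's description.

    # Hedged sketch of TensorRT-LLM's high-level Python API (recent releases).
    # The model checkpoint and sampling settings below are illustrative assumptions.
    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # hypothetical model choice
    sampling = SamplingParams(max_tokens=32, temperature=0.8)

    # generate() takes a list of prompts and returns one output object per prompt.
    for output in llm.generate(["What does TensorRT-LLM do?"], sampling):
        print(output.outputs[0].text)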
Transformer-related optimizations, including BERT and GPT.
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
FlashMLA: Efficient MLA decoding kernels
DeepEP: an efficient expert-parallel communication library
Fast and memory-efficient exact attention
Fully open reproduction of DeepSeek-R1
Mixture-of-Experts for Large Vision-Language Models
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Similarities: a toolkit for similarity calculation, matching, and semantic search. Supports text-to-text, text-to-image, and image-to-image search over hundred-million-scale data; written in Python 3 and usable out of the box.
CLIP as a service: embed images and sentences, with support for object recognition, visual reasoning, image classification, and reverse image search.
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
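To give a sense of those APIs, here is a minimal quantization sketch following the package's usual from_pretrained / quantize / save_quantized flow; the model id and the single calibration sentence are placeholders chosen purely for illustration.

    # Hedged sketch of a GPTQ quantization flow with AutoGPTQ; the model id and
    # the one-example calibration set are placeholders, not a recommended setup.
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    model_id = "facebook/opt-125m"  # small model used here only for illustration
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

    # Calibration data: tokenized text samples (input_ids + attention_mask).
    examples = [tokenizer("AutoGPTQ is an easy-to-use LLM quantization package.")]

    quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
    model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
    model.quantize(examples)                    # run GPTQ over the calibration data
    model.save_quantized("opt-125m-4bit-gptq")  # write the quantized weights to disk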
Large Language Model Text Generation Inference
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023).
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
GLM-4 series: Open Multilingual Multimodal Chat LMs.
V2rayU: a macOS client built on the v2ray core for bypassing network censorship. Written in Swift; supports trojan, vmess, shadowsocks, socks5, and other protocols, along with subscriptions, QR-code and clipboard import, manual configuration, and QR-code sharing.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Downstream task fine-tuning of ChatGLM-6B, ChatGLM2-6B, and ChatGLM3-6B, covering Freeze, LoRA, P-tuning, and full-parameter fine-tuning.
SuperCLUE: a comprehensive benchmark for general-purpose Chinese foundation models.