
Starred repositories
🍒 Cherry Studio is a desktop client that supports multiple LLM providers.
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
Efficient Triton Kernels for LLM Training
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Solve Visual Understanding with Reinforced VLMs
A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like YOLO11, RT-DETR, SAM …
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[TPAMI reviewing] Towards Visual Grounding: A Survey
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space
An open source implementation of CLIP.
[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
🤯 Lobe Chat - an open-source, modern design AI chat framework. Supports multiple AI providers (OpenAI / Claude 4 / Gemini / DeepSeek / Ollama / Qwen), Knowledge Base (file upload / knowledge manage…
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
This is a book on the Phi family of SLMs for getting started with Phi models. Phi is a family of open-source AI models developed by Microsoft. Phi models are the most capable and cost-effective small langua…
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Efficient Multimodal Large Language Models: A Survey
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
Docker Extension Pack for Visual Studio Code
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…