Stars
Archer2.0 evolves from its predecessor by introducing ASPO, which overcomes fundamental PPO-Clip limitations to prevent premature convergence and unlock greater RL potential.
A minimal implementation of DeepMind's Genie world model
Mobile-Agent: The Powerful GUI Agent Family
Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"
Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
✨✨ R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
verl: Volcano Engine Reinforcement Learning for LLMs
Fully open reproduction of DeepSeek-R1
Understanding R1-Zero-Like Training: A Critical Perspective
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
The simplest, fastest repository for training/finetuning medium-sized GPTs.
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
Pytorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversational use.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

