Stars
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
Dexbotic: Open-Source Vision-Language-Action Toolbox
NVIDIA Isaac GR00T N1.6 - A Foundation Model for Generalist Robots.
Native and Compact Structured Latents for 3D Generation
[AAAI 26 Oral] Official implementation of "FreeGaussian: Annotation-free Control of Articulated Objects via 3D Gaussian Splats with Flow Derivatives"
Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation
The first Interleaved framework for textual reasoning within the visual generation process
The official implementation of the paper "Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation"
Team Comet's 2025 BEHAVIOR Challenge Codebase
Daily arXiv paper updates; Topics: EmbodiedAI, MLLM, Vision-Language-Navigation
"DeepCode: Open Agentic Coding (Paper2Code & Text2Web & Text2Backend)"
StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing
Fully Open Framework for Democratized Multimodal Training
Official implementation of HPSv3: Towards Wide-Spectrum Human Preference Score (ICCV2025)
EO: Open-source Unified Embodied Foundation Model Series
Reference PyTorch implementation and models for DINOv3
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Qwen3-Coder is the code version of Qwen3, the large language model series developed by Qwen team, Alibaba Cloud.
Text-audio foundation model from Boson AI
Easy Data Preparation with the latest LLM-based Operators and Pipelines.

