
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning

Python 20,435 3,350 Updated Dec 24, 2025

[ICCV 2025] Official implementations for paper: VACE: All-in-One Video Creation and Editing

Python 3,521 244 Updated Oct 17, 2025

Official Repository for MolmoAct

Python 277 31 Updated Dec 11, 2025

The first Interleaved framework for textual reasoning within the visual generation process

153 1 Updated Nov 21, 2025

Single-file implementation to advance vision-language-action (VLA) models with reinforcement learning.

Python 370 17 Updated Nov 8, 2025

The official implementation of the paper “VChain: Chain-of-Visual-Thought for Reasoning in Video Generation”

110 1 Updated Oct 7, 2025

ENACT is a benchmark that evaluates embodied cognition through world modeling from egocentric interaction. It is designed to be simple, with a scalable dataset.

Python 34 1 Updated Nov 27, 2025

A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…

948 23 Updated Dec 24, 2025

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Python 1,146 63 Updated Oct 13, 2025

GitHub repository for "Internalizing World Models via Self-Play Finetuning for Agentic RL"

Python 32 2 Updated Nov 1, 2025

This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390 [ECCV 2024]

Python 153 8 Updated Sep 27, 2025

An open-source implementation for fine-tuning Qwen-VL series by Alibaba Cloud.

Python 1,504 190 Updated Dec 19, 2025

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 5,162 1,806 Updated Feb 26, 2025

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 11,506 1,155 Updated Nov 21, 2025

Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo, and OpenVLA) in simulation under common setups (e.g., Google Robot, WidowX+Bridge)

Jupyter Notebook 252 45 Updated Jun 23, 2025

Official repository for LTX-Video

Python 8,931 838 Updated Oct 25, 2025

[ICCV 2025] Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers

Python 425 20 Updated Dec 6, 2025

Fast-in-Slow: A Dual-System Foundation Model Unifying Fast Manipulation within Slow Reasoning

Python 134 12 Updated Aug 1, 2025

Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)

Jupyter Notebook 898 164 Updated Dec 20, 2025
Python 214 16 Updated Aug 25, 2025

Official repository of LIBERO-PRO, an evaluation extension of the original LIBERO benchmark

Jupyter Notebook 1 Updated Oct 27, 2025

Official repository of LIBERO-plus, a generalized benchmark for in-depth robustness analysis of vision-language-action models.

Python 151 10 Updated Dec 15, 2025

Official implementation of Don’t Blind Your VLA: Aligning Visual Representations for OOD Generalization. https://blind-vla-paper.github.io

Python 48 2 Updated Dec 11, 2025
Python 8 Updated Oct 24, 2025

Official implementation of "Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy."

Jupyter Notebook 120 15 Updated Oct 23, 2025

Are Video Models Ready as Zero-shot Reasoners?

Python 84 4 Updated Nov 24, 2025

RoboMonster: Compositional Generalization of Heterogeneous Embodied Agents

Python 10 Updated Oct 29, 2025

Training VLM agents with multi-turn reinforcement learning

Python 356 42 Updated Dec 1, 2025

[NeurIPS 2025] Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models

Python 6 Updated Nov 10, 2025

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Python 5,930 383 Updated Mar 14, 2024