Northeastern University
United States
https://scholar.google.com/citations?user=UZSbtlsAAAAJ&hl=en
Stars
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
[ICCV 2025] Official implementations for paper: VACE: All-in-One Video Creation and Editing
The first Interleaved framework for textual reasoning within the visual generation process
Single-file implementation to advance vision-language-action (VLA) models with reinforcement learning.
The official implementation of paper “VChain: Chain-of-Visual-Thought for Reasoning in Video Generation”
ENACT is a benchmark that evaluates embodied cognition through world modeling from egocentric interaction. It is designed to be simple and have a scalable dataset.
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Github repository for "Internalizing World Models via Self-Play Finetuning for Agentic RL"
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390 [ECCV 2024]
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
HunyuanVideo: A Systematic Framework For Large Video Generation Model
DelinQu / SimplerEnv-OpenVLA
Forked from simpler-env/SimplerEnv
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo, and OpenVLA) in simulation under common setups (e.g., Google Robot, WidowX+Bridge)
[ICCV 2025] Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
Fast-in-Slow: A Dual-System Foundation Model Unifying Fast Manipulation within Slow Reasoning
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
Zijian007 / LIBERO-PRO
Forked from Zxy-MLlab/LIBERO-PRO
The official repository of LIBERO-PRO, an evaluation extension of the original LIBERO benchmark.
Official repository of LIBERO-plus, a generalized benchmark for in-depth robustness analysis of vision-language-action models.
Official implementation of Don’t Blind Your VLA: Aligning Visual Representations for OOD Generalization. https://blind-vla-paper.github.io
Official implementation of "Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy."
RoboMonster: Compositional Generalization of Heterogeneous Embodied Agents
Training VLM agents with multi-turn reinforcement learning
[NeurIPS 2025] Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
