- Nanyang Technological University
- Singapore
- https://tianxingwu.github.io
Stars
PhysX: Physical-Grounded 3D Asset Generation (NeurIPS 2025, Spotlight)
Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
AllTracker is a model for tracking all pixels in a video.
Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
[CVPR'25 Oral] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
StereoPilot Elastic3D StereoWorld BetterDepth BRIDGE BriGeS ChronoDepth Depth Any Video Depth Anything Depth Pro DepthCrafter Distill Any Depth FE2E GRIN M2SVid MASt3R MegaSaM Metric3D Metric-Solve…
[ICCV 2025] SpatialTrackerV2: 3D Point Tracking Made Easy
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
[ICML 2025] Official PyTorch Implementation of "History-Guided Video Diffusion"
code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
[ICCV 2025] VisualCloze: A universal image generation framework that can support a wide range of in-domain tasks and generalize to unseen ones. (🔥 🔥 🔥 Merged into official pipelines of diffusers.)
Bullet Physics SDK: real-time collision detection and multi-physics simulation for VR, games, visual effects, robotics, machine learning etc.
[CVPR 2025 Highlight] Align3R: Aligned Monocular Depth Estimation for Dynamic Videos
Official implementation of "E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models"
[CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Web-based 3D visualization + Python
Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
[CVPR 2024] Code release for "Unsupervised Universal Image Segmentation"
This repository is for the first comprehensive survey on Meta AI's Segment Anything Model (SAM).
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
Taichi Blender integration for physics simulation and animation
Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments.
Cosmos-Predict1 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
A comprehensive list of papers investigating physical cognition in video generation, including papers, codes, and related websites.

