Communication University of China
- CUC, Beijing, China
Stars
📖 This is a repository for organizing papers, code, and other resources related to personalized video generation and editing.
[ICCV 2025] VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation
[ICCV 2025] VisualCloze: A universal image generation framework that can support a wide range of in-domain tasks and generalize to unseen ones. (🔥 🔥 🔥 Merged into the official pipelines of diffusers.)
Let's make video diffusion practical!
Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"
This is the official implementation of our Señorita-2M [Weights and Dataset]: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists
[CVPR 2025] Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
A collection of papers related to data compression
MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning
Official code of "Edit Transfer: Learning Image Editing via Vision In-Context Relations"
A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.
📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.
[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation
An open source community implementation of the model from the paper: "Movie Gen: A Cast of Media Foundation Models". Join our community to help implement this model!
A curated list of resources for using LLMs to develop more competitive grant applications.
Writing AI Conference Papers: A Handbook for Beginners
[ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
[ICCV 2023] Official PyTorch implementation for the paper "FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model"
A summary of diffusion-based image processing, including restoration, enhancement, coding, and quality assessment
[MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
Recent LLM-based CV and related works. Comments and contributions are welcome!
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
Code and models for the ICML 2024 paper "NExT-GPT: Any-to-Any Multimodal Large Language Model"
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.