HelenMao
  • Communication University of China
  • CUC, Beijing, China

📖 This is a repository for organizing papers, code, and other resources related to personalized video generation and editing.

44 1 Updated Jul 2, 2025

[ICCV2025] VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation

20 Updated Jun 25, 2025

[ICCV 2025] VisualCloze: A universal image generation framework that can support a wide range of in-domain tasks and generalize to unseen ones. (🔥 🔥 🔥 Merged into official pipelines of diffusers.)

Python 247 11 Updated Jun 4, 2025

Let's make video diffusion practical!

Python 14,865 1,347 Updated Jun 27, 2025

Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"

Python 223 8 Updated Apr 23, 2025

This is the official implementation of our Señorita-2M [Weights and Dataset]: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists

Python 56 1 Updated Apr 9, 2025

[CVPR 2025] Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models

Python 239 8 Updated Jun 3, 2025

A collection of papers related to data compression

76 2 Updated Jul 2, 2025

MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning

Python 213 27 Updated Mar 26, 2025

Official code of "Edit Transfer: Learning Image Editing via Vision In-Context Relations"

Jupyter Notebook 79 1 Updated Jun 6, 2025

A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.

203 6 Updated Jun 13, 2025

📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.

603 32 Updated Jun 27, 2025

[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

445 10 Updated Jan 17, 2025

Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation

Python 108 Updated Apr 16, 2025

An open source community implementation of the model from the paper: "Movie Gen: A Cast of Media Foundation Models". Join our community to help implement this model!

Python 60 3 Updated Jun 30, 2025

A curated list of resources for using LLMs to develop more competitive grant applications.

Python 3,565 457 Updated Mar 1, 2024

Writing AI Conference Papers: A Handbook for Beginners

2,596 87 Updated Jun 5, 2025

[ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,558 67 Updated Jul 2, 2025

[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.

Python 1,845 188 Updated Apr 8, 2025

MiniSora: A community that aims to explore the implementation path and future development direction of Sora.

Python 1,264 149 Updated Feb 18, 2025

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"

Python 889 53 Updated Mar 12, 2024

[ICCV 2023] Official PyTorch implementation for the paper "FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model"

Python 302 11 Updated Oct 12, 2023

A summary of diffusion-based image processing, including restoration, enhancement, coding, and quality assessment

787 58 Updated Jun 5, 2025

[MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models

287 17 Updated Feb 15, 2025

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Python 11,699 844 Updated Jul 18, 2024

Recent LLM-based CV and related works. Welcome to comment/contribute!

866 36 Updated Mar 8, 2025

Official JAX implementation of MAGVIT: Masked Generative Video Transformer

Python 985 45 Updated Jan 17, 2024

Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model

Python 3,527 356 Updated May 13, 2025

[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.

Python 1,016 56 Updated Aug 21, 2024