Stars
[ICLR 2025] Official implementation of MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation
Official inference repo for FLUX.2 models
InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation 🔥
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Using modified BiSeNet for face parsing in PyTorch
🧩 | BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation | Face Parsing | In PyTorch >> ONNX Runtime Inference
Kandinsky 5.0: A family of diffusion models for Video & Image generation
✨ WithAnyone is capable of generating high-quality, controllable, and ID-consistent images
A curated list of papers, code, and resources pertaining to object placement.
A curated list of resources for video super-resolution using diffusion models.
Towards Real-Time Diffusion-Based Streaming Video Super-Resolution — An efficient one-step diffusion framework for streaming VSR with locality-constrained sparse attention and a tiny conditional de…
This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing and Generation'
[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting
[SIGGRAPH 2025] Official repo for paper "Any-length Video Inpainting and Editing with Plug-and-Play Context Control"
Code and data for "AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks" [TMLR 2024]
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[arXiv 2025] Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers
Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models (CVPR 2025)
Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini
Awesome Unified Multimodal Models
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
OmniGen2: Exploration to Advanced Multimodal Generation.

