[ICLR 2025] Official implementation of MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance

Python 301 14 Updated Jul 30, 2025

MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement

Python 433 1 Updated Oct 30, 2025

iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation

Python 159 8 Updated Dec 1, 2025

Official inference repo for FLUX.2 models

Python 1,056 48 Updated Dec 1, 2025

InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation 🔥

Python 146 6 Updated Jul 17, 2024

Accepted as [NeurIPS 2024] Spotlight Presentation Paper

Jupyter Notebook 6,365 651 Updated Sep 26, 2024

Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

14,794 1,008 Updated Sep 20, 2025

Using modified BiSeNet for face parsing in PyTorch

Python 2,530 490 Updated May 21, 2023

🧩 | BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation | Face Parsing | In PyTorch >> ONNX Runtime Inference

Python 211 39 Updated Nov 7, 2025

Kandinsky 5.0: A family of diffusion models for Video & Image generation

Python 515 32 Updated Dec 2, 2025

✨ WithAnyone is capable of generating high-quality, controllable, and ID-consistent images

Python 523 19 Updated Nov 3, 2025

A curated list of papers, code, and resources pertaining to object placement.

108 1 Updated Nov 29, 2025

A curated list of resources for video super-resolution using diffusion models.

144 1 Updated Nov 25, 2025

Towards Real-Time Diffusion-Based Streaming Video Super-Resolution — An efficient one-step diffusion framework for streaming VSR with locality-constrained sparse attention and a tiny conditional de…

Python 1,042 84 Updated Nov 21, 2025

This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing and Generation'

Python 2,427 203 Updated Oct 20, 2025

[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting

Python 6,390 753 Updated Feb 19, 2025

[SIGGRAPH2025] Official repo for paper "Any-length Video Inpainting and Editing with Plug-and-Play Context Control"

Python 526 33 Updated Apr 8, 2025

Code and data for "AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks" [TMLR 2024]

Jupyter Notebook 640 48 Updated Oct 29, 2024

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 16,915 1,397 Updated Nov 28, 2025

Echo-4o

Jupyter Notebook 314 14 Updated Oct 20, 2025

[arXiv 2025] Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers

Python 49 4 Updated Aug 8, 2025

Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models (CVPR 2025)

Python 40 Updated Jun 30, 2025

Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini

Roff 24,054 3,680 Updated Nov 30, 2025

Awesome Unified Multimodal Models

921 28 Updated Aug 17, 2025

https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT

Python 104 6 Updated Nov 1, 2025

A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.

Python 437 14 Updated Dec 2, 2025
Python 574 16 Updated Nov 10, 2025

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,375 113 Updated Dec 4, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,825 302 Updated Jun 12, 2025

OmniGen2: Exploration to Advanced Multimodal Generation.

Jupyter Notebook 3,954 9 Updated Dec 2, 2025