
Add VisualCloze #11377


Merged
merged 54 commits into huggingface:main on May 12, 2025

Conversation

lzyhha
Contributor

lzyhha commented Apr 21, 2025

What does this PR do?

This PR adds VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning, an in-context-learning-based framework for universal image generation, along with the corresponding tests and documentation.

Some test code snippets and their results are available in the Model Card.
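For readers skimming the thread, here is a minimal usage sketch of the combined pipeline this PR converged on. The checkpoint id, prompts, image paths, the nested-list image layout, and the task_prompt/content_prompt/image argument names are illustrative assumptions; only upsampling_strength and guidance_scale are confirmed elsewhere in this thread, and the Model Card remains the authoritative reference.

import torch
from diffusers import VisualClozePipeline
from diffusers.utils import load_image

# Illustrative checkpoint id (see the Model Card for the real one).
pipe = VisualClozePipeline.from_pretrained(
    "VisualCloze/VisualClozePipeline-384", torch_dtype=torch.bfloat16
).to("cuda")

# Assumed input layout: in-context example rows first, then the query row,
# with None marking the image to be generated.
image = [
    [load_image("example_condition.png"), load_image("example_target.png")],
    [load_image("query_condition.png"), None],
]

result = pipe(
    task_prompt="A description of what the in-context examples demonstrate.",
    content_prompt="A description of the image to generate.",
    image=image,
    upsampling_strength=0.4,  # 0 skips the upsampling stage (see discussion below)
    guidance_scale=30.0,
    num_inference_steps=30,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0][0]
result.save("visualcloze_result.png")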

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul
Member

@lzyhha thanks for your contribution. Could you please add some code snippets and results to the thread?

sayakpaul requested a review from a-r-r-o-w on April 21, 2025, 12:17
@sayakpaul
Member

Cc: @asomoza as well for testing if possible.

@lzyhha
Contributor Author

lzyhha commented Apr 21, 2025

@lzyhha thanks for your contribution. Could you please add some code snippets and results to the thread?

Hello, here are some test code snippets and their results: Model Card.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@asomoza
Member

asomoza commented Apr 21, 2025

Hi, really nice work, and thank you for it. Currently, diffusers doesn't have einops as a dependency. Would it be possible to refactor all the rearrange calls to use torch equivalents so that no external libraries are needed?

@lzyhha
Contributor Author

lzyhha commented Apr 21, 2025

Hi, really nice work, and thank you for it. Currently, diffusers doesn't have einops as a dependency. Would it be possible to refactor all the rearrange calls to use torch equivalents so that no external libraries are needed?

Okay, I will make the necessary modifications. Additionally, I noticed that the call method is not functioning properly in the documentation. Could you please help check the cause?

@lzyhha
Contributor Author

lzyhha commented Apr 22, 2025

Hello, we have removed einops from the code while ensuring the correctness of the results. @asomoza
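For reference, a sketch of the kind of torch-native replacement involved (an illustrative pattern, not the PR's actual diff): an einops.rearrange that packs a latent grid into patch tokens can be expressed with view/permute/reshape.

import torch

# Illustrative: replace
#     rearrange(x, "b c (h p1) (w p2) -> b (h w) (c p1 p2)", p1=2, p2=2)
# with plain torch ops.
def pack_latents(x: torch.Tensor, p1: int = 2, p2: int = 2) -> torch.Tensor:
    b, c, H, W = x.shape
    h, w = H // p1, W // p2
    x = x.view(b, c, h, p1, w, p2)           # b c h p1 w p2
    x = x.permute(0, 2, 4, 1, 3, 5)          # b h w c p1 p2
    return x.reshape(b, h * w, c * p1 * p2)  # b (h w) (c p1 p2)

x = torch.randn(1, 16, 64, 64)
print(pack_latents(x).shape)  # torch.Size([1, 1024, 64])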

Member

sayakpaul left a comment

Thanks for working on this. I just added a few minor comments.

I am unsure about self.denoise(). On one hand I see its value, but since it deviates from our usual pipeline implementations, I will defer the decision to the other reviewers.

@lzyhha
Contributor Author

lzyhha commented May 1, 2025

@a-r-r-o-w Hello, I have upgraded Ruff in my environment to version 0.9.10 and resolved the errors from the previously failed workflows.

@lzyhha
Contributor Author

lzyhha commented May 3, 2025

Hello, I have run make fix-copies to address the issue. I'm wondering whether there is a way to quickly identify and fix any remaining problems.

@sayakpaul
Member

@lzyhha thanks for the work! From what I see, most comments are resolved. Not sure which ones you needed our help on. However, I will let @yiyixuxu or @a-r-r-o-w take care of the final merge.

@lzyhha
Contributor Author

lzyhha commented May 5, 2025

Hello, I see that “Fast PyTorch Pipeline CPU tests (pull_request)” is failing after 359 minutes, but I can’t figure out the reason. @a-r-r-o-w

@sayakpaul
Member

I don't think that failure was caused by this PR. It was likely infra-related.

Comment on lines +154 to +172
self.generation_pipe = VisualClozeGenerationPipeline(
    vae=vae,
    text_encoder=text_encoder,
    text_encoder_2=text_encoder_2,
    tokenizer=tokenizer,
    tokenizer_2=tokenizer_2,
    transformer=transformer,
    scheduler=scheduler,
    resolution=resolution,
)
self.upsampling_pipe = VisualClozeUpsamplingPipeline(
    vae=vae,
    text_encoder=text_encoder,
    text_encoder_2=text_encoder_2,
    tokenizer=tokenizer,
    tokenizer_2=tokenizer_2,
    transformer=transformer,
    scheduler=scheduler,
)
Member

@lzyhha The PR looks good to me now that it's separated, but we cannot instantiate a pipeline inside another pipeline. The intention with the refactor was to make the example code look something like:

from diffusers import VisualClozeGenerationPipeline, FluxFillPipeline as VisualClozeUpsamplingPipeline

pipe1 = VisualClozeGenerationPipeline.from_pretrained(...)
pipe2 = VisualClozeUpsamplingPipeline.from_pretrained(...)

<intermediate_results> = pipe1(...)
inputs = pipe2.prepare_upsampling(<intermediate_results>)
result = pipe2(...)
# save results

Would you like to take a stab at refactoring this? If not, I'd be happy to make the changes this week so we can proceed to merge.

We can ignore any failing tests for now. I'll help fix them once the PR is ready for merge

Contributor Author

@a-r-r-o-w Hello, I can make the changes, but I still have a few questions about the solution that I'd like to confirm first.

I instantiate a pipeline inside another pipeline because I followed the implementation of pipeline_stable_cascade_combined.py, which instantiates two pipelines inside another pipeline as follows:

self.prior_pipe = StableCascadePriorPipeline(
    prior=prior_prior,
    text_encoder=prior_text_encoder,
    tokenizer=prior_tokenizer,
    scheduler=prior_scheduler,
    image_encoder=prior_image_encoder,
    feature_extractor=prior_feature_extractor,
)
self.decoder_pipe = StableCascadeDecoderPipeline(
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    decoder=decoder,
    scheduler=scheduler,
    vqgan=vqgan,
)

Instead, if we instantiate the two pipelines via from_pretrained as follows, the same network will be instantiated twice and take up twice the memory, since both stages use exactly the same model architecture and weights.

pipe1 = VisualClozeGenerationPipeline.from_pretrained(...)
pipe2 = VisualClozeUpsamplingPipeline.from_pretrained(...)

Contributor Author

lzyhha May 5, 2025

pipeline_kandinsky2_2_combined also instantiates a pipeline inside another pipeline.

I would like to confirm a way that makes inference more convenient while avoiding unnecessary memory usage.

Member

I see, that's a good point. I was not sure whether we want to continue maintaining the pipelines that way. Since I'm not sure of the exact approach @yiyixuxu wanted to follow, I now think that what we currently have in the PR is okay (I initially had something else in mind when she asked for two separate pipelines). Let's wait for YiYi to give the PR another look, and I'll then run the example snippets to confirm & merge. Sorry for the confusion 😅

Instead, if we instantiate the two pipelines via from_pretrained as follows, the same network will be instantiated twice and take up twice the memory, since both stages use exactly the same model architecture and weights.

That shouldn't be an issue if/when we modify the implementation as I described, because we can share the underlying model components during pipeline initialization. Thanks to your comment, I just realized that we probably want to maintain consistency with Stable Cascade and Kandinsky, so I will let YiYi review further.
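As an aside, here is a minimal sketch of the component-sharing pattern being described, assuming a hypothetical checkpoint id: the second pipeline is built directly from the first one's modules, so the weights are loaded only once. The component names match the hunk quoted above; everything else is illustrative.

import torch
from diffusers import VisualClozeGenerationPipeline, FluxFillPipeline

# Hypothetical repo id used only for illustration.
pipe = VisualClozeGenerationPipeline.from_pretrained(
    "some-org/visualcloze-checkpoint", torch_dtype=torch.bfloat16
)

# Reuse the already-loaded components instead of calling from_pretrained again,
# so the transformer/VAE/text encoders are not duplicated in memory.
pipe_upsample = FluxFillPipeline(
    scheduler=pipe.scheduler,
    vae=pipe.vae,
    text_encoder=pipe.text_encoder,
    tokenizer=pipe.tokenizer,
    text_encoder_2=pipe.text_encoder_2,
    tokenizer_2=pipe.tokenizer_2,
    transformer=pipe.transformer,
)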

Contributor Author

Thanks, I now understand that the method you just mentioned doesn’t require additional memory. I’ll make the changes once a final approach is confirmed.

Collaborator

yiyixuxu May 5, 2025

hi @lzyhha @a-r-r-o-w

I think it makes sense to have 3 pipelines, no? Generation, upsampling, and a combined one. This way users can run the pipelines using the API @a-r-r-o-w suggested.

And yes, you do not need to take up additional memory; you can either initialize them separately or within the combined pipeline. With this API, you can use from_pipe to reuse the components:

from diffusers import VisualClozeGenerationPipeline, FluxFillPipeline as VisualClozeUpsamplingPipeline

pipe = VisualClozeGenerationPipeline.from_pretrained(...)
pipe_upsample = VisualClozeUpsamplingPipeline.from_pipe(pipe)

<intermediate_results> = pipe(...)
inputs = pipe_upsample.prepare_upsampling(<intermediate_results>)
result = pipe_upsample(...)

Member

@yiyixuxu As one of the pipelines is FluxFillPipeline, I think we only need the two here. The PR should be good to go now, no?

Collaborator

ohh sounds good

@a-r-r-o-w
Member

The following tests seem to be failing:

FAILED tests/pipelines/visualcloze/test_pipeline_visualcloze_combined.py::VisualClozePipelineFastTests::test_callback_cfg - AttributeError: 'VisualClozePipeline' object has no attribute 'guidance_scale'
FAILED tests/pipelines/visualcloze/test_pipeline_visualcloze_combined.py::VisualClozePipelineFastTests::test_save_load_dduf - Failed: Timeout >60.0s
FAILED tests/pipelines/visualcloze/test_pipeline_visualcloze_combined.py::VisualClozePipelineFastTests::test_save_load_local

For (1), the fix should be adding the @property decorator for guidance_scale.

For (2), I think you can just disable the DDUF test by setting supports_dduf = False.

For (3), I'm not really sure what the problem is, as there seems to be no error message. I can take a look soon.
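For reference, a minimal, self-contained sketch of what fixes (1) and (2) could look like; the class names below are illustrative stand-ins, not the PR's actual code.

# (1) Expose guidance_scale as a property so tests such as test_callback_cfg
#     can read it back from the pipeline after __call__ stores it.
class VisualClozePipelineSketch:
    def __init__(self):
        self._guidance_scale = None

    @property
    def guidance_scale(self):
        return self._guidance_scale

    def __call__(self, guidance_scale: float = 30.0):
        self._guidance_scale = guidance_scale
        return self.guidance_scale


# (2) Opt the fast test class out of the DDUF serialization test that times out.
class VisualClozePipelineFastTestsSketch:
    supports_dduf = False


pipe = VisualClozePipelineSketch()
print(pipe(guidance_scale=7.5))  # 7.5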

Collaborator

yiyixuxu left a comment

awesome job! thank you both @lzyhha @a-r-r-o-w


)
if upsampling_strength == 0:
    # Offload all models
    self.maybe_free_model_hooks()
Collaborator

Why is this needed? We already have this step inside the pipelines, and they contain all the components, no?

Contributor Author

You are right. I have deleted maybe_free_model_hooks from pipeline_visualcloze_combined.

@lzyhha
Contributor Author

lzyhha commented May 9, 2025

Hello, may I kindly ask whether there are any new developments? @a-r-r-o-w

@a-r-r-o-w
Member

@lzyhha Sorry for the delay! Taking a final look at the example snippets and will merge after that

Member

a-r-r-o-w left a comment

Just some last updates to the examples so that they are runnable with copy-paste.

a-r-r-o-w merged commit 4f438de into huggingface:main on May 12, 2025
12 checks passed
The github-project-automation bot moved this from In Progress to Done in Diffusers Roadmap 0.35 on May 12, 2025
@Abhinay1997
Contributor

Hey @lzyhha, there's a bug here: https://github.com/lzyhha/diffusers/blob/main/src/diffusers/pipelines/visualcloze/visualcloze_utils.py#L113. Please replace self.height with self.resize.
