Kandinsky2.2 #3903
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@cene555 are you still working on the PR, or is it ready for review? Let us know!
great! I left some feedback, mostly nits; we can wait to make changes until @patrickvonplaten gives his review.
I think the main to-dos left are:
- add docstring examples: I've tested our doc examples for text2img, img2img and inpaint, and all work great. The only ones missing are the controlnets (see the sketch after this list)
- add `# Copied from` statements everywhere (YiYi can help with this)
- add tests and docs (YiYi can help with this)
- make `PriorEmb2Emb` work as intended: either add an `add_noise` function to the unclip scheduler or replace the unclip scheduler with ddpm (YiYi can help with this)
- I'm not too sure why we need this `try` statement - can you explain? If any refactor is needed, let me know #3903 (comment)
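For reference, here is roughly what the tested text2img doc example looks like. A sketch only: the class names and Hub repo ids below assume the post-rename naming and final checkpoint locations, both still being settled in this PR.

```python
import torch
from diffusers import KandinskyV22Pipeline, KandinskyV22PriorPipeline

# Assumed final class names and Hub repo ids (still under discussion in this PR).
pipe_prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16
).to("cuda")
pipe = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")

prompt = "red cat, 4k photo"

# The prior maps the text prompt to CLIP image embeddings ...
image_embeds, negative_image_embeds = pipe_prior(prompt, guidance_scale=1.0).to_tuple()

# ... and the decoder pipeline turns those embeddings into an image.
image = pipe(
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    height=768,
    width=768,
    num_inference_steps=50,
).images[0]
image.save("cat.png")
```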
```python
@@ -136,6 +136,13 @@
    KandinskyInpaintPipeline,
    KandinskyPipeline,
    KandinskyPriorPipeline,
    Kandinsky2_2_DecoderControlnetImg2ImgPipeline,
```
@patrickvonplaten these `Decoder` pipelines are the main pipelines that contain the image generation process. This is consistent with the naming in the original repo and makes sense, I think; however, it might be a little confusing for our users. Should we rename them?
Yes, I think it would be nicer to remove `_Decoder` here.
```python
@@ -428,6 +449,46 @@ def forward(self, text_embeds: torch.FloatTensor, image_embeds: torch.FloatTensor
        return time_image_embeds + time_text_embeds

class ImageTimeEmbedding(nn.Module):
```
cool!
```python
        self,
        unet: UNet2DConditionModel,
        scheduler: DDPMScheduler,
        vae: VQModel,
```
I think we should be consistent with the 2.1 pipelines in diffusers and name it `movq`?
@cene555 is the model a movq here or just a vae?
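(For context, the Kandinsky 2.1 pipelines register the `VQModel` under the attribute name `movq`. A minimal sketch of the consistent signature, with a placeholder class name:)

```python
from diffusers import DDPMScheduler, DiffusionPipeline, UNet2DConditionModel, VQModel


class Kandinsky22DecoderPipeline(DiffusionPipeline):  # placeholder name for illustration
    def __init__(
        self,
        unet: UNet2DConditionModel,
        scheduler: DDPMScheduler,
        movq: VQModel,  # registered as `movq` to match the 2.1 pipelines, not `vae`
    ):
        super().__init__()
        self.register_modules(unet=unet, scheduler=scheduler, movq=movq)
```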
```python
            # YiYi notes: only reason this pipeline can't work with unclip scheduler is that can't pass down this argument
            # need to use DDPM scheduler instead
            # prev_timestep=prev_timestep,
```
```diff
-            # YiYi notes: only reason this pipeline can't work with unclip scheduler is that can't pass down this argument
-            # need to use DDPM scheduler instead
-            # prev_timestep=prev_timestep,
```
we can remove this note now :)
```python
class Kandinsky2_2_DecoderControlnetPipeline(DiffusionPipeline):
    """
    Pipeline for text-to-image generation using Kandinsky
```
```diff
-    Pipeline for text-to-image generation using Kandinsky
+    Pipeline for text-to-image generation using Kandinsky with ControlNet guidance.
```
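Since the to-do list above notes the ControlNet doc examples are still missing, here is a sketch of what one could look like. Class names, repo ids, and the `hint` kwarg are assumptions based on the original repo and the renames proposed in this thread:

```python
import torch
from diffusers import KandinskyV22ControlnetPipeline, KandinskyV22PriorPipeline

# Assumed class names and Hub repo ids; both were still being renamed in this PR.
pipe_prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16
).to("cuda")
pipe = KandinskyV22ControlnetPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-controlnet-depth", torch_dtype=torch.float16
).to("cuda")

image_embeds, negative_image_embeds = pipe_prior("a robot, 4k photo").to_tuple()

# `hint` is the conditioning image (e.g. a depth map) as a (1, 3, H, W) tensor in [0, 1];
# a random tensor stands in for a real depth map here.
hint = torch.rand(1, 3, 768, 768, dtype=torch.float16, device="cuda")

image = pipe(
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    hint=hint,
    height=768,
    width=768,
    num_inference_steps=50,
).images[0]
```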
```python
            noise_pred, _ = noise_pred.split(latents.shape[1], dim=1)

            # compute the previous noisy sample x_t -> x_t-1
            try:
```
I don't understand this here: do we have a `scheduler.step` that does not accept a `generator` argument?
In any case we should be very clear about which schedulers this pipeline can work with, and be able to work with them.
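For reference, `DDPMScheduler.step` does accept `generator` (it uses it for the stochastic noise draw), so the loop body should be able to call it unconditionally. A self-contained sketch of the pattern, with a zero tensor standing in for the UNet output:

```python
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(50)

generator = torch.Generator().manual_seed(0)
latents = torch.randn(1, 4, 64, 64, generator=generator)

for t in scheduler.timesteps:
    noise_pred = torch.zeros_like(latents)  # stand-in for the UNet prediction
    # `generator` is passed straight through; no try/except needed around step()
    latents = scheduler.step(noise_pred, t, latents, generator=generator).prev_sample
```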
```python
class Kandinsky2_2_DecoderImg2ImgPipeline(DiffusionPipeline):
    """
    Pipeline for text-to-image generation using Kandinsky
```
```diff
-    Pipeline for text-to-image generation using Kandinsky
+    Pipeline for image-to-image generation using Kandinsky
```
```python
        text_encoder ([`MultilingualCLIP`]):
            Frozen text-encoder.
        tokenizer ([`XLMRobertaTokenizer`]):
            Tokenizer of class
```
```diff
-        text_encoder ([`MultilingualCLIP`]):
-            Frozen text-encoder.
-        tokenizer ([`XLMRobertaTokenizer`]):
-            Tokenizer of class
```
```python
        noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)

        # get latents
        try:
```
need to make sure it either works with ddpm, or add an `add_noise` method to the unclip scheduler
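For context, here is the standard img2img latent preparation the `try` block should reduce to once a scheduler with `add_noise` is in place (`DDPMScheduler` implements it; `UnCLIPScheduler` does not at the time of this PR). Names are illustrative:

```python
from typing import Optional

import torch
from diffusers import DDPMScheduler


def prepare_img2img_latents(
    scheduler: DDPMScheduler,
    image_latents: torch.Tensor,
    timestep: torch.Tensor,
    generator: Optional[torch.Generator] = None,
) -> torch.Tensor:
    # Noise the encoded image latents to the chosen start timestep,
    # the usual img2img warm start.
    noise = torch.randn(
        image_latents.shape,
        generator=generator,
        device=image_latents.device,
        dtype=image_latents.dtype,
    )
    return scheduler.add_noise(image_latents, noise, timestep)
```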
```python
        text_encoder ([`MultilingualCLIP`]):
            Frozen text-encoder.
        tokenizer ([`XLMRobertaTokenizer`]):
            Tokenizer of class
```
```diff
-        text_encoder ([`MultilingualCLIP`]):
-            Frozen text-encoder.
-        tokenizer ([`XLMRobertaTokenizer`]):
-            Tokenizer of class
```
```python
        self.num_image_text_embeds = num_image_text_embeds
        self.image_embeds = nn.Linear(image_embed_dim, self.num_image_text_embeds * cross_attention_dim)
        self.norm = nn.LayerNorm(cross_attention_dim)
```
```suggestion
        self.norm = nn.LayerNorm(cross_attention_dim)
```
```python
@@ -26,7 +26,10 @@
from .embeddings import (
```
Changes here look all good to me! would just be nice / important to add some tests and docstrings
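On the tests point, a minimal shape test for one of the new embedding classes might look like the sketch below; the import path and the constructor signature `(image_embed_dim, time_embed_dim)` are assumptions based on the diff and may need adjusting:

```python
import torch

from diffusers.models.embeddings import ImageTimeEmbedding  # assumed import path


def test_image_time_embedding_shape():
    # Assumed constructor signature; adjust if the final implementation differs.
    embedding = ImageTimeEmbedding(image_embed_dim=32, time_embed_dim=64)
    image_embeds = torch.randn(2, 32)
    out = embedding(image_embeds)
    # The module should project CLIP image embeds into the UNet time-embedding space.
    assert out.shape == (2, 64)
```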
```python
        self.input_hint_block = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(16, 16, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(16, 32, 3, padding=1, stride=2),
            nn.SiLU(),
            nn.Conv2d(32, 32, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(32, 96, 3, padding=1, stride=2),
            nn.SiLU(),
            nn.Conv2d(96, 96, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(96, 256, 3, padding=1, stride=2),
            nn.SiLU(),
            nn.Conv2d(256, 4, 3, padding=1)
        )
```
```suggestion
        self.input_hint_block = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(16, 16, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(16, 32, 3, padding=1, stride=2),
            nn.SiLU(),
            nn.Conv2d(32, 32, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(32, 96, 3, padding=1, stride=2),
            nn.SiLU(),
            nn.Conv2d(96, 96, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(96, 256, 3, padding=1, stride=2),
            nn.SiLU(),
            nn.Conv2d(256, 4, 3, padding=1)
        )
```
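For intuition, this hint encoder maps an RGB conditioning image into the 4-channel latent space with an 8x spatial downsampling (three stride-2 convs). A quick shape check, rebuilding the block standalone:

```python
import torch
import torch.nn as nn

# Same stack as above, rebuilt standalone for a quick sanity check.
input_hint_block = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.SiLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.SiLU(),
    nn.Conv2d(16, 32, 3, padding=1, stride=2), nn.SiLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.SiLU(),
    nn.Conv2d(32, 96, 3, padding=1, stride=2), nn.SiLU(),
    nn.Conv2d(96, 96, 3, padding=1), nn.SiLU(),
    nn.Conv2d(96, 256, 3, padding=1, stride=2), nn.SiLU(),
    nn.Conv2d(256, 4, 3, padding=1),
)

hint = torch.randn(1, 3, 512, 512)  # e.g. a depth map in pixel space
out = input_hint_block(hint)
print(out.shape)  # torch.Size([1, 4, 64, 64]): 8x downsampled, 4 latent channels
```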
```python
@@ -375,6 +375,27 @@ def forward(self, text_embeds: torch.FloatTensor, image_embeds: torch.FloatTensor
```
Changes look good to me!
```python
@@ -136,6 +136,13 @@
    KandinskyInpaintPipeline,
    KandinskyPipeline,
    KandinskyPriorPipeline,
    Kandinsky2_2_DecoderControlnetImg2ImgPipeline,
```
```diff
-    Kandinsky2_2_DecoderControlnetImg2ImgPipeline,
+    KandinskyControlnetV22Img2ImgPipeline,
```
```python
    Kandinsky2_2_DecoderControlnetPipeline,
    Kandinsky2_2_DecoderImg2ImgPipeline,
    Kandinsky2_2_DecoderPipeline,
    Kandinsky2_2PriorEmb2EmbPipeline,
    Kandinsky2_2PriorPipeline,
    Kandinsky2_2_DecoderInpaintPipeline,
```
```diff
-    Kandinsky2_2_DecoderControlnetPipeline,
-    Kandinsky2_2_DecoderImg2ImgPipeline,
-    Kandinsky2_2_DecoderPipeline,
-    Kandinsky2_2PriorEmb2EmbPipeline,
-    Kandinsky2_2PriorPipeline,
-    Kandinsky2_2_DecoderInpaintPipeline,
+    KandinskyV22ControlnetPipeline,
+    KandinskyV22Img2ImgPipeline,
+    KandinskyV22DecoderPipeline,
+    KandinskyV22PriorEmb2EmbPipeline,
+    KandinskyV22PriorPipeline,
+    KandinskyV22InpaintPipeline,
```
What does this PR do?
Fixes # (issue)
Before submitting
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.