Kandinsky2.2 #3903

Closed · wants to merge 7 commits

Conversation

@cene555 (Contributor) commented Jun 29, 2023

What does this PR do?

Fixes # (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@yiyixuxu (Collaborator) commented:

@cene555
Thanks so much for the PR!!! We didn't expect it, so it's like Christmas came early 😅🎅🎁

Are you still working on the PR, or is it ready for review? Let us know!

@yiyixuxu (Collaborator) left a comment:

Great! I left some feedback, mostly nits; we can wait to make changes until @patrickvonplaten gives his review.

I think the main to-dos left are:

  • add docstring examples: I've tested our doc examples for text2img, img2img and inpaint, and all work great. The only ones missing are the controlnets
  • add # Copied from statements everywhere (YiYi can help with this)
  • add tests and docs (YiYi can help with this)
  • make PriorEmb2Emb work as intended: either add an add_noise function to unclip, or replace the unclip scheduler with ddpm (YiYi can help with this)
  • I'm not too sure why we need this try statement; can you explain? If any refactor is needed, let me know #3903 (comment)

@@ -136,6 +136,13 @@
KandinskyInpaintPipeline,
KandinskyPipeline,
KandinskyPriorPipeline,
Kandinsky2_2_DecoderControlnetImg2ImgPipeline,
Collaborator:

@patrickvonplaten these Decoder pipelines are the main pipelines that contain the image generation process. This is consistent with the naming in the original repo, and I think it makes sense.

However, it might be a little confusing for our users. Should we rename them?

Contributor:

Yes, I think it would be nicer to remove _Decoder here.

@@ -428,6 +449,46 @@ def forward(self, text_embeds: torch.FloatTensor, image_embeds: torch.FloatTenso

return time_image_embeds + time_text_embeds

class ImageTimeEmbedding(nn.Module):
Collaborator:
cool!

self,
unet: UNet2DConditionModel,
scheduler: DDPMScheduler,
vae: VQModel,
Collaborator:
I think we should be consistent with 2.1 in diffusers and name it movq?

Contributor:

@cene555 is the model a movq here or just a vae?

Comment on lines +298 to +300
# YiYi notes: only reason this pipeline can't work with unclip scheduler is that can't pass down this argument
# need to use DDPM scheduler instead
# prev_timestep=prev_timestep,
Collaborator:

Suggested change
# YiYi notes: only reason this pipeline can't work with unclip scheduler is that can't pass down this argument
# need to use DDPM scheduler instead
# prev_timestep=prev_timestep,

We can remove this note now :)


class Kandinsky2_2_DecoderControlnetPipeline(DiffusionPipeline):
"""
Pipeline for text-to-image generation using Kandinsky
Collaborator:

Suggested change
Pipeline for text-to-image generation using Kandinsky
Pipeline for text-to-image generation using Kandinsky with ControlNet guidance.

noise_pred, _ = noise_pred.split(latents.shape[1], dim=1)

# compute the previous noisy sample x_t -> x_t-1
try:
Collaborator:

I don't understand this: do we have a scheduler.step that does not accept a generator argument?

In any case, we should be very clear about which schedulers this pipeline can work with, and make sure it actually works with them.
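On the try/except question: diffusers pipelines commonly avoid this by inspecting the scheduler's step signature and forwarding generator only when it is accepted (e.g. the prepare_extra_step_kwargs helper in the Stable Diffusion pipelines). A minimal stdlib sketch of that pattern; the two step functions below are hypothetical stand-ins, not actual schedulers:

```python
import inspect

def maybe_generator_kwargs(step_fn, generator=None):
    """Build extra kwargs for scheduler.step, passing `generator` only
    when the step function's signature accepts it. This replaces the
    try/except around scheduler.step flagged in review."""
    kwargs = {}
    if generator is not None and "generator" in inspect.signature(step_fn).parameters:
        kwargs["generator"] = generator
    return kwargs

# Hypothetical stand-ins for two different scheduler step signatures:
def step_with_generator(model_output, timestep, sample, generator=None):
    return sample

def step_without_generator(model_output, timestep, sample):
    return sample

maybe_generator_kwargs(step_with_generator, generator="rng")     # {'generator': 'rng'}
maybe_generator_kwargs(step_without_generator, generator="rng")  # {}
```

The signature check is done once, so the hot loop can call `scheduler.step(..., **extra_kwargs)` unconditionally for any scheduler.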


class Kandinsky2_2_DecoderImg2ImgPipeline(DiffusionPipeline):
"""
Pipeline for text-to-image generation using Kandinsky
Collaborator:

Suggested change
Pipeline for text-to-image generation using Kandinsky
Pipeline for image-to-image generation using Kandinsky

Comment on lines +62 to +65
text_encoder ([`MultilingualCLIP`]):
Frozen text-encoder.
tokenizer ([`XLMRobertaTokenizer`]):
Tokenizer of class
Collaborator:

Suggested change
text_encoder ([`MultilingualCLIP`]):
Frozen text-encoder.
tokenizer ([`XLMRobertaTokenizer`]):
Tokenizer of class

noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)

# get latents
try:
Collaborator:

We need to make sure it either works with ddpm, or add an add_noise method to the unclip scheduler.
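For reference, the add_noise mentioned here is the DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, which the img2img latent preparation needs. A scalar sketch of that formula (illustrative only, not the actual tensor implementation in diffusers):

```python
import math

def add_noise(sample, noise, alpha_bar_t):
    """DDPM forward process for one timestep:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(sample, noise)]

add_noise([1.0, -1.0], [0.0, 0.0], 1.0)  # alpha_bar=1: no noise, returns the sample
add_noise([1.0, -1.0], [0.5, 0.5], 0.0)  # alpha_bar=0: pure noise
```

The real method works on tensors and gathers alpha_bar_t per timestep, but the arithmetic is exactly this interpolation between the clean sample and the noise.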

Comment on lines +191 to +194
text_encoder ([`MultilingualCLIP`]):
Frozen text-encoder.
tokenizer ([`XLMRobertaTokenizer`]):
Tokenizer of class
Collaborator:

Suggested change
text_encoder ([`MultilingualCLIP`]):
Frozen text-encoder.
tokenizer ([`XLMRobertaTokenizer`]):
Tokenizer of class


self.num_image_text_embeds = num_image_text_embeds
self.image_embeds = nn.Linear(image_embed_dim, self.num_image_text_embeds * cross_attention_dim)
self.norm = nn.LayerNorm(cross_attention_dim)
Contributor:

Suggested change
self.norm = nn.LayerNorm(cross_attention_dim)
self.norm = nn.LayerNorm(cross_attention_dim)

@@ -26,7 +26,10 @@
from .embeddings import (
Contributor:

Changes here look all good to me! It would just be nice/important to add some tests and docstrings.

Comment on lines +469 to +485
self.input_hint_block = nn.Sequential(
nn.Conv2d(3, 16, 3, padding=1),
nn.SiLU(),
nn.Conv2d(16, 16, 3, padding=1),
nn.SiLU(),
nn.Conv2d(16, 32, 3, padding=1, stride=2),
nn.SiLU(),
nn.Conv2d(32, 32, 3, padding=1),
nn.SiLU(),
nn.Conv2d(32, 96, 3, padding=1, stride=2),
nn.SiLU(),
nn.Conv2d(96, 96, 3, padding=1),
nn.SiLU(),
nn.Conv2d(96, 256, 3, padding=1, stride=2),
nn.SiLU(),
nn.Conv2d(256, 4, 3, padding=1)
)
Contributor:

Suggested change
self.input_hint_block = nn.Sequential(
nn.Conv2d(3, 16, 3, padding=1),
nn.SiLU(),
nn.Conv2d(16, 16, 3, padding=1),
nn.SiLU(),
nn.Conv2d(16, 32, 3, padding=1, stride=2),
nn.SiLU(),
nn.Conv2d(32, 32, 3, padding=1),
nn.SiLU(),
nn.Conv2d(32, 96, 3, padding=1, stride=2),
nn.SiLU(),
nn.Conv2d(96, 96, 3, padding=1),
nn.SiLU(),
nn.Conv2d(96, 256, 3, padding=1, stride=2),
nn.SiLU(),
nn.Conv2d(256, 4, 3, padding=1)
)
self.input_hint_block = nn.Sequential(
nn.Conv2d(3, 16, 3, padding=1),
nn.SiLU(),
nn.Conv2d(16, 16, 3, padding=1),
nn.SiLU(),
nn.Conv2d(16, 32, 3, padding=1, stride=2),
nn.SiLU(),
nn.Conv2d(32, 32, 3, padding=1),
nn.SiLU(),
nn.Conv2d(32, 96, 3, padding=1, stride=2),
nn.SiLU(),
nn.Conv2d(96, 96, 3, padding=1),
nn.SiLU(),
nn.Conv2d(96, 256, 3, padding=1, stride=2),
nn.SiLU(),
nn.Conv2d(256, 4, 3, padding=1)
)
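For context on the hint block above: its three stride-2 convolutions downsample the 3-channel hint image by a factor of 8 and project it to 4 channels, matching the decoder's latent grid. A quick stdlib check of the spatial arithmetic (the helper below is illustrative, not part of the PR):

```python
def conv2d_out(size, kernel=3, padding=1, stride=1):
    """Output spatial size of a Conv2d along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# Strides of the eight convs in input_hint_block, in order.
strides = (1, 1, 2, 1, 2, 1, 2, 1)
size = 512  # e.g. a 512x512 hint image
for s in strides:
    size = conv2d_out(size, stride=s)
print(size)  # 64 -> a 512x512 hint lands on a 64x64 grid (factor 8)
```

All convs use kernel 3 with padding 1, so only the three stride-2 layers change the resolution, giving the /8 factor.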

@@ -375,6 +375,27 @@ def forward(self, text_embeds: torch.FloatTensor, image_embeds: torch.FloatTenso

Contributor:

Changes look good to me!

@@ -136,6 +136,13 @@
KandinskyInpaintPipeline,
KandinskyPipeline,
KandinskyPriorPipeline,
Kandinsky2_2_DecoderControlnetImg2ImgPipeline,
@patrickvonplaten (Contributor) commented Jul 3, 2023:

Suggested change
Kandinsky2_2_DecoderControlnetImg2ImgPipeline,
KandinskyControlnetV22Img2ImgPipeline,

Comment on lines +140 to +145
Kandinsky2_2_DecoderControlnetPipeline,
Kandinsky2_2_DecoderImg2ImgPipeline,
Kandinsky2_2_DecoderPipeline,
Kandinsky2_2PriorEmb2EmbPipeline,
Kandinsky2_2PriorPipeline,
Kandinsky2_2_DecoderInpaintPipeline,
Contributor:

Suggested change
Kandinsky2_2_DecoderControlnetPipeline,
Kandinsky2_2_DecoderImg2ImgPipeline,
Kandinsky2_2_DecoderPipeline,
Kandinsky2_2PriorEmb2EmbPipeline,
Kandinsky2_2PriorPipeline,
Kandinsky2_2_DecoderInpaintPipeline,
KandinskyV22ControlnetPipeline,
KandinskyV22Img2ImgPipeline,
KandinskyV22DecoderPipeline,
KandinskyV22PriorEmb2EmbPipeline,
KandinskyV22PriorPipeline,
KandinskyV22InpaintPipeline,

@yiyixuxu yiyixuxu mentioned this pull request Jul 4, 2023
5 participants