
Add ddpm kandinsky #3783


Merged: 6 commits merged into main on Jun 21, 2023

Conversation

yiyixuxu
Collaborator

This PR refactors the Kandinsky pipelines to work with both the DDPM and DDIM schedulers.

@sayakpaul @patrickvonplaten
currently the scheduler config in the Kandinsky repo is for DDIM. Is there any way we can also save a DDPM scheduler config somewhere, instead of having to manually create the DDPM config when a user wants to swap the scheduler?

from diffusers import KandinskyPipeline, KandinskyPriorPipeline, DDPMScheduler

import torch

device = "cuda"

# inputs
prompt = "red cat, 4k photo"

# create prior
pipe_prior = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior")
pipe_prior.to(device)

# use prior to generate image_emb based on our prompt
generator = torch.Generator(device=device).manual_seed(0)
out = pipe_prior(prompt, num_inference_steps=5, generator=generator)
image_emb = out.image_embeds
zero_image_emb = out.negative_image_embeds

# create diffusion pipeline
model_dtype = torch.float16
pipe = KandinskyPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=model_dtype)
pipe.to(device)

# swap in a manually configured DDPM scheduler in place of the default DDIM scheduler
ddpm_scheduler = DDPMScheduler(
    clip_sample=True,
    clip_sample_range=2.0,
    sample_max_value=None,
    num_train_timesteps=1000,
    prediction_type="epsilon",
    variance_type="learned_range",
    thresholding=True,
    beta_schedule="linear",
    beta_start=0.00085,
    beta_end=0.012,
)
pipe.scheduler = ddpm_scheduler

generator = torch.Generator(device=device).manual_seed(0)
out = pipe(
    prompt,
    image_embeds=image_emb,
    negative_image_embeds=zero_image_emb,
    height=768,
    width=768,
    num_inference_steps=100,
    generator=generator,
)

image = out.images[0]

[output image: yiyi_test_pipe_ddpm3_out]

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jun 14, 2023

The documentation is not available anymore as the PR was closed or merged.

@sayakpaul
Member

Why can't we do something like this?

pipeline.scheduler = DDPMScheduler.from_config(
    pipeline.scheduler.config, **scheduler_args
)

**scheduler_args could take care of any new args that might be needed here. A good example of using **scheduler_args is here:
https://github.com/huggingface/diffusers/blob/ce5504934ac484fca39a1a5434ecfae09eabdf41/examples/dreambooth/train_dreambooth_lora.py#LL1261C17-L1263C18
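For the Kandinsky case, a hedged sketch of what that could look like (the override set below is illustrative, pulled from the DDPM config discussed further down this thread, and pipe is assumed to be the KandinskyPipeline from the snippet above):

from diffusers import DDPMScheduler

# illustrative overrides on top of the existing (DDIM) config; not an official recipe
scheduler_args = {
    "variance_type": "learned_range",
    "clip_sample": True,
    "clip_sample_range": 2.0,
    "thresholding": True,
}
pipe.scheduler = DDPMScheduler.from_config(pipe.scheduler.config, **scheduler_args)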

@yiyixuxu
Collaborator Author

I'm still not able to reproduce the doc examples, since I think the initial noise construction is a little bit different now with the most recent UI change for negative prompts, but I like these new images too.

[two output images]

@yiyixuxu
Collaborator Author

@sayakpaul yeah, but then the user has to know the new args for the scheduler, which is pretty much everything: the DDIM and DDPM configs are really different for Kandinsky.

ddim

{
    "num_train_timesteps": 1000,
    "beta_schedule": "linear",
    "beta_start": 0.00085,
    "beta_end": 0.012,
    "clip_sample": false,
    "set_alpha_to_one": false,
    "steps_offset": 1,
    "prediction_type": "epsilon",
    "thresholding": false
}

ddpm

DDPMScheduler(
    clip_sample=True,
    clip_sample_range=2.0,
    sample_max_value=None,
    num_train_timesteps=1000,
    prediction_type="epsilon",
    variance_type="learned_range",
    thresholding=True,
    beta_schedule="linear",
    beta_start=0.00085,
    beta_end=0.012,
)

@sayakpaul
Member

Hmm, thanks for explaining. But I don't think we can have two different scheduler configurations here, as it deviates from how we maintain scheduler storage on the Hub. So, either

  • we can add an entry to the doc showing how to initialize the DDPM scheduler properly

or

  • we can have a pipeline saved on the Hub but with the DDPM scheduler so that the user doesn't have to configure anything.

I would prefer to go with the first approach, as that way the user retains flexibility.

@@ -367,9 +367,14 @@ def step(
)

# 3. Clip or threshold "predicted x_0"
if self.config.sample_max_value is None and self.config.clip_sample:
Contributor

I don't really understand this change - what are we doing here exactly? Is the problem that thresholding is done after clamping in Kandinsky?

Collaborator Author

@patrickvonplaten

I just need to clamp it and then apply thresholding.

Currently, it will only do thresholding OR clamping, never both together, so if you set clip_sample to True, it will NOT clip the sample if thresholding is also True. I checked the config for IF: both clip_sample and thresholding are set to True there, but only thresholding will be applied https://huggingface.co/DeepFloyd/IF-I-XL-v1.0/blob/main/scheduler/scheduler_config.json

I'm not sure why we need this if ... else ... logic; I think it's confusing. If we only want thresholding, we should just set clip_sample to False. But removing the logic now would break IF, I think.

Contributor

The reason thresholding and clipping are done in an if/else statement is that they're more or less the same operation.

If you do thresholding, your output ends up clipped to [-1, 1], as can be seen here:

sample = torch.clamp(sample, -s, s) / s # "we threshold xt0 to the range [-s, s] and then divide by s"

So in the case of IF, even if self.config.clip_sample were run afterward, it would make no difference, as the output is already scaled to be between -1 and 1. We could/should probably add a warning that running thresholding and clipping together doesn't make sense and should be avoided. For IF it would probably be best if we just set clip_sample=False, or is this used for anything else? cc @williamberman
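For reference, a minimal sketch of the dynamic-thresholding step, roughly mirroring what the DDPM scheduler does (the helper name and defaults here are illustrative, not the library's exact code):

import torch

def threshold_sample(sample: torch.Tensor, dynamic_thresholding_ratio: float = 0.995,
                     sample_max_value: float = 1.0) -> torch.Tensor:
    # sample: predicted x_0 with shape (batch, channels, height, width)
    batch_size = sample.shape[0]
    abs_sample = sample.reshape(batch_size, -1).abs()
    # per-sample quantile of |x_0|, kept between 1 and sample_max_value
    s = torch.quantile(abs_sample, dynamic_thresholding_ratio, dim=1)
    s = torch.clamp(s, min=1.0, max=sample_max_value)
    s = s[:, None, None, None]
    # threshold to [-s, s] and divide by s: the output already lies in [-1, 1],
    # which is why a subsequent clamp to [-1, 1] (clip_sample) is a no-op
    return torch.clamp(sample, -s, s) / s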

@yiyixuxu for Kandinsky the current "working" scheduler config is:

DDPMScheduler(
    clip_sample=True,
    clip_sample_range=2.0,
    sample_max_value=None,
    num_train_timesteps=1000,
    prediction_type="epsilon",
    variance_type="learned_range",
    thresholding=True,
    beta_schedule="linear",
    beta_start=0.00085,
    beta_end=0.012,
)

with this PR. I'm fairly certain that the following config, without any changes to scheduling_ddpm.py, gives the same result:

DDPMScheduler(
    clip_sample=False,
    sample_max_value=2.0,
    num_train_timesteps=1000,
    prediction_type="epsilon",
    variance_type="learned_range",
    thresholding=True,
    beta_schedule="linear",
    beta_start=0.00085,
    beta_end=0.012,
)

Could you try this out?

Contributor

I think the current logic is: since clip_sample is the default, we just prioritize thresholding if it's also set. So we're ok with both being set, but if so we just do the threshold clipping.

Contributor

And yeah, it's an if/else because thresholding is a form of clipping, so it doesn't make sense to do both.

Contributor

Guess overall it just doesn't make too much sense to set both, no? Think a warning like "Your clip_sample is useless and doesn't do anything" could make sense.
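Something along these lines, as a rough sketch (the helper name is hypothetical and not part of diffusers):

import warnings

def warn_if_clip_sample_is_redundant(config) -> None:
    # hypothetical helper: clip_sample has no effect when thresholding runs first,
    # because thresholding already scales the predicted x_0 into [-1, 1]
    if getattr(config, "thresholding", False) and getattr(config, "clip_sample", False):
        warnings.warn(
            "`clip_sample` has no effect when `thresholding` is enabled; "
            "thresholding already rescales the sample into [-1, 1]."
        )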

Collaborator Author

> Could you try this out?

will try and report back :)

Contributor

yeah no issues with either

Collaborator Author

@patrickvonplaten the cat image looks the same to me! Will revert.

Is the math the same, or is the difference just too small to matter? I understood that clip_sample and dynamic thresholding are trying to do the same thing, but when I looked at the formula, I thought clamping first would cause the threshold to be different and lead to a different result. Not the case, it seems?

[output image: yiyi_test_pipe_ddpm_patrick_out]

Contributor

Think math is the same here actually

@@ -390,8 +408,13 @@ def __call__(
image_embeds = torch.cat([negative_image_embeds, image_embeds], dim=0).to(
dtype=prompt_embeds.dtype, device=device
)
if hasattr(self.scheduler, "custom_timesteps"):
Contributor

@patrickvonplaten commented Jun 15, 2023

Hmm, I don't think this logic makes much sense tbh. How are custom timesteps related to the different timestep spacing chosen here?

If we need a different spacing for Kandinsky, let's add this to the DDPMScheduler similarly to:

if self.config.timestep_spacing == "leading":

Collaborator Author

@patrickvonplaten
Looked into this. I think Kandinsky's method of calculating the step ratio is just slightly different from what's already in DDPM; maybe we don't need to make any changes as long as it doesn't affect generation quality? (A quick numeric comparison is sketched below the list.)

cc @williamberman because the Kandinsky timesteps are exactly the same as UnCLIP

  • kandinsky / unclip scheduler

step_ratio = (self.scheduler.config.num_train_timesteps - 1) / (num_inference_steps - 1)
timesteps = (np.arange(0, num_inference_steps) * step_ratio).round()[::-1].copy().astype(np.int64)

  • ddpm

step_ratio = self.config.num_train_timesteps // self.num_inference_steps
timesteps = (np.arange(0, num_inference_steps) * step_ratio).round()[::-1].copy().astype(np.int64)
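A quick numeric comparison of the two spacings as a standalone sketch (assuming num_train_timesteps=1000 and num_inference_steps=100; not library code):

import numpy as np

num_train_timesteps = 1000
num_inference_steps = 100

# kandinsky / unclip style: fractional ratio, so the schedule starts exactly at 999
step_ratio_unclip = (num_train_timesteps - 1) / (num_inference_steps - 1)
timesteps_unclip = (np.arange(0, num_inference_steps) * step_ratio_unclip).round()[::-1].astype(np.int64)

# ddpm style: integer ratio, so the schedule starts at 990
step_ratio_ddpm = num_train_timesteps // num_inference_steps
timesteps_ddpm = (np.arange(0, num_inference_steps) * step_ratio_ddpm).round()[::-1].astype(np.int64)

print(timesteps_unclip[:3], timesteps_unclip[-3:])  # [999 989 979] ... [20 10  0]
print(timesteps_ddpm[:3], timesteps_ddpm[-3:])      # [990 980 970] ... [20 10  0]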

Compared the cheeseburger monster: the one above is generated with the Kandinsky timesteps; the one below is generated with the DDPM scheduler without any change. I can see the eyes and mouth are a little bit different between the two monsters, but it really doesn't affect quality.

[output image: yiyi_test_pipe_ddpm_burger_out_original]

[output image: yiyi_test_pipe_ddpm_burger_out]

Only a super minor difference here too:

[output image: yiyi_test_pipe_ddpm_hair_out_original]

[output image: yiyi_test_pipe_ddpm_hair_out]

@patrickvonplaten
Contributor

Agree that the DDPM scheduler works better, but also agree that we shouldn't do anything backwards-breaking.

Given that the scheduler configs seem to be completely different from each other, I think we should just upload the DDPM scheduler to a new folder that the user can then download easily.

Uploaded it here: https://huggingface.co/kandinsky-community/kandinsky-2-1/tree/main/ddpm_scheduler

@yiyixuxu we could just add the following lines to the docs:

from diffusers import DDPMScheduler, DiffusionPipeline

scheduler = DDPMScheduler.from_pretrained("kandinsky-community/kandinsky-2-1", subfolder="ddpm_scheduler")
pipe = DiffusionPipeline.from_pretrained(..., scheduler=scheduler)
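A slightly fuller, hedged version of that snippet (assuming the ddpm_scheduler subfolder uploaded above), using the Kandinsky pipeline class directly:

import torch
from diffusers import DDPMScheduler, KandinskyPipeline

# load the DDPM config from the dedicated subfolder, then pass it as the pipeline's scheduler
scheduler = DDPMScheduler.from_pretrained("kandinsky-community/kandinsky-2-1", subfolder="ddpm_scheduler")
pipe = KandinskyPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1", scheduler=scheduler, torch_dtype=torch.float16
)
pipe.to("cuda")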

@patrickvonplaten
Contributor

Also, the design of this PR needs to be adapted though. I think this PR introduces complex/weird design choices.

@patrickvonplaten
Contributor

Let's try to get this one in soon @yiyixuxu

@yiyixuxu
Collaborator Author

@patrickvonplaten ready for another review

Contributor

@patrickvonplaten left a comment

Cool that works!

yiyixuxu merged commit 95ea538 into main on Jun 21, 2023
patrickvonplaten deleted the add-ddpm-kandinsky branch on Jun 28, 2023