Add ddpm kandinsky #3783
Conversation
Why can't we do something like so?

    pipeline.scheduler = DDPMScheduler.from_config(
        pipeline.scheduler.config, **scheduler_args
    )
@sayakpaul yeah, but then the user has to know the new args for the scheduler, which is pretty much everything - the DDIM and DDPM configs are really different for Kandinsky.

ddim:

    {
        "num_train_timesteps": 1000,
        "beta_schedule": "linear",
        "beta_start": 0.00085,
        "beta_end": 0.012,
        "clip_sample": false,
        "set_alpha_to_one": false,
        "steps_offset": 1,
        "prediction_type": "epsilon",
        "thresholding": false
    }

ddpm:

    DDPMScheduler(
        clip_sample=True,
        clip_sample_range=2.0,
        sample_max_value=None,
        num_train_timesteps=1000,
        prediction_type="epsilon",
        variance_type="learned_range",
        thresholding=True,
        beta_schedule="linear",
        beta_start=0.00085,
        beta_end=0.012,
    )
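(For reference, a rough sketch of what the `from_config` override approach above would look like in practice - every kwarg below is copied from the DDPM config just listed and is meant only to illustrate how much the user would have to specify by hand, not as a recommended snippet:)

```python
from diffusers import DDPMScheduler, DiffusionPipeline

# Illustrative only: values copied from the Kandinsky DDPM config shown above.
# The point being made is that the user would need to know all of these.
ddpm_overrides = {
    "clip_sample": True,
    "clip_sample_range": 2.0,
    "sample_max_value": None,
    "variance_type": "learned_range",
    "thresholding": True,
}

pipeline = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1")
pipeline.scheduler = DDPMScheduler.from_config(
    pipeline.scheduler.config, **ddpm_overrides
)
```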
Hmm, thanks for explaining. But I don't think we can have two different scheduler configurations here, as it deviates from how we maintain their storage on the Hub. So, either
or
I would prefer to go with the first approach, as that way the user retains flexibility.
@@ -367,9 +367,14 @@ def step(
    )

    # 3. Clip or threshold "predicted x_0"
    if self.config.sample_max_value is None and self.config.clip_sample:
I don't really understand this change - what are we doing here exactly? Is the problem that thresholding is done after clamping in Kandinsky?
I just need to clamp it and then do thresholding.

Currently, it will only do thresholding OR clamping, never both together, so if you set clip_sample to True, it will NOT clip if thresholding is also True. I checked the config for IF: both clip_sample and thresholding are set to True there, but only thresholding will be applied https://huggingface.co/DeepFloyd/IF-I-XL-v1.0/blob/main/scheduler/scheduler_config.json

I'm not sure why we need this if ... else ... logic, I think it's confusing. If we only want thresholding, we should just set clip_sample to False. Removing it now will break IF, I think.
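(A rough, runnable sketch of the two behaviours being discussed - this is not the literal scheduler code; the flag values mirror the Kandinsky DDPM config above and `threshold_sample` is a simplified stand-in for the scheduler's dynamic-thresholding helper.)

```python
import torch

# Toy stand-ins for the scheduler config flags and a "predicted x_0".
thresholding, clip_sample = True, True
clip_sample_range, sample_max_value = 2.0, None
pred_x0 = torch.randn(1, 3, 8, 8) * 4

def threshold_sample(x, ratio=0.995):
    # Simplified dynamic thresholding: clamp to a high quantile s, divide by s.
    s = torch.quantile(x.abs().flatten(), ratio).clamp(min=1.0)
    return x.clamp(-s, s) / s

# Current upstream logic (simplified): thresholding and clipping are mutually
# exclusive - with both flags True, only thresholding runs.
if thresholding:
    out_upstream = threshold_sample(pred_x0)
elif clip_sample:
    out_upstream = pred_x0.clamp(-clip_sample_range, clip_sample_range)

# Behaviour this PR is after (simplified): when sample_max_value is None,
# clip first and then threshold, so both can apply to the same sample.
out_pr = pred_x0
if sample_max_value is None and clip_sample:
    out_pr = out_pr.clamp(-clip_sample_range, clip_sample_range)
if thresholding:
    out_pr = threshold_sample(out_pr)

print((out_upstream - out_pr).abs().max())  # the two paths can produce different results
```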
The reason why thresholding and clipping are done in an if/else statement is that they are more or less the same operation.

If you do thresholding, your output has to be clipped to [-1, 1], as can be seen here:

    sample = torch.clamp(sample, -s, s) / s  # "we threshold xt0 to the range [-s, s] and then divide by s"

So in the case of IF, even if self.config.clip_sample were run afterward, it would make no difference, as the output is already scaled to be between -1 and 1. We could/should probably add a warning that running thresholding and clipping together doesn't make sense and should be avoided. For IF it would probably be best if we just set clip_sample=False - or is this used for anything else? cc @williamberman
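(A tiny runnable check of that point, using a simplified, batched version of the quoted thresholding line: after thresholding, the sample is already inside [-1, 1], so a later clamp to [-1, 1] changes nothing.)

```python
import torch

x = torch.randn(2, 3, 64, 64) * 5                                    # exaggerated "predicted x_0"
s = torch.quantile(x.abs().flatten(1), 0.995, dim=1).clamp(min=1.0)  # per-sample threshold s >= 1
s = s.view(-1, 1, 1, 1)
thresholded = x.clamp(-s, s) / s                                      # the quoted operation

print(torch.equal(thresholded, thresholded.clamp(-1.0, 1.0)))         # True: clipping afterwards is a no-op
```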
@yiyixuxu for Kandinsky the current "working" scheduler config is:

    DDPMScheduler(
        clip_sample=True,
        clip_sample_range=2.0,
        sample_max_value=None,
        num_train_timesteps=1000,
        prediction_type="epsilon",
        variance_type="learned_range",
        thresholding=True,
        beta_schedule="linear",
        beta_start=0.00085,
        beta_end=0.012,
    )

with this PR. I'm fairly certain that the following config, without any changes to scheduling_ddim.py, is the same:

    DDPMScheduler(
        clip_sample=False,
        sample_max_value=2.0,
        num_train_timesteps=1000,
        prediction_type="epsilon",
        variance_type="learned_range",
        thresholding=True,
        beta_schedule="linear",
        beta_start=0.00085,
        beta_end=0.012,
    )

Could you try this out?
I think with the current logic, since clip sample is on by default, we just prioritize thresholding if it's also set. So we're OK with both being set, but if so we just do the threshold clipping.
And yeah it's an if/else because thresholding is a form of clipping so it doesn't make sense to do both
Guess overall it just doesn't make too much sense to set both, no? Think a warning like "Your clip sample is useless and doesn't do anything" could make sense.
Could you try this out?
will try and report back :)
yeah no issues with either
@patrickvonplaten the cat image looks the same to me! Will revert.

Is the math the same, or is the difference just too small to matter? I understood that clip sample and dynamic thresholding are trying to do the same thing, but when I looked at the formula, I thought clamping first would cause the threshold to be different and therefore change the result - but that doesn't seem to be the case?
Think math is the same here actually
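(A toy numerical check of that claim, using a simplified stand-alone version of dynamic thresholding rather than the scheduler code itself: pre-clipping to [-2, 2] with no cap on the threshold, versus skipping the pre-clip but capping the threshold at 2, give essentially the same output.)

```python
import torch

def dynamic_threshold(x, ratio=0.995, max_value=None):
    # Simplified dynamic thresholding: s is a per-sample quantile of |x|,
    # never below 1, optionally capped at max_value.
    s = torch.quantile(x.abs().flatten(1), ratio, dim=1).clamp(min=1.0)
    if max_value is not None:
        s = s.clamp(max=max_value)
    s = s.view(-1, 1, 1, 1)
    return x.clamp(-s, s) / s

torch.manual_seed(0)
x = torch.randn(4, 3, 64, 64) * 3           # stand-in "predicted x_0" with outliers

a = dynamic_threshold(x.clamp(-2.0, 2.0))   # clip to [-2, 2] first, no cap on s
b = dynamic_threshold(x, max_value=2.0)     # no pre-clip, cap s at 2 instead

print((a - b).abs().max())  # expected: 0 (or negligibly small)
```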
@@ -390,8 +408,13 @@ def __call__(
        image_embeds = torch.cat([negative_image_embeds, image_embeds], dim=0).to(
            dtype=prompt_embeds.dtype, device=device
        )
        if hasattr(self.scheduler, "custom_timesteps"):
Hmm, I don't think this logic makes much sense tbh. How are custom timesteps related to the different timestep spacing that is chosen here?

If we need a different spacing for Kandinsky, let's add it to the DDPMScheduler similarly to:

    if self.config.timestep_spacing == "leading":
@patrickvonplaten
Looked into this - I think Kandinsky's method of calculating the step ratio is just slightly different from what's already in DDPM; maybe we don't need to make any changes as long as it does not affect generation quality? (A quick numeric comparison of the two follows the snippets below.)

cc @williamberman because the Kandinsky timesteps are exactly the same as unCLIP's.

- kandinsky/unclip scheduler

      step_ratio = (self.scheduler.config.num_train_timesteps - 1) / (num_inference_steps - 1)
      timesteps = (np.arange(0, num_inference_steps) * step_ratio).round()[::-1].copy().astype(np.int64)

- ddpm

      step_ratio = self.config.num_train_timesteps // self.num_inference_steps
      timesteps = (np.arange(0, num_inference_steps) * step_ratio).round()[::-1].copy().astype(np.int64)
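(Plugging in illustrative numbers - 1000 training timesteps from the config above and an assumed 50 inference steps - shows how the two spacings differ at the start of the schedule; this is just the two formulas above run side by side.)

```python
import numpy as np

num_train_timesteps = 1000   # from the Kandinsky config above
num_inference_steps = 50     # illustrative choice

# kandinsky / unclip style
step_ratio = (num_train_timesteps - 1) / (num_inference_steps - 1)
kandinsky_ts = (np.arange(0, num_inference_steps) * step_ratio).round()[::-1].copy().astype(np.int64)

# ddpm style
step_ratio = num_train_timesteps // num_inference_steps
ddpm_ts = (np.arange(0, num_inference_steps) * step_ratio).round()[::-1].copy().astype(np.int64)

print(kandinsky_ts[:3])  # [999 979 958] - starts at the last training timestep
print(ddpm_ts[:3])       # [980 960 940] - never reaches timestep 999
```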
Compared the cheeseburger monster: the one above was generated with Kandinsky timesteps, the one below with the DDPM scheduler without any change. I can see the eyes and mouth of the monster are a little bit different, but it really doesn't affect quality.
only super minor difference here too
Agree that the DDPM scheduler works better, but also agree that we shouldn't do anything backwards-breaking. Given that the schedulers seem to be completely different from each other, I think we should just upload the DDPM scheduler to a new folder that can then be downloaded by the user easily. Uploaded it here: https://huggingface.co/kandinsky-community/kandinsky-2-1/tree/main/ddpm_scheduler

@yiyixuxu we could just add the following lines to the docs:

    from diffusers import DDPMScheduler

    scheduler = DDPMScheduler.from_pretrained("kandinsky-community/kandinsky-2-1", subfolder="ddpm_scheduler")
    pipe = DiffusionPipeline.from_pretrained(..., scheduler=scheduler)
Also, the design of this PR needs to be adapted though. I think this PR introduces complex/weird design choices.
Let's try to get this one in soon @yiyixuxu

@patrickvonplaten ready for another review
Cool that works!
* update doc --------- Co-authored-by: yiyixuxu <yixu310@gmail,com>
This PR will refactor the Kandinsky pipelines to work with both the DDPM and DDIM schedulers.

@sayakpaul @patrickvonplaten
Currently the scheduler config in the Kandinsky repo is for DDIM - is there any way we can also save a DDPM scheduler config somewhere, instead of having to manually create the DDPM config when the user wants to swap the scheduler?