Added the ability to set SDXL Micro-Conditioning embeddings as 0 #4208
Comments
Thanks for the detailed issue. Yes, we're aware of this issue. @patrickvonplaten I suppose you were working on it?
Actually only now noticed this - thanks for bringing it up @budui! Do you think it's also important to provide this feature for inference or just for training?
Both training and inference need this feature. For training, diffusers may need to be able to reproduce Stability AI's training scripts. For inference, the current SDXL pipeline lacks a way to specify a negative micro-condition (either as a specific value or as a zero embedding). I did a quick experiment, specifying a negative condition:

A: The condition and negative condition use the same micro-condition, as the diffusers SDXL pipeline does now.
# prompt: "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# seed: 1000
# original size (1024, 1024) vs (1024, 1024)
condition=dict(
caption=prompt,
crop_left=0,
crop_top=0,
original_height=1024,
original_width=1024,
target_height=1024,
target_width=1024,
),
negative_condition=dict(
caption="",
crop_left=0,
crop_top=0,
original_height=1024,
original_width=1024,
target_height=1024,
target_width=1024,
),

B: The negative condition uses a lower original size, resulting in a clearer image.
# prompt: "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# seed: 1000
# original size (1024, 1024) vs (512, 512)
condition=dict(
caption=prompt,
crop_left=0,
crop_top=0,
original_height=1024,
original_width=1024,
target_height=1024,
target_width=1024,
),
negative_condition=dict(
caption="",
crop_left=0,
crop_top=0,
original_height=512,
original_width=512,
target_height=1024,
target_width=1024,
),

I haven't tested the effect of using a zero embedding as the negative condition yet, because I haven't found a quick workaround for it. But I'd be happy to do more testing once diffusers adds a way to specify a zero embedding in the UNet.
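For anyone reproducing settings A and B, here is a minimal sketch of how the two condition dicts from experiment B could be flattened into the six-value time_ids rows that the SDXL UNet consumes. The helper name build_add_time_ids is hypothetical, and both the value order (original size, crop top-left, target size) and the negative-first batch order are assumptions about how diffusers assembles these ids, not confirmed API.

```python
def build_add_time_ids(cond):
    """Flatten one micro-condition dict into a 6-value list.

    Assumed order: original size (h, w), crop top-left (top, left), target size (h, w).
    """
    return [
        cond["original_height"], cond["original_width"],
        cond["crop_top"], cond["crop_left"],
        cond["target_height"], cond["target_width"],
    ]


# Setting B from the comment above: the negative condition uses a lower original size.
condition = dict(
    caption="Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    crop_left=0, crop_top=0,
    original_height=1024, original_width=1024,
    target_height=1024, target_width=1024,
)
negative_condition = dict(
    caption="",
    crop_left=0, crop_top=0,
    original_height=512, original_width=512,
    target_height=1024, target_width=1024,
)

# For classifier-free guidance the two rows are batched together
# (assumption: the negative row comes first).
time_ids_batch = [
    build_add_time_ids(negative_condition),
    build_add_time_ids(condition),
]
print(time_ids_batch)
```

Setting A corresponds to both rows being identical, which is what a pipeline does when it offers no separate negative micro-condition.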
@budui sorry for the delay on our end. Would you maybe be willing to contribute this feature in a PR? We're more than happy to help out.
@sayakpaul do you want to give this PR/issue a try?
Yeah
Is your feature request related to a problem? Please describe.
During the SDXL training process, it may be necessary to pass in a zero embedding as the Micro-Conditioning embeddings:
https://github.com/Stability-AI/generative-models/blob/e25e4c0df1d01fb9720f62c73b4feab2e4003e3f/sgm/modules/encoders/modules.py#L151-L161
https://github.com/Stability-AI/generative-models/blob/e25e4c0df1d01fb9720f62c73b4feab2e4003e3f/configs/example_training/txt2img-clipl-legacy-ucg-training.yaml#L65
The current SDXL UNet2DConditionModel accepts encoder_hidden_states, time_ids and add_text_embeds as conditions.
diffusers/src/diffusers/models/unet_2d_condition.py, lines 843 to 854 in 2e53936
To correctly finetune the SDXL model, we need to randomly set the condition embeddings to 0 with a suitable probability.
While it is easy to set encoder_hidden_states and add_text_embeds to zero embeddings, it is impossible to zero time_embeds at line 849.
The original SDXL uses different embedders to convert the different micro-conditions into Fourier features. During training, these Fourier features are set to 0 randomly and independently of one another. Therefore, UNet2DConditionModel needs to be able to zero each part of time_embeds independently.
Describe the solution you'd like
Added the ability to set SDXL Micro-Conditioning embeddings as 0.
Describe alternatives you've considered
Perhaps diffusers could allow users to pass in time_embeds directly, so that if time_embeds is provided, time_ids is no longer used?