Added the ability to set SDXL `Micro-Conditioning` embeddings as 0 #4208

budui · 2023-07-22T12:19:52Z

Is your feature request related to a problem? Please describe.

During the SDXL training process, it may be necessary to pass in a zero embedding as Micro-Conditioning embeddings:

https://github.com/Stability-AI/generative-models/blob/e25e4c0df1d01fb9720f62c73b4feab2e4003e3f/sgm/modules/encoders/modules.py#L151-L161

# These lines randomly set the embedding to zero if `ucg_rate` > 0
                if embedder.ucg_rate > 0.0 and embedder.legacy_ucg_val is None:
                    emb = (
                        expand_dims_like(
                            torch.bernoulli(
                                (1.0 - embedder.ucg_rate)
                                * torch.ones(emb.shape[0], device=emb.device)
                            ),
                            emb,
                        )
                        * emb
                    )

https://github.com/Stability-AI/generative-models/blob/e25e4c0df1d01fb9720f62c73b4feab2e4003e3f/configs/example_training/txt2img-clipl-legacy-ucg-training.yaml#L65

# SDXL sets the `ucg_rate` of the `original_size_as_tuple` embedder to 0.1,
# so during training we need to pass a zero embedding as the added embedding for the UNet's time embedding.
            ucg_rate: 0.1
            input_key: original_size_as_tuple
            target: sgm.modules.encoders.modules.ConcatTimestepEmbedderND
            params:
              outdim: 256  # multiplied by two

The current SDXL UNet2DConditionModel accepts encoder_hidden_states, time_ids, and add_text_embeds as conditions.

diffusers/src/diffusers/models/unet_2d_condition.py

Lines 843 to 854 in 2e53936

    text_embeds = added_cond_kwargs.get("text_embeds")
    if "time_ids" not in added_cond_kwargs:
        raise ValueError(
            f"{self.__class__} has the config param `addition_embed_type` set to 'text_time' which requires the keyword argument `time_ids` to be passed in `added_cond_kwargs`"
        )
    time_ids = added_cond_kwargs.get("time_ids")
    time_embeds = self.add_time_proj(time_ids.flatten())
    time_embeds = time_embeds.reshape((text_embeds.shape[0], -1))
    add_embeds = torch.concat([text_embeds, time_embeds], dim=-1)
    add_embeds = add_embeds.to(emb.dtype)
    aug_emb = self.add_embedding(add_embeds)

To fine-tune the SDXL model correctly, we need to randomly set the condition embeddings to 0 with a suitable probability.
While it is easy to set encoder_hidden_states and add_text_embeds to zero embeddings, it is impossible to zero time_embeds at line 849, because it is computed from time_ids inside the UNet.

The original SDXL uses separate embedders to convert the different micro-conditions into Fourier features. During training, these Fourier features are set to 0 independently and at random. Therefore, UNet2DConditionModel needs to be able to zero the time_embeds part independently.
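To make the "independently zeroed" requirement concrete, here is a minimal, framework-free sketch for a single sample. It is not the SGM or diffusers implementation. The helper name `drop_micro_conditions`, the `GROUPS` layout (six scalars ordered as original size, crop coordinates, target size), and the per-group rate dictionary are assumptions made for illustration; only the 0.1 rate for `original_size_as_tuple` comes from the config quoted above.

```python
import random

# Assumed layout of the six micro-conditioning scalars (hypothetical ordering
# for this sketch): two values per micro-condition.
GROUPS = {
    "original_size": (0, 2),
    "crop_coords_top_left": (2, 4),
    "target_size": (4, 6),
}


def drop_micro_conditions(time_embeds, ucg_rates, rng=random):
    """Zero each micro-condition group independently for one sample.

    time_embeds: list of 6 feature vectors, one per micro-conditioning scalar.
    ucg_rates:   dict mapping group name -> probability of zeroing that group
                 (groups missing from the dict are never zeroed).
    Returns a new list; the input is left untouched.
    """
    out = [list(vec) for vec in time_embeds]
    for name, (start, stop) in GROUPS.items():
        # One independent draw per group, so one micro-condition can be
        # dropped while the others are kept.
        if rng.random() < ucg_rates.get(name, 0.0):
            for i in range(start, stop):
                out[i] = [0.0] * len(out[i])
    return out


# Toy usage: 6 scalars, each embedded into 4 features.
toy_embeds = [[1.0, 2.0, 3.0, 4.0] for _ in range(6)]
dropped = drop_micro_conditions(toy_embeds, {"original_size": 0.1})
```

With the current UNet interface this cannot be done from the outside, since only `time_ids` is accepted and the Fourier features are computed internally.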

Describe the solution you'd like

Add the ability to set SDXL Micro-Conditioning embeddings to 0.

Describe alternatives you've considered

Perhaps diffusers could allow users to pass in time_embeds directly, and if time_embeds is present, time_ids would no longer be used?

if "time_embeds" in added_cond_kwargs:
    # Use the caller-supplied embeddings (which may be all zeros) as-is.
    time_embeds = added_cond_kwargs.get("time_embeds")
else:
    # Fall back to the existing behaviour: project time_ids into Fourier features.
    time_ids = added_cond_kwargs.get("time_ids")
    time_embeds = self.add_time_proj(time_ids.flatten())
time_embeds = time_embeds.reshape((text_embeds.shape[0], -1))


sayakpaul · 2023-07-24T02:25:43Z

Thanks for the detailed issue. Yes, we're aware of this issue.

@patrickvonplaten I suppose you were working on it?

patrickvonplaten · 2023-07-24T18:57:17Z

Actually only now noticed this - thanks for bringing it up @budui !

Do you think it's also important to provide this feature for inference or just for training?

budui · 2023-07-25T02:40:30Z

Both training and inference should require this feature. For training, diffusers may need to have the ability to reproduce Stability AI's training scripts. For inference, the current SDXL Pipeline lacks the ability to specify a negative micro condition (specified as a specific value or zero embedding).

I did a quick experiment, specifying a negative condition:

A: The condition and the negative condition use the same micro-condition, as the diffusers SDXL pipeline currently does.

# prompt: "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# seed: 1000
# original size (1024, 1024) vs (1024, 1024)
condition=dict(
        caption=prompt,
        crop_left=0,
        crop_top=0,
        original_height=1024,
        original_width=1024,
        target_height=1024,
        target_width=1024,
),
negative_condition=dict(
        caption="",
        crop_left=0,
        crop_top=0,
        original_height=1024,
        original_width=1024,
        target_height=1024,
        target_width=1024,
 ),

B: Negative conditions use a lower original size, resulting in a clearer image

# prompt: "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# seed: 1000
# original size (1024, 1024) vs (512, 512)
condition=dict(
        caption=prompt,
        crop_left=0,
        crop_top=0,
        original_height=1024,
        original_width=1024,
        target_height=1024,
        target_width=1024,
),
negative_condition=dict(
        caption="",
        crop_left=0,
        crop_top=0,
        original_height=512,
        original_width=512,
        target_height=1024,
        target_width=1024,
 ),
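For readers who want to see what experiment B changes at the level of the UNet input, here is a small sketch that flattens each micro-condition dictionary into a six-value list. The helper name `build_add_time_ids` and the concatenation order (original size, crop coordinates, target size) are assumptions for illustration, not a confirmed diffusers API.

```python
def build_add_time_ids(original_size, crops_coords_top_left, target_size):
    """Flatten three (a, b) micro-conditions into one 6-value list.

    The ordering is an assumption of this sketch: original size first,
    then crop coordinates, then target size.
    """
    return list(original_size) + list(crops_coords_top_left) + list(target_size)


# Experiment B: the positive branch keeps the full original size, while the
# negative (unconditional) branch claims a lower original size.
positive_ids = build_add_time_ids((1024, 1024), (0, 0), (1024, 1024))
negative_ids = build_add_time_ids((512, 512), (0, 0), (1024, 1024))

# Only the first two entries differ between the two branches.
differing = [i for i, (p, n) in enumerate(zip(positive_ids, negative_ids)) if p != n]
```

In experiment A both lists would be identical, which is why the negative branch there differs from the positive one only through the empty caption.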

I haven't yet tested the effect of using a zero embedding as a negative condition, because I haven't found a quick workaround for it. But I'd be happy to do more testing once diffusers adds a way to specify a zero embedding in the UNet.

sayakpaul · 2023-08-04T03:12:25Z

@budui sorry for the delay on our end. Would you maybe be willing to contribute this feature in a PR? We're more than happy to help out.

patrickvonplaten · 2023-08-23T20:23:18Z

@sayakpaul do you want to give this PR/issue a try?

sayakpaul · 2023-08-24T10:33:55Z

Yeah

This was referenced Aug 24, 2023

improve setup.py #4748

Merged

[Core] Support negative conditions in SDXL #4774

Merged

sayakpaul closed this as completed in #4774 Aug 26, 2023
