Correct bad attn naming #3797
Conversation
The documentation is not available anymore as the PR was closed or merged.
# The reason for this behavior is to correct for incorrectly named variables that were introduced
# when this library was created. The incorrect naming was only discovered much later in https://github.com/huggingface/diffusers/issues/2011#issuecomment-1547958131
# Changing `attention_head_dim` to `num_attention_heads` for 40,000 configurations is too backwards breaking
# which is why we correct for the naming here.
admitting naming mess-up 😅
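To make the intent of the quoted comment concrete, here is a minimal, self-contained sketch of that kind of backwards-compatible fallback. It is illustrative only; the function name and signature are hypothetical and do not come from the diffusers codebase.

```python
# Hypothetical sketch, not the actual diffusers implementation.
def resolve_num_attention_heads(num_attention_heads=None, attention_head_dim=None):
    # Old configs stored the number of heads under the misnamed `attention_head_dim`
    # key, so fall back to it when the new argument is not provided. This keeps the
    # ~40,000 existing configurations loadable without a breaking rename.
    if num_attention_heads is None:
        num_attention_heads = attention_head_dim
    return num_attention_heads

print(resolve_num_attention_heads(attention_head_dim=8))                          # -> 8 (legacy config)
print(resolve_num_attention_heads(num_attention_heads=16, attention_head_dim=8))  # -> 16 (new argument wins)
```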
@@ -35,7 +35,7 @@ def get_down_block(
add_downsample,
resnet_eps,
resnet_act_fn,
attn_num_head_channels,
Renaming is fine here as none of the classes are public classes
Good that you identified this still relatively early.
If you can, please share tips for introducing these kinds of changes efficiently.
heads=in_channels // attn_num_head_channels if attn_num_head_channels is not None else 1,
dim_head=attn_num_head_channels if attn_num_head_channels is not None else in_channels,
heads=in_channels // attn_num_heads if attn_num_heads is not None else 1,
dim_head=attn_num_heads if attn_num_heads is not None else in_channels,
My head is 🤯 a little bit now reading this PR, so I could be wrong, but from what I understand `attn_num_head_channels` has been passed correctly as `dim_head` for all attention classes except for `Transformer2D`, and this change will cause unexpected behavior when we use the new `num_attention_heads` argument from `UNet2DConditionModel` for the other attention classes.
Taking `AttnDownBlock2D` as an example here, using the `attention_head_dim` argument returns the expected result:
from diffusers import UNet2DConditionModel

down_block_types = ("AttnDownBlock2D",)
up_block_types = ("AttnUpBlock2D",)
unet = UNet2DConditionModel(
    attention_head_dim=16,
    block_out_channels=(320,),
    down_block_types=down_block_types,
    up_block_types=up_block_types)
# this prints 20
unet.down_blocks[0].attentions[0].heads
Using the `num_attention_heads` argument returns the wrong result:
down_block_types = ("AttnDownBlock2D",)
up_block_types = ("AttnUpBlock2D",)
unet = UNet2DConditionModel(
    num_attention_heads=16,
    block_out_channels=(320,),
    down_block_types=down_block_types,
    up_block_types=up_block_types)
# this prints 20, which is wrong? should be 16
unet.down_blocks[0].attentions[0].heads
Maybe we need to pass both `num_attention_heads` and `attention_head_dim` to the blocks?
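As a rough illustration of what "pass both" could buy us, here is a small, self-contained toy (hypothetical helper name, not the diffusers API) that picks `heads`/`dim_head` depending on which argument is provided:

```python
# Toy sketch with hypothetical names; mirrors the heads/dim_head logic discussed above.
def build_attention_kwargs(in_channels, num_attention_heads=None, attention_head_dim=None):
    if attention_head_dim is not None:
        # Interpret the value as the per-head dimension (AttnDownBlock2D-style).
        heads, dim_head = in_channels // attention_head_dim, attention_head_dim
    elif num_attention_heads is not None:
        # Interpret the value as the number of heads (Transformer2D-style).
        heads, dim_head = num_attention_heads, in_channels // num_attention_heads
    else:
        heads, dim_head = 1, in_channels
    return {"heads": heads, "dim_head": dim_head}

print(build_attention_kwargs(320, attention_head_dim=16))   # {'heads': 20, 'dim_head': 16}
print(build_attention_kwargs(320, num_attention_heads=16))  # {'heads': 16, 'dim_head': 20}
```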
You're 100% right, great catch! I think there was a double incorrect naming correcting itself here haha
Shouldn't I expect some mapping between what the user specifies as `num_attention_heads` and `attention_head_dim` and how it's handled internally?
To put this in perspective:
from diffusers import UNet2DConditionModel

down_block_types = ("AttnDownBlock2D",)
up_block_types = ("AttnUpBlock2D",)
unet = UNet2DConditionModel(
    num_attention_heads=16,
    block_out_channels=(320,),
    down_block_types=down_block_types,
    up_block_types=up_block_types,
)
# this prints 40
unet.down_blocks[0].attentions[0].heads
and
down_block_types = ("AttnDownBlock2D",)
up_block_types = ("AttnUpBlock2D",)
unet = UNet2DConditionModel(
    attention_head_dim=16,
    block_out_channels=(320,),
    down_block_types=down_block_types,
    up_block_types=up_block_types)
# this prints 20
unet.down_blocks[0].attentions[0].heads
Are all these expected? Shouldn't `unet.down_blocks[0].attentions[0].heads` print 16, unless I am missing out on something?
Just so you don't miss it, pinging @patrickvonplaten (apologies in advance).
@sayakpaul `num_attention_heads` is not used for `AttnDownBlock2D`, only for `CrossAttnDownBlock2D`. If you run your first test with that type of block, `num_attention_heads` would be 16 as expected. But yes, I understand it can be confusing. Not sure how we can deal with it, perhaps we can follow up in a new PR?
Hmm, I don't immediately have any idea how to mitigate that faithfully, though.
@pcuenca I thought both `AttnDownBlock2D` and `CrossAttnDownBlock2D` need this argument, no? Anything with the `Attention` class does.
Did another pass and left some comments.
How can we best test the changes to ensure robustness here?
Interesting PR and issue! Asked some clarifications on a couple of details :)
@@ -219,7 +228,7 @@ def __init__(
resnet_act_fn=act_fn,
resnet_groups=norm_num_groups,
cross_attention_dim=cross_attention_dim,
attn_num_head_channels=attention_head_dim[i],
num_attention_heads=num_attention_heads[i],
Shouldn't we pass `attn_head_dim=attention_head_dim` here too? We are ignoring it (replaced with `num_attention_heads`), but then `get_down_block` will complain that it's recommended to pass `attn_head_dim` and default to copying it from `num_attention_heads`. We have all the information at this point in the caller.
Also, let's make the naming consistent when we can (`attn_head_dim` vs `attention_head_dim`).
Yes, good point!
):
# If attn head dim is not defined, we default it to the number of heads
if attn_head_dim is None:
if attn_head_dim is None:
if attn_head_dim is None and num_attention_heads is not None:
When we call with `None` (i.e., in the VAE), there's no point in showing the warning IMO.
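A tiny sketch of the gate being suggested, with hypothetical names (not the actual `get_down_block` code): the warning and the fallback only fire when there is a `num_attention_heads` value to copy from.

```python
import warnings

# Hypothetical sketch of the suggested warning gate; names are illustrative only.
def check_attn_head_dim(attn_head_dim, num_attention_heads):
    if attn_head_dim is None and num_attention_heads is not None:
        warnings.warn(
            "It is recommended to provide `attn_head_dim`; defaulting it to `num_attention_heads`."
        )
        attn_head_dim = num_attention_heads
    return attn_head_dim

check_attn_head_dim(None, 8)     # warns, returns 8
check_attn_head_dim(None, None)  # no warning (e.g. the VAE case), returns None
```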
We should not pass `None` anymore to `get_up_blocks` and `get_down_blocks` IMO. I'm correcting the VAE here.
@@ -221,7 +233,8 @@ def __init__(
resnet_act_fn=act_fn,
resnet_groups=norm_num_groups,
cross_attention_dim=cross_attention_dim,
attn_num_head_channels=attention_head_dim[i],
num_attention_heads=num_attention_heads[i],
attention_head_dim=attention_head_dim[i] if attention_head_dim[i] is not None else output_channel,
Let's make sure we never pass `attention_head_dim[i] = None` to the `get_up_block` / `get_down_block` function. This reduces the black magic in the block code and makes it easier for the reader to understand how things are defined for SD.
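A minimal sketch of this idea, with hypothetical names (not the actual diffusers code): resolve the per-block values once in the caller, so the block builders never see `None`.

```python
# Hypothetical helper: broadcast and resolve attention_head_dim per block up front.
def resolve_attention_head_dim(attention_head_dim, block_out_channels):
    if attention_head_dim is None or isinstance(attention_head_dim, int):
        # Broadcast a single value (or None) to one entry per block.
        attention_head_dim = (attention_head_dim,) * len(block_out_channels)
    # Fall back to the block's output channels, mirroring the
    # `... if attention_head_dim[i] is not None else output_channel` expression above.
    return tuple(
        dim if dim is not None else out_ch
        for dim, out_ch in zip(attention_head_dim, block_out_channels)
    )

print(resolve_attention_head_dim(None, (320, 640)))  # (320, 640)
print(resolve_attention_head_dim(8, (320, 640)))     # (8, 8)
```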
Awesome! I think the latest changes are easier and require fewer warnings.
src/diffusers/pipelines/versatile_diffusion/modeling_text_unet.py
@@ -398,6 +417,7 @@ def __init__(
resnet_skip_time_act=resnet_skip_time_act,
resnet_out_scale_factor=resnet_out_scale_factor,
cross_attention_norm=cross_attention_norm,
attention_head_dim=attention_head_dim[i] if attention_head_dim[i] is not None else output_channel,
attention_head_dim=attention_head_dim[i] if attention_head_dim[i] is not None else output_channel,
attention_head_dim=output_channel // num_attention_heads[i],
So I'm not sure what this is intended to do here. I think `attention_head_dim` would never be `None` in our case, so the default `8` would just be passed down as-is and used to calculate the number of attention heads. In any case, this prints out 40 (320 // 8); I think if we pass `num_attention_heads = 16` we would want the number of attention heads to be 16:
down_block_types = ("AttnDownBlock2D",)
up_block_types = ("AttnUpBlock2D",)
unet = UNet2DConditionModel(
    num_attention_heads=16,
    block_out_channels=(320,),
    down_block_types=down_block_types,
    up_block_types=up_block_types)
# this prints 40
unet.down_blocks[0].attentions[0].heads
Note that by default `attention_head_dim` is defined to be `8`, and it takes priority for classes such as `AttnDownBlock2D`, as mentioned by @pcuenca in #3797.
However, this is indeed confusing, as we now have some attention blocks where `num_attention_heads` takes priority (e.g. the cross-attention blocks) and some where `attention_head_dim` takes priority. I will clean this up in a follow-up PR.
@yiyixuxu, this:
`attention_head_dim=output_channel // num_attention_heads[i],`
would break current behavior, e.g. configs that have `attention_head_dim` set to `None` would then pass an incorrect number here.
@@ -59,7 +59,7 @@ def prepare_init_args_and_inputs_for_common(self):
"block_out_channels": (32, 64),
"down_block_types": ("DownBlock2D", "AttnDownBlock2D"),
"up_block_types": ("AttnUpBlock2D", "UpBlock2D"),
"attention_head_dim": None,
"attention_head_dim": 3,
Is it possible to also test for the new argument (`num_attention_heads`) we introduced here to check for feature parity?
Yes, I'll add some tests now for `num_attention_heads`.
Agree with Pedro's observations from here. I think Pedro left a couple of nits, and I had a question on testing the new argument (`num_attention_heads`) for feature parity. Other than those, looks good to me!
* relax tolerance slightly * correct incorrect naming * correct namingc * correct more * Apply suggestions from code review * Fix more * Correct more * correct incorrect naming * Update src/diffusers/models/controlnet.py * Correct flax * Correct renaming * Correct blocks * Fix more * Correct more * mkae style * mkae style * mkae style * mkae style * mkae style * Fix flax * mkae style * rename * rename * rename attn head dim to attention_head_dim * correct flax * make style * improve * Correct more * make style * fix more * mkae style * Update src/diffusers/models/controlnet_flax.py * Apply suggestions from code review Co-authored-by: Pedro Cuenca <[email protected]> --------- Co-authored-by: Pedro Cuenca <[email protected]>
This PR corrects incorrect variable usage/naming, as discovered in #2011, in a non-breaking way.