
[SD 3.5 Dreambooth LoRA] support configurable training block & layers #9762


Merged
merged 25 commits into huggingface:main from sd-3-5-explorations on Oct 28, 2024

Conversation

@linoytsaban (Collaborator) commented on Oct 24, 2024:

This PR adds

  • `--lora_layers` and `--lora_blocks` to the dreambooth LoRA training script, to allow targeting specific blocks & layers (a rough sketch of how the two flags compose is shown below).
  • Targeting specific layers has generally proven useful, and for SD3.5 specifically, initial thoughts on the matter were brought up here - Stable Diffusion 3.5 Large Fine-tuning Tutorial.
  • Some initial results to demonstrate the differences:

[image: Group 3-14 - initial results comparison]
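As a rough sketch of how the two flags could compose into `peft` target-module patterns (illustrative only; the argument parsing and the values passed here are simplified assumptions, not the exact script code):

```python
from peft import LoraConfig

# Illustrative CLI values (hypothetical choices, not the script defaults):
lora_layers = "attn.to_k,attn.to_q,attn.to_v,attn.to_out.0"  # --lora_layers
lora_blocks = "12,13,14"                                      # --lora_blocks

layers = [layer.strip() for layer in lora_layers.split(",")]
blocks = [int(block.strip()) for block in lora_blocks.split(",")]

# Restrict each requested layer type to the requested transformer blocks.
target_modules = [
    f"transformer_blocks.{block}.{layer}" for block in blocks for layer in layers
]

transformer_lora_config = LoraConfig(
    r=4,
    lora_alpha=4,
    init_lora_weights="gaussian",
    target_modules=target_modules,
)
print(target_modules[:4])
```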

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@linoytsaban (Collaborator, Author) commented:

cc @bghira - maybe you explored this and have some insights/thoughts to share too 🤗

@sayakpaul (Member) left a comment:

Thank you! Left a couple of comments. LMK if they make sense or are unclear.

> **Photorealism**
> In preliminary testing, we observed that freezing the last few layers of the architecture significantly improved model training on a photorealistic dataset, preventing the detail degradation that small datasets tend to introduce.
> **Anatomy preservation**
> To dampen any possible degradation of anatomy, training only the attention layers and **not** the adaptive linear layers could help. For reference, below is one of the transformer blocks.
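To make the quoted heuristic a bit more concrete, here is a minimal inspection sketch that lists the attention projections versus the adaptive-norm linear layers inside a single transformer block. It assumes the `SD3Transformer2DModel` layout in diffusers and access to the (gated) SD3.5 checkpoint, and is only an illustration, not part of the PR:

```python
import torch
from diffusers import SD3Transformer2DModel

# Downloads the (gated) SD3.5 transformer weights; any SD3-family checkpoint works here.
transformer = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)

block = transformer.transformer_blocks[0]

# Attention projections (the layers the quoted heuristic suggests training).
attention_layers = [name for name, _ in block.named_modules() if name.startswith("attn.")]
# Adaptive-norm linear projections (the layers it suggests leaving frozen).
adaptive_linear_layers = [
    name for name, _ in block.named_modules() if "norm" in name and name.endswith("linear")
]

print("attention layers:", attention_layers)              # attn.to_q, attn.to_k, attn.to_v, attn.to_out.0, attn.add_*_proj, ...
print("adaptive linear layers:", adaptive_linear_layers)  # e.g. norm1.linear
```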
(Member) commented:

Makes total sense to me!

- with `--lora_layers` you can specify the types of layers you wish to train.
By default, the trained layers are -
`"attn.add_k_proj","attn.add_q_proj","attn.add_v_proj", "attn.to_add_out","attn.to_k","attn.to_out.0","attn.to_q","attn.to_v"`
If you wish to have a leaner LoRA / train more blocks over layers you could pass -
(Member) commented:

Leaner LoRA targeting what aspect? From what I understand, this heuristic is for targeting a specific quality, right?

@linoytsaban (Collaborator, Author) commented:

Indeed, the feature was added to allow experimentation with which layers produce the best quality. But since the default (once the PR is merged) will be
`attn.add_k_proj attn.add_q_proj attn.add_v_proj attn.to_add_out attn.to_k attn.to_out.0 attn.to_q attn.to_v`,
which makes every trained block chunkier than the previous default, I wanted to also give as an example the previous setting,
`--lora_layers attn.to_k attn.to_q attn.to_v attn.to_out.0`,
which results in a smaller LoRA. But if it's confusing/unclear I can remove that.
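For reference, a quick sketch of the two target sets compared above (module names copied from the comment); since each targeted module gets its own pair of LoRA matrices, the leaner set attaches half as many adapters per block:

```python
# The new default target set vs. the previous, leaner one (names from the comment above).
default_layers = [
    "attn.add_k_proj", "attn.add_q_proj", "attn.add_v_proj", "attn.to_add_out",
    "attn.to_k", "attn.to_out.0", "attn.to_q", "attn.to_v",
]
leaner_layers = ["attn.to_k", "attn.to_q", "attn.to_v", "attn.to_out.0"]

# The leaner set drops the context ("add_*") projections entirely.
print(sorted(set(default_layers) - set(leaner_layers)))
print(f"{len(default_layers)} vs. {len(leaner_layers)} targeted modules per block")
```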

Comment on lines +167 to +172
```python
# when not training the text encoder, all the parameters in the state dict should start
# with `"transformer"` in their names.
# In this test, only params of transformer block 0 should be in the state dict
starts_with_transformer = all(
    key.startswith("transformer.transformer_blocks.0") for key in lora_state_dict.keys()
)
```
(Member) commented:

Super!

@linoytsaban requested a review from @sayakpaul on October 28, 2024, 11:22
`attn.add_k_proj,attn.add_q_proj,attn.add_v_proj,attn.to_add_out,attn.to_k,attn.to_out.0,attn.to_q,attn.to_v`
If you wish to have a leaner LoRA / train more blocks over layers you could pass -
```diff
--lora_layers attn.to_k,attn.to_q,attn.to_v,attn.to_out.0
```
(Member) commented:

Are we deleting this block, or are we suggesting to add it? In case of addition, it should be like so: `+ --lora_layers attn.to_k,attn.to_q,attn.to_v,attn.to_out.0` under the diff block.


```python
# when not training the text encoder, all the parameters in the state dict should start
# with `"transformer"` in their names.
# In this test, only params of transformer block 0 should be in the state dict
```
(Member) commented:

Revisit the comment?

--train_batch_size 1
--gradient_accumulation_steps 1
--max_train_steps 2
--lora_blocks 0
(Member) commented:

I would make the value we're passing here a constant so that it's clear from the test what we're varying. Does that make sense?
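A minimal sketch of what that suggestion could look like (the constant and variable names are hypothetical, and `lora_state_dict` is stubbed so the snippet runs standalone):

```python
# Hypothetical constant so the test makes explicit which block is being varied.
LORA_BLOCK_TO_TRAIN = 0

test_args = [
    "--train_batch_size", "1",
    "--gradient_accumulation_steps", "1",
    "--max_train_steps", "2",
    "--lora_blocks", str(LORA_BLOCK_TO_TRAIN),
]

# Stand-in for the LoRA state dict the real test loads after training.
lora_state_dict = {"transformer.transformer_blocks.0.attn.to_q.lora_A.weight": None}

# The assertion references the same constant, tying the check to the arg above.
expected_prefix = f"transformer.transformer_blocks.{LORA_BLOCK_TO_TRAIN}"
assert all(key.startswith(expected_prefix) for key in lora_state_dict)
```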

--train_batch_size 1
--gradient_accumulation_steps 1
--max_train_steps 2
--lora_layers attn.to_k
@sayakpaul (Member) left a comment:

Excellent work here. I love it, especially the tests! I left only minor comments, LMK if they make sense.

@linoytsaban merged commit db5b6a9 into huggingface:main on Oct 28, 2024
8 checks passed
@linoytsaban deleted the sd-3-5-explorations branch on November 26, 2024
sayakpaul pushed a commit that referenced this pull request on Dec 23, 2024
…#9762)

* configurable layers
* configurable layers
* update README
* style
* add test
* style
* add layer test, update readme, add nargs
* readme
* test style
* remove print, change nargs
* test arg change
* style
* revert nargs 2/2
* address sayaks comments
* style
* address sayaks comments