[SD 3.5 Dreambooth LoRA] support configurable training block & layers #9762
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
cc @bghira - maybe you explored and have some insights/thoughts to share too 🤗
Thank you! Left a couple of comments. LMK if they make sense or are unclear.
> **Photorealism**
> In preliminary testing, we observed that freezing the last few layers of the architecture significantly improved model training when using a photorealistic dataset, preventing the detail degradation that small datasets can introduce.
> **Anatomy preservation**
> To dampen any possible degradation of anatomy, training only the attention layers and **not** the adaptive linear layers could help. For reference, below is one of the transformer blocks.
Makes total sense to me!
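For readers who want to inspect that block structure themselves, here is a small sketch; the checkpoint id, `subfolder`, and attribute names are assumptions based on the public `diffusers` SD3 implementation rather than an excerpt from this PR:

```py
# Sketch: print the sub-module names of one SD3.5 transformer block, so you can
# see which names are attention projections vs. adaptive norm / feed-forward
# layers when deciding what to pass to --lora_layers.
from diffusers import SD3Transformer2DModel

transformer = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", subfolder="transformer"
)
for name, _ in transformer.transformer_blocks[0].named_modules():
    if name:  # skip the unnamed root module
        print(name)
# Expected to include names such as attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
# attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out, plus norm/ff modules.
```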
- with `--lora_layers` you can specify the types of layers you wish to train.
By default, the trained layers are -
`"attn.add_k_proj","attn.add_q_proj","attn.add_v_proj", "attn.to_add_out","attn.to_k","attn.to_out.0","attn.to_q","attn.to_v"`
If you wish to have a leaner LoRA / train more blocks over layers you could pass -
Leaner LoRA targeting what aspect? From what I understand, this heuristic is for targeting a specific quality, right?
Indeed, the feature was added to allow experimentation with which layers produce the best quality. But since the default (once the PR is merged) will be
`attn.add_k_proj attn.add_q_proj attn.add_v_proj attn.to_add_out attn.to_k attn.to_out.0 attn.to_q attn.to_v`
which makes every trained block chunkier than the previous default, I also wanted to give as an example the previous setting we had, which is
`--lora_layers attn.to_k attn.to_q attn.to_v attn.to_out.0`
and will result in a smaller LoRA.
But if it's confusing/unclear I can remove that.
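For context, here is a minimal sketch of how a `--lora_layers` value typically ends up as PEFT `target_modules`; the helper name and the comma-splitting are assumptions for illustration, not a verbatim excerpt from `train_dreambooth_lora_sd3.py`:

```py
# Minimal sketch (hypothetical helper): map a --lora_layers string to a PEFT LoraConfig.
from peft import LoraConfig

DEFAULT_LAYERS = [
    "attn.add_k_proj", "attn.add_q_proj", "attn.add_v_proj", "attn.to_add_out",
    "attn.to_k", "attn.to_out.0", "attn.to_q", "attn.to_v",
]

def build_lora_config(lora_layers: str | None, rank: int = 4) -> LoraConfig:
    # Fall back to the new default attention layers when the flag isn't passed.
    target_modules = (
        [layer.strip() for layer in lora_layers.split(",")] if lora_layers else DEFAULT_LAYERS
    )
    return LoraConfig(
        r=rank,
        lora_alpha=rank,
        init_lora_weights="gaussian",
        target_modules=target_modules,
    )

# The "leaner" setting mentioned above targets only the base attention projections:
leaner_config = build_lora_config("attn.to_k,attn.to_q,attn.to_v,attn.to_out.0")
```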
```py
# when not training the text encoder, all the parameters in the state dict should start
# with `"transformer"` in their names.
# In this test, only params of transformer block 0 should be in the state dict
starts_with_transformer = all(
    key.startswith("transformer.transformer_blocks.0") for key in lora_state_dict.keys()
)
```
Super!
…-explorations
# Conflicts:
#	examples/dreambooth/train_dreambooth_lora_sd3.py
examples/dreambooth/README_sd3.md
`attn.add_k_proj,attn.add_q_proj,attn.add_v_proj,attn.to_add_out,attn.to_k,attn.to_out.0,attn.to_q,attn.to_v`
If you wish to have a leaner LoRA / train more blocks over layers you could pass -
```diff
--lora_layers attn.to_k,attn.to_q,attn.to_v,attn.to_out.0
```
Are we deleting this block or suggesting to add it? In case of addition, it should be like so: `+ --lora_layers attn.to_k,attn.to_q,attn.to_v,attn.to_out.0` under the diff block.
```py
# when not training the text encoder, all the parameters in the state dict should start
# with `"transformer"` in their names.
# In this test, only params of transformer block 0 should be in the state dict
```
Revisit the comment?
```
--train_batch_size 1
--gradient_accumulation_steps 1
--max_train_steps 2
--lora_blocks 0
```
I would make the value we're passing here a constant so that it's clear from the test what we're varying. Does that make sense?
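A minimal sketch of the idea; the constant and helper names below are hypothetical and not taken from the actual test file:

```py
# Hypothetical sketch: hoist the block index into a constant so both the CLI
# args and the assertion reference the same value.
LORA_BLOCK_UNDER_TEST = 0

def build_block_test_args(output_dir: str) -> list[str]:
    # Only the flags relevant to this sketch are shown.
    return [
        "--train_batch_size", "1",
        "--gradient_accumulation_steps", "1",
        "--max_train_steps", "2",
        "--lora_blocks", str(LORA_BLOCK_UNDER_TEST),
        "--output_dir", output_dir,
    ]

def only_block_under_test_was_trained(lora_state_dict: dict) -> bool:
    # The prefix now visibly matches the block index passed above.
    prefix = f"transformer.transformer_blocks.{LORA_BLOCK_UNDER_TEST}"
    return all(key.startswith(prefix) for key in lora_state_dict)
```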
```
--train_batch_size 1
--gradient_accumulation_steps 1
--max_train_steps 2
--lora_layers attn.to_k
```
Excellent work here. I love it, especially the tests! I left only minor comments, LMK if they make sense.
[SD 3.5 Dreambooth LoRA] support configurable training block & layers (#9762)

* configurable layers
* configurable layers
* update README
* style
* add test
* style
* add layer test, update readme, add nargs
* readme
* test style
* remove print, change nargs
* test arg change
* style
* revert nargs 2/2
* address sayaks comments
* style
* address sayaks comments
This PR adds `--lora_layers` and `--lora_blocks` to the DreamBooth LoRA training script to allow targeting specific blocks & layers.
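As a rough illustration of how the two flags interact, here is a sketch that scopes the `--lora_layers` patterns to the requested `--lora_blocks` indices, which is consistent with the test above expecting only `transformer.transformer_blocks.0` keys in the saved state dict; the exact construction in `train_dreambooth_lora_sd3.py` may differ:

```py
# Sketch: combine --lora_blocks and --lora_layers into PEFT target-module patterns.
def build_target_modules(lora_blocks: str | None, layers: list[str]) -> list[str]:
    if lora_blocks is None:
        # No block restriction: the layer patterns match in every transformer block.
        return layers
    blocks = [int(block.strip()) for block in lora_blocks.split(",")]
    # Scope each layer pattern to the requested block indices,
    # e.g. "transformer_blocks.0.attn.to_k".
    return [f"transformer_blocks.{idx}.{layer}" for idx in blocks for layer in layers]

print(build_target_modules("0", ["attn.to_k"]))
# -> ['transformer_blocks.0.attn.to_k']
```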