
device map legacy attention block weight conversion #3804

Conversation

williamberman (Contributor) commented Jun 15, 2023

re: #3740

This is not ideal, but AFAIK this is the only way to solve this without exposing functionality in accelerate.

@@ -78,6 +78,7 @@ def __init__(
self.upcast_softmax = upcast_softmax
self.rescale_output_factor = rescale_output_factor
self.residual_connection = residual_connection
self.dropout = dropout
williamberman (Contributor, author) commented

Added so we can re-create the dropout module when converting back to the new weight format.
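For context, a minimal sketch of why the raw dropout probability has to be stored: nn.Dropout keeps its probability p only as a plain attribute, not in the state_dict, so a module rebuilt from weights alone would lose it. The class and helper below are illustrative stand-ins, not the actual diffusers implementation.

import torch.nn as nn

class Attention(nn.Module):
    # simplified stand-in for diffusers' Attention module
    def __init__(self, query_dim: int, inner_dim: int, dropout: float = 0.0):
        super().__init__()
        self.dropout = dropout  # stored so to_out can be re-created later
        self.to_out = nn.ModuleList([nn.Linear(inner_dim, query_dim), nn.Dropout(dropout)])

def rebuild_to_out(attn: Attention) -> nn.ModuleList:
    # hypothetical helper: re-create the output projection + dropout pair
    # when converting back to the new weight format
    linear = nn.Linear(attn.to_out[0].in_features, attn.to_out[0].out_features)
    return nn.ModuleList([linear, nn.Dropout(attn.dropout)])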

HuggingFaceDocBuilderDev commented Jun 15, 2023

The documentation is not available anymore as the PR was closed or merged.

# (which look like they should be private variables?), so we can't use the standard hooks
# to rename parameters on load. We need to mimic the original weight names so the correct
# attributes are available. After we have loaded the weights, we convert the deprecated
# names to the new non-deprecated names. Then we _greatly encourage_ the user to convert
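For readers following along, a rough sketch of the rename being described. The deprecated-to-new mapping below reflects the old AttentionBlock parameter names versus the new Attention names; treat the exact keys and the helper as an assumption for illustration, not the PR's literal code.

# assumed mapping from deprecated AttentionBlock parameter names to the
# new Attention names; the real conversion lives inside diffusers
DEPRECATED_TO_NEW = {
    "query": "to_q",
    "key": "to_k",
    "value": "to_v",
    "proj_attn": "to_out.0",
}

def convert_deprecated_state_dict(state_dict):
    # rename deprecated parameter names to the new format
    converted = {}
    for name, tensor in state_dict.items():
        for old, new in DEPRECATED_TO_NEW.items():
            prefix = f"{old}."
            if name.startswith(prefix) or f".{prefix}" in name:
                name = name.replace(prefix, f"{new}.", 1)
                break
        converted[name] = tensor
    return converted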
Member commented

Do we have guidance available for users on how they should perform the conversion?

Member commented

Ah, never mind. I guess you meant that once we load the old attention block weight names and run the conversion internally, we suggest users re-save the pipeline. Right?

williamberman (Contributor, author) commented

Yeah, just this blurb here:

  f"Taking `{str(e)}` while using `accelerate.load_checkpoint_and_dispatch` to mean {pretrained_model_name_or_path}"
                                " was saved with deprecated attention block weight names. We will load it with the deprecated attention block"
                                " names and convert them on the fly to the new attention block format. Please re-save the model after this conversion,"
                                " so we don't have to do the on the fly renaming in the future. If the model is from a hub checkpoint,"
                                " please also re-upload it or open a PR on the original repository."

pipe = DiffusionPipeline.from_pretrained("hf-internal-testing/tiny-stable-diffusion-pipe", safety_checker=None)

pre_conversion = pipe(
"foo",
Member commented

Killer prompt.

sayakpaul (Member) left a comment

Concrete! Nice tests.

# names to the new non-deprecated names. Then we _greatly encourage_ the user to convert
# the weights so we don't have to do this again.

if "'Attention' object has no attribute" in str(e):
Contributor commented

Pretty hacky, but OK! Let's leave it for now :-)

williamberman (Contributor, author) commented

Yeah, I cringed while writing this 🙃
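For readers following the thread, the shape of the hack under discussion is roughly the following. This is a simplified sketch, not the PR's exact code, and load_with_legacy_fallback is a hypothetical wrapper.

from accelerate import load_checkpoint_and_dispatch

def load_with_legacy_fallback(model, checkpoint_file, device_map):
    # hypothetical wrapper illustrating the pattern discussed above
    try:
        load_checkpoint_and_dispatch(model, checkpoint_file, device_map=device_map)
    except AttributeError as e:
        # accelerate assigns tensors by attribute name, so a checkpoint saved
        # with deprecated names surfaces as an AttributeError on the new
        # Attention module; matching on the message string is the hacky part
        if "'Attention' object has no attribute" in str(e):
            # temporarily restore the deprecated attribute names, retry the
            # load, then convert the weights to the new names (as the PR does)
            ...
        else:
            raise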

williamberman force-pushed the device_map_legacy_attention_block_conversion branch from 0aab8ca to 60aa2e6 on June 16, 2023
williamberman merged commit 59aefe9 into huggingface:main on June 16, 2023
williamberman deleted the device_map_legacy_attention_block_conversion branch on June 16, 2023