device map legacy attention block weight conversion #3804
Conversation
@@ -78,6 +78,7 @@ def __init__(
        self.upcast_softmax = upcast_softmax
        self.rescale_output_factor = rescale_output_factor
        self.residual_connection = residual_connection
        self.dropout = dropout
Added so we can re-create the dropout module when converting back to the new weight format.
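For context, a minimal sketch of how the stored probability might be used; the class shape and helper name here are illustrative, not the actual diffusers implementation:

import torch
import torch.nn as nn

class Attention(nn.Module):
    # Illustrative sketch only, not the real diffusers Attention class.
    def __init__(self, query_dim: int, dropout: float = 0.0):
        super().__init__()
        # nn.Dropout only exposes `.p`, so keeping the raw float around
        # makes it trivial to rebuild the output block when converting
        # deprecated attention weights to the new format.
        self.dropout = dropout
        self.to_out = nn.ModuleList([nn.Linear(query_dim, query_dim), nn.Dropout(dropout)])

    def rebuild_to_out(self, proj_weight: torch.Tensor, proj_bias: torch.Tensor):
        # Hypothetical helper: re-create `to_out` from deprecated
        # `proj_attn` weights, reusing the stored dropout probability.
        linear = nn.Linear(proj_weight.shape[1], proj_weight.shape[0])
        linear.weight.data = proj_weight
        linear.bias.data = proj_bias
        self.to_out = nn.ModuleList([linear, nn.Dropout(self.dropout)])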
The documentation is not available anymore as the PR was closed or merged.
# (which look like they should be private variables?), so we can't use the standard hooks
# to rename parameters on load. We need to mimic the original weight names so the correct
# attributes are available. After we have loaded the weights, we convert the deprecated
# names to the new non-deprecated names. Then we _greatly encourage_ the user to convert
Do we have guidance available for users on how they should perform the conversion?
Ah, never mind. I guess you meant that once we load the old attention block weight names and run the conversion internally, we suggest users save the pipeline. Right?
Yeah, just this blurb here:
f"Taking `{str(e)}` while using `accelerate.load_checkpoint_and_dispatch` to mean {pretrained_model_name_or_path}"
" was saved with deprecated attention block weight names. We will load it with the deprecated attention block"
" names and convert them on the fly to the new attention block format. Please re-save the model after this conversion,"
" so we don't have to do the on the fly renaming in the future. If the model is from a hub checkpoint,"
" please also re-upload it or open a PR on the original repository."
pipe = DiffusionPipeline.from_pretrained("hf-internal-testing/tiny-stable-diffusion-pipe", safety_checker=None)

pre_conversion = pipe(
    "foo",
Killer prompt.
Concrete! Nice tests.
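The excerpt above is truncated; roughly, the test is a before/after numerical comparison around a save/reload round trip. A sketch of that shape, with illustrative step counts and tolerances:

import tempfile

import numpy as np
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "hf-internal-testing/tiny-stable-diffusion-pipe", safety_checker=None
)

generator = torch.Generator("cpu").manual_seed(0)
pre_conversion = pipe(
    "foo", num_inference_steps=2, generator=generator, output_type="np"
).images

with tempfile.TemporaryDirectory() as tmpdir:
    # save_pretrained writes the converted (non-deprecated) weight names,
    # so the reloaded pipeline takes the normal loading path.
    pipe.save_pretrained(tmpdir)
    pipe = DiffusionPipeline.from_pretrained(tmpdir, safety_checker=None)

generator = torch.Generator("cpu").manual_seed(0)
post_conversion = pipe(
    "foo", num_inference_steps=2, generator=generator, output_type="np"
).images

# The conversion must not change the numerics.
assert np.allclose(pre_conversion, post_conversion, atol=1e-4)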
# names to the new non-deprecated names. Then we _greatly encourage_ the user to convert
# the weights so we don't have to do this again.

if "'Attention' object has no attribute" in str(e):
pretty hacky, but OK! Let's leave it for now :-)
yeah I cringed while writing this 🙃
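For context, the hack amounts to a catch-and-retry keyed off the error message. A simplified sketch, not the exact diffusers code; the helper methods on `model` are named hypothetically:

from accelerate import load_checkpoint_and_dispatch

def load_with_legacy_fallback(model, checkpoint_path, device_map):
    try:
        return load_checkpoint_and_dispatch(model, checkpoint_path, device_map=device_map)
    except AttributeError as e:
        # accelerate assigns tensors by attribute path, so deprecated key
        # names like "...query.weight" raise on the new Attention module.
        # There is no public hook to remap keys here, hence the string match.
        if "'Attention' object has no attribute" not in str(e):
            raise
        model._temp_convert_self_to_deprecated_attention_blocks()  # hypothetical
        loaded = load_checkpoint_and_dispatch(model, checkpoint_path, device_map=device_map)
        model._undo_temp_convert_self_to_deprecated_attention_blocks()  # hypothetical
        return loaded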
Force-pushed from 0aab8ca to 60aa2e6
re: #3740
This is not ideal, but AFAIK it's the only way to solve this without exposing new functionality in accelerate.