Fixed the bug related to saving DeepSpeed models. #6628
Thanks! Can you share an example training command so I can check this with DeepSpeed? I verified that the changes work without DeepSpeed: https://colab.research.google.com/gist/sayakpaul/6d60b261a42e0e9fb07c0e9505e7b82f/scratchpad.ipynb
Thank you for your reply. Here is my training script; the datasets are from Hugging Face, and I use a single A100.
Here's my accelerate config file
Thanks! Did you observe any speedups? Additionally, if you could also modify the
With DeepSpeed, GPU memory usage decreases significantly. I can add this information about DeepSpeed to the README later.
That's very good to know. Let's add this info to the README, and then we can merge :)
* origin/main: Fix failing tests due to Posix Path (huggingface#6627)
I have updated the README with instructions on how to train the SDXL model using DeepSpeed; please take a look :)
Looking fantastic. Will merge once the CI is green.
Thank you for your review. I hope this helps diffusers.
Of course it will!
* Fixed the bug related to saving DeepSpeed models.
* Add information about training SD models using DeepSpeed to the README.
* Apply suggestions from code review
---------
Co-authored-by: mhh001 <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
What does this PR do?
When training a model with DeepSpeed through the accelerate library, I encountered an issue while saving the checkpoint. The model passed to save_model_hook is a DeepSpeedEngine wrapper rather than the model itself, so the isinstance check on the model type fails and the hook raises an "unexpected save model" error. To resolve this, I unwrap the model before the isinstance comparison. With this change, the model is saved correctly.
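A minimal sketch of why unwrapping fixes the check, assuming simplified stand-ins: `UNet`, `EngineWrapper`, and `CompiledWrapper` are hypothetical classes (not the real diffusers, accelerate, or DeepSpeed types), and `unwrap_model` only roughly approximates what `accelerator.unwrap_model` does by following the `.module` and `_orig_mod` attributes. It shows the `isinstance` check failing on the wrapper and succeeding on the unwrapped model.

```python
# Hypothetical stand-ins so the sketch runs without torch, accelerate, or DeepSpeed.
class UNet:
    """Plays the role of the real model class saved by the hook."""


class EngineWrapper:
    """Plays the role of DeepSpeedEngine, which keeps the wrapped model in `.module`."""

    def __init__(self, module):
        self.module = module


class CompiledWrapper:
    """Plays the role of a torch.compile wrapper, which keeps the model in `_orig_mod`."""

    def __init__(self, orig_mod):
        self._orig_mod = orig_mod


def unwrap_model(model):
    """Peel off wrapper layers until the underlying model is reached."""
    while True:
        if hasattr(model, "module"):
            model = model.module
        elif hasattr(model, "_orig_mod"):
            model = model._orig_mod
        else:
            return model


def save_model_hook(models, output_dir):
    """Dispatch on the type of the *unwrapped* model."""
    saved = []
    for model in models:
        inner = unwrap_model(model)
        if isinstance(inner, UNet):
            # A real hook would call inner.save_pretrained(...) here.
            saved.append(f"{output_dir}/unet")
        else:
            raise ValueError(f"unexpected save model: {model.__class__}")
    return saved


wrapped = EngineWrapper(UNet())
print(isinstance(wrapped, UNet))  # False: the wrapper hides the model type
print(save_model_hook([wrapped], "ckpt"))  # ['ckpt/unet']
```

Without the unwrap step, a wrapped model would fall through to the `else` branch and raise the error described above.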
Fixes # (issue)

Fixing the bug:
After the fix, the checkpoint can be saved:
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@sayakpaul @patrickvonplaten