Refactor execution device & cpu offload #4114
Conversation
    return torch.device(module._hf_hook.execution_device)
    return self.device

    def enable_sequential_cpu_offload(self, gpu_id: int = 0, device: Union[torch.device, str] = "cuda"):
This is general enough to work with all models. It also applies the suggestion from #3990.
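For context, a rough sketch of the pipeline-agnostic offload loop this enables. This is not the exact PR diff; it assumes Accelerate's `cpu_offload` helper and the `components` dict that `DiffusionPipeline` exposes, and is shown as it would appear as a method on `DiffusionPipeline`:

```python
from typing import Union

import torch
from accelerate import cpu_offload


def enable_sequential_cpu_offload(self, gpu_id: int = 0, device: Union[torch.device, str] = "cuda"):
    # Resolve the target execution device from the requested GPU id.
    if isinstance(device, str):
        device = torch.device(f"{device}:{gpu_id}")

    # Wrap every torch.nn.Module component so its weights stay on CPU and are
    # moved to `device` only while that submodule is executing.
    for name, model in self.components.items():
        if not isinstance(model, torch.nn.Module):
            continue
        if name in getattr(self, "_exclude_from_cpu_offload", []):
            # Components with unusual parameter structures are kept on the device directly.
            model.to(device)
        else:
            cpu_offload(model, device)
```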
The documentation is not available anymore as the PR was closed or merged.
williamberman
left a comment
Nice, this makes a ton of sense! We should be sure to monitor the integration tests
    def _execution_device(self):
        r"""
        Returns the device on which the pipeline's models will be executed. After calling
        `pipeline.enable_sequential_cpu_offload()` the execution device can only be inferred from Accelerate's module
Can we format pipeline.enable_sequential_cpu_offload() such that the hyperlink generates properly?
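For reference, a hedged sketch of how a shared `_execution_device` property can infer the device from Accelerate's hooks, in line with the `module._hf_hook.execution_device` lines quoted above. This is an approximation of the idea, not the PR's exact code, and is shown as it would appear on `DiffusionPipeline`:

```python
import torch


@property
def _execution_device(self):
    # With sequential CPU offload enabled, `self.device` reports "cpu", so the
    # real execution device has to be read from the Accelerate hook that
    # cpu_offload attached to each torch.nn.Module component.
    for name, model in self.components.items():
        if not isinstance(model, torch.nn.Module):
            continue
        hook = getattr(model, "_hf_hook", None)
        if hook is not None and getattr(hook, "execution_device", None) is not None:
            return torch.device(hook.execution_device)
    # No hooks found: fall back to the pipeline's regular device.
    return self.device
```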
    # set these parameters to False in the child class if the pipeline does not support the corresponding functionality
    test_attention_slicing = True
    test_cpu_offload = True
Why are we discarding this?
We always test it
We completely remove the test_cpu_offload flag because every pipeline is now tested
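Illustratively, a minimal pytest-style sketch of the offload check that now runs for every pipeline unconditionally. The helper names (`pipeline_class`, `get_dummy_components`, `get_dummy_inputs`) are assumptions modeled on a typical pipeline test mixin, not the exact test in the repository:

```python
import numpy as np


def test_sequential_cpu_offload_forward_pass(self):
    # Reference run on the accelerator without offloading.
    pipe = self.pipeline_class(**self.get_dummy_components())
    pipe.to("cuda")
    pipe.set_progress_bar_config(disable=None)
    output_without_offload = pipe(**self.get_dummy_inputs("cuda"))[0]

    # Same inputs with sequential CPU offload enabled; results should match.
    pipe.enable_sequential_cpu_offload()
    output_with_offload = pipe(**self.get_dummy_inputs("cuda"))[0]

    max_diff = np.abs(np.asarray(output_without_offload) - np.asarray(output_with_offload)).max()
    assert max_diff < 1e-4, "enabling sequential cpu offload should not change results"
```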
sayakpaul
left a comment
Thanks for cleaning up!
yiyixuxu
left a comment
Thanks!
* create general cpu offload & execution device
* Remove boiler plate
* finish
* kp
* Correct offload more pipelines
* up
* Update src/diffusers/pipelines/pipeline_utils.py
* make style
* up
Refactor `enable_sequential_cpu_offload` and `_execution_device`

Every pipeline should be able to use `cpu_offload` in a very non-customized way, as essentially we just need to wrap all `torch.nn.Module` components into the `cpu_offload` function. Similarly, we can determine the `_execution_device` by looping over all `torch.nn.Module` components.

Therefore this PR:

* moves `enable_sequential_cpu_offload` and `_execution_device` to `DiffusionPipeline`
* adds a `_exclude_from_cpu_offload` class list for pipelines where certain specific components cannot be offloaded because of their weird parameter structure

I tested this PR on GPU by running:
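The exact command is not preserved in the text above. Purely as an illustration of the kind of GPU smoke test the refactor enables, a minimal usage sketch (the checkpoint id and prompt are arbitrary assumptions):

```python
import torch
from diffusers import StableDiffusionPipeline

# Any pipeline should behave the same now that offload lives on DiffusionPipeline;
# the checkpoint below is just an arbitrary example.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# Keep weights on CPU and stream each submodule to the GPU only while it runs.
pipe.enable_sequential_cpu_offload()

image = pipe("an astronaut riding a horse on mars", num_inference_steps=25).images[0]
image.save("offload_test.png")
```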