cpu_offload vRAM memory consumption larger than 4GB #1934


Closed
Sanster opened this issue Jan 6, 2023 · 2 comments
Labels: bug (Something isn't working), stale (Issues that haven't received updates)

Comments


Sanster commented Jan 6, 2023

Describe the bug

I am using the code from https://huggingface.co/docs/diffusers/optimization/fp16#offloading-to-cpu-with-accelerate-for-memory-savings to test cpu_offload, but the vRAM consumption is larger than 4GB:

GPU         cpu_offload enabled   vRAM cost
1080        Yes                   4539 MB
1080        No                    5101 MB
TITAN RTX   Yes                   5134 MB
TITAN RTX   No                    5668 MB

Reproduction

I am using the code from https://huggingface.co/docs/diffusers/optimization/fp16#offloading-to-cpu-with-accelerate-for-memory-savings

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
pipe.enable_sequential_cpu_offload()
image = pipe(prompt).images[0]
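For reference, the numbers in the table above come from nvidia-smi. A minimal sketch of checking peak memory from inside the script instead (an assumed alternative, not part of the original report; torch.cuda.max_memory_allocated reports allocator statistics, not total process memory):

import torch
from diffusers import StableDiffusionPipeline

# Reset the allocator's peak statistics before running the pipeline.
torch.cuda.reset_peak_memory_stats()

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # same setup as the snippet above

prompt = "a photo of an astronaut riding a horse on mars"
pipe.enable_sequential_cpu_offload()
image = pipe(prompt).images[0]

# Peak memory held by PyTorch tensors; nvidia-smi reports a higher number because
# it also counts the CUDA context and the caching allocator's reserved memory.
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**2:.0f} MB")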

Logs

No response

System Info

Tested on 1080 / TITAN RTX

  • diffusers version: 0.11.1
  • accelerate version: 0.15.0
  • Platform: Linux-4.15.0-142-generic-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • PyTorch version (GPU?): 1.10.1+cu111 (True)
  • Huggingface_hub version: 0.11.1
  • Transformers version: 4.25.1
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No
Sanster added the bug (Something isn't working) label on Jan 6, 2023
patrickvonplaten (Contributor) commented

Hey @Sanster,

Thanks a lot for the super clean bug report. When running your code-snippet in combination with:

nvidia-smi

I can observe the same numbers as in your table.

I think the problem is that we move all modules to GPU in the very beginning. We shouldn't do this. When setting:

pipe.enable_sequential_cpu_offload()

it's important not to have run .to("cuda") beforehand. E.g. the following should give much better memory numbers:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    safety_checker=None,
)

prompt = "a photo of an astronaut riding a horse on mars"
pipe.enable_sequential_cpu_offload()
image = pipe(prompt, num_inference_steps=4).images[0]

When running the above I'm getting <3GB memory usage.

In this example, I'm not using the safety_checker since there is currently a bug with cpu_offload + the safety checker; it should be corrected in PR #1968, along with the incorrect documentation you spotted. Thanks a lot ❤️

It's an interesting use case since it might not be super intuitive that one has to remove .to("cuda") when using enable_sequential_cpu_offload(...).

cc @pcuenca @patil-suraj @anton-l and maybe also @sgugger @muellerzr just FYI since it might be a common problem people run into.


github-actions bot commented Feb 5, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
