[Pipeline] Add new pipeline for ParaDiGMS -- parallel sampling of diffusion models #3716
Conversation
@AndyShih12 can this be used with other pipelines too to make them concurrent?
Yes, paradigms should be compatible with most pipelines, just by changing the denoising loop to the parallel denoising loop. But each pipeline is in a different file, so we would have to make a parallel version of each pipeline separately.
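To make the "parallel denoising loop" concrete, here is a toy, framework-free sketch of the idea, not the diffusers implementation: the `drift` function below is a made-up stand-in for the scheduler + UNet update, and the chain of denoising steps is treated as a fixed-point problem that gets refined a whole window at a time.

```python
import torch

def drift(x, t):
    # hypothetical stand-in for what the scheduler + UNet would add at step t
    return -0.05 * x * (1.0 + 0.01 * t)

num_steps = 50
x0 = torch.randn(4)

# Sequential denoising loop: one model call per step, strictly in order.
x = x0.clone()
for t in range(num_steps):
    x = x + drift(x, t)
x_sequential = x

# Parallel denoising loop: keep a guess for every step of the window and
# refine all of them together with fixed-point (Picard) sweeps.
window = x0.repeat(num_steps + 1, 1)  # window[j] approximates the state after j steps
for _ in range(num_steps):
    # evaluate the drift at every step of the window; in a real pipeline this
    # would be a single batched UNet call instead of num_steps separate calls
    drifts = torch.stack([drift(window[j], j) for j in range(num_steps)])
    window = torch.cat([x0.unsqueeze(0), x0 + torch.cumsum(drifts, dim=0)])
x_parallel = window[-1]

print((x_sequential - x_parallel).abs().max())  # ~0: both loops agree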
requesting feedback @patrickvonplaten @sayakpaul
@patrickvonplaten @sayakpaul I think there's an issue with the way pipelines are structured. This pipeline will only work for txt2img, even though only a specific part needs to be modified. I think there should be a better way to mix multiple pipelines than making more and more pipelines. I am unsure what the best approach would be, but I think this would save a lot of work in the long run.
I understand your point of view. But as stated in our documentation, we want to be flexible and readable with our codebase. The way we achieve that is by exposing the pipelines in a way that is as self-contained as possible, taking inspiration from
@AndyShih12 thanks so much for your PR. Could you also give us some comparisons on how parallel sampling improves the efficiency of a standard pipeline? I will get to reviewing the PR soon.
@sayakpaul thank you! Sure, parallel sampling on 8 GPUs can give a 3.1x speedup over StableDiffusionPipeline for 1000-step DDPM, and a 1.8x speedup for 200-step DDIM. There are some more details and comparisons in the paper.

Here is what we can expect the speedup to be when using fewer GPUs, on 1000-step DDPM:

[figure omitted: speedup vs. number of GPUs for 1000-step DDPM]

Here is a script to compare running time. As the numbers above show, it's important to run with multiple GPUs.

```python
import torch
from diffusers import DDPMParallelScheduler, DDIMParallelScheduler
from diffusers import StableDiffusionParadigmsPipeline, StableDiffusionPipeline

scheduler = DDPMParallelScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
prompt = "a photo of an astronaut riding a horse on mars"

sequential_pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", scheduler=scheduler, torch_dtype=torch.float16)
sequential_pipe = sequential_pipe.to("cuda")

ngpu = torch.cuda.device_count()
parallel_pipe = StableDiffusionParadigmsPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", scheduler=scheduler, torch_dtype=torch.float16)
parallel_pipe = parallel_pipe.to("cuda")
parallel_pipe.wrapped_unet = torch.nn.DataParallel(parallel_pipe.unet, device_ids=[d for d in range(ngpu)])

num_inference_steps, batch_per_device = 1000, 5

# warmup
_ = sequential_pipe(prompt, num_inference_steps=10).images[0]
_ = parallel_pipe(prompt, parallel=ngpu * batch_per_device, num_inference_steps=10).images[0]

# run
image_sequential = sequential_pipe(prompt, num_inference_steps=num_inference_steps).images[0]
image_parallel = parallel_pipe(prompt, parallel=ngpu * batch_per_device, num_inference_steps=num_inference_steps).images[0]
```
Thanks for providing the info!
Do you mean that if we use the pipeline without parallel sampling on …?

Additionally, I wanted to make you aware of our distributed inference support in case you aren't: https://huggingface.co/docs/diffusers/main/en/training/distributed_inference. It would be good to have a direct comparison of the timings using the setup from our docs as well. Nonetheless, I am reviewing your PR now.
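For context, the distributed inference page linked above parallelizes across prompts (one prompt per process) rather than across the denoising steps of a single image, which is what ParaDiGMS targets. A rough sketch along the lines of that page, assuming accelerate's `PartialState` API (model id and prompts are just placeholders):

```python
import torch
from accelerate import PartialState
from diffusers import DiffusionPipeline

# run with: accelerate launch this_script.py --num_processes=2
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
distributed_state = PartialState()
pipeline.to(distributed_state.device)

# each process gets its own slice of the prompt list
with distributed_state.split_between_processes(["a dog", "a cat"]) as prompt:
    image = pipeline(prompt).images[0]
    image.save(f"result_{distributed_state.process_index}.png")
```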
Ok, I've refactored the schedulers so that DDPMParallelScheduler and DDIMParallelScheduler are now separate classes! @patrickvonplaten
tests/schedulers/test_schedulers.py

```python
# if t is a tensor, match the number of dimensions of sample
if isinstance(t, torch.Tensor):
    num_dims = len(sample.shape)
    # pad t with 1s to match num_dims
    t = t.reshape(-1, *(1,) * (num_dims - 1))
```
Could I get a brief explanation of why this was needed?
Also curious here
Typically the timestep `t` is a scalar, and `sample` has shape e.g. `(batch, 3, 8, 8)`. For parallel sampling, `t` has shape `(batch,)` while `sample` still has shape `(batch, 3, 8, 8)`. When running `net.forward(sample, t, ...)` with the typical UNet, both versions run fine. But the way this test model is implemented, it fails to broadcast the dimensions for parallel sampling, so I'm broadcasting manually.
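Concretely, the reshape in the snippet above turns the per-sample timestep vector into a shape that broadcasts against `sample`. A small standalone illustration with toy tensors (the values and shapes here are just an example):

```python
import torch

sample = torch.randn(2, 3, 8, 8)   # (batch, channels, height, width)
t = torch.tensor([10, 20])         # (batch,) -- one timestep per sample

# reshape (batch,) -> (batch, 1, 1, 1) so elementwise ops against `sample` broadcast
num_dims = len(sample.shape)
t = t.reshape(-1, *(1,) * (num_dims - 1))
print(t.shape)                     # torch.Size([2, 1, 1, 1])
scaled = sample * t                # broadcasts without error
```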
Nice, think we're close to getting this one in
Awesome, I made the new changes so that scheduling_ddim.py is now untouched! Let me know if there is anything else!
It looks like a couple of scheduler tests are now failing:
Could we try to fix those before merging? :-)
I ran the failing test cases. @AndyShih12, let's maybe try to fix the failing tests before merging :)
Oh thank you for the catch, I had assumed the test issue was due to the main branch. Sorry about not testing it thoroughly. Indeed, I fixed it by properly casting the type/device of the new tensor in the dummy model.

The failing test seems to be unrelated. Let me update the branch once and rerun the workflow.

Thanks for your great contribution!

Thank you both for all the help!
…fusion models (huggingface#3716)

* add paradigms parallel sampling pipeline
* linting
* ran make fix-copies
* add paradigms parallel sampling pipeline
* linting
* ran make fix-copies
* Apply suggestions from code review (Co-authored-by: Sayak Paul <[email protected]>)
* changes based on review
* add docs for paradigms
* update docs with paradigms abstract
* improve documentation, and add tests for ddim/ddpm batch_step_no_noise
* fix docs and run make fix-copies
* minor changes to docs.
* Apply suggestions from code review (Co-authored-by: Patrick von Platen <[email protected]>)
* move parallel scheduler to new classes for DDPMParallelScheduler and DDIMParallelScheduler
* remove changes for scheduling_ddim, adjust licenses, credits, and commented code
* fix tensor type that is breaking tests

Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Patrick von Platen <[email protected]>
This pull request implements the paper Parallel Sampling of Diffusion Models: https://arxiv.org/abs/2305.16317
Based on the repository: https://github.com/AndyShih12/paradigms
Example of use:
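A minimal sketch, adapted from the timing script shared earlier in this thread (it assumes several GPUs are available; the 1000-step DDPM setting matches the comparison above):

```python
import torch
from diffusers import DDPMParallelScheduler, StableDiffusionParadigmsPipeline

# ParaDiGMS swaps the usual scheduler for a parallel-capable one
scheduler = DDPMParallelScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="scheduler"
)
pipe = StableDiffusionParadigmsPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")

# spread the batched UNet calls across all visible GPUs
ngpu = torch.cuda.device_count()
pipe.wrapped_unet = torch.nn.DataParallel(pipe.unet, device_ids=list(range(ngpu)))

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, parallel=ngpu * 5, num_inference_steps=1000).images[0]
image.save("astronaut.png")
```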