[feat] allow SDXL pipeline to run with fused QKV projections #6030

sayakpaul · 2023-12-03T04:31:27Z

What does this PR do?

Adds an option to run the SDXL pipeline with fused QKV projections. For self-attention, the query, key, and value projection matrices are all fused horizontally into a single projection. For cross-attention, only the key and value projection matrices are fused.
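A minimal PyTorch sketch of what "horizontal fusion" means, assuming plain bias-free `nn.Linear` projections. The helper `fuse_linears` and all shapes are hypothetical and this is not the diffusers implementation. The weight matrices are concatenated along the output dimension, so one matmul replaces three (or two), and the fused output is split back into the individual projections. In cross-attention only key and value can be fused, because they both read `encoder_hidden_states` while the query reads `hidden_states`.

```python
import torch
import torch.nn as nn


def fuse_linears(*layers: nn.Linear) -> nn.Linear:
    """Hypothetical helper: merge Linear layers that share the same input into one
    Linear whose output is the concatenation of their individual outputs."""
    in_features = layers[0].in_features
    assert all(layer.in_features == in_features for layer in layers)
    has_bias = layers[0].bias is not None
    ref_weight = layers[0].weight
    fused = nn.Linear(
        in_features,
        sum(layer.out_features for layer in layers),
        bias=has_bias,
        device=ref_weight.device,
        dtype=ref_weight.dtype,
    )
    with torch.no_grad():
        # Stack weights along the output dimension: (out_i, in) -> (sum(out_i), in).
        fused.weight.copy_(torch.cat([layer.weight for layer in layers], dim=0))
        if has_bias:
            fused.bias.copy_(torch.cat([layer.bias for layer in layers], dim=0))
    return fused


torch.manual_seed(0)
hidden_dim, ctx_dim = 64, 32

# Self-attention: query, key and value all read the same hidden states.
to_q = nn.Linear(hidden_dim, hidden_dim, bias=False)
to_k = nn.Linear(hidden_dim, hidden_dim, bias=False)
to_v = nn.Linear(hidden_dim, hidden_dim, bias=False)
hidden_states = torch.randn(2, 16, hidden_dim)
to_qkv = fuse_linears(to_q, to_k, to_v)
q, k, v = to_qkv(hidden_states).split(hidden_dim, dim=-1)

# Cross-attention: only key and value share an input (the encoder hidden states).
cross_k = nn.Linear(ctx_dim, hidden_dim, bias=False)
cross_v = nn.Linear(ctx_dim, hidden_dim, bias=False)
encoder_hidden_states = torch.randn(2, 8, ctx_dim)
to_kv = fuse_linears(cross_k, cross_v)
ck, cv = to_kv(encoder_hidden_states).split(hidden_dim, dim=-1)
```

The math is unchanged; the expected benefit is that one larger GEMM tends to use the GPU more efficiently than several smaller ones.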

Some more comments are inline.

A lot of thanks to @cpuhrsch for helping.

src/diffusers/models/attention_processor.py

patrickvonplaten

Looks very nice! Could you add some code that would allow to test the speed-ups?

sayakpaul · 2023-12-04T10:38:09Z

> Looks very nice! Could you add some code that would allow to test the speed-ups?

https://github.com/sayakpaul/sdxl-fast

Also, note that this is just one of many optimizations needed to speed SDXL up, but it does improve throughput on its own. You can check with the following code:

```python
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Run with fused QKV projections.
pipeline.fuse_qkv_projections()
for _ in range(5):
    _ = pipeline("hey", num_inference_steps=25).images[0]

# Restore the original, unfused projections and run again for comparison.
pipeline.unfuse_qkv_projections()
for _ in range(5):
    _ = pipeline("hey", num_inference_steps=25).images[0]
```

You should see higher throughput for the fused run than for the unfused one.
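The loops above do not print any numbers. A small timing helper like the one below (hypothetical, not part of diffusers) makes the comparison explicit. It runs a few untimed warm-up calls, then synchronizes CUDA when it is available before reading the clock, because GPU kernels execute asynchronously and timing without a sync can under-report.

```python
import time
from typing import Callable

import torch


def benchmark(fn: Callable[[], object], warmup: int = 1, iters: int = 5) -> float:
    """Hypothetical helper: mean wall-clock seconds per call of `fn`,
    measured after `warmup` untimed calls."""
    if iters < 1:
        raise ValueError("iters must be >= 1")
    for _ in range(warmup):
        fn()
    # Wait for queued GPU work so it is not attributed to the timed region.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters
```

With the pipeline from the snippet above, one could measure `benchmark(lambda: pipeline("hey", num_inference_steps=25))` once after `fuse_qkv_projections()` and once after `unfuse_qkv_projections()`; the relative speed-up is then `t_unfused / t_fused - 1`.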


patrickvonplaten · 2023-12-04T11:31:45Z

Indeed, getting a nice 5% speed-up!

src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py

patrickvonplaten

Cool feature!


sayakpaul · 2023-12-04T14:54:01Z

@DN6 any reason why the fetcher is failing?

patrickvonplaten

Very nice job here!

patrickvonplaten · 2023-12-06T18:57:33Z

src/diffusers/schedulers/scheduling_euler_discrete.py

```diff
@@ -289,6 +290,8 @@ def set_timesteps(self, num_inference_steps: int, device: Union[str, torch.devic
            self.timesteps = torch.from_numpy(timesteps.astype(np.float32)).to(device=device)

        self.sigmas = torch.cat([sigmas, torch.zeros(1, device=sigmas.device)])
+        if sigmas.device.type == "cuda":
```

This breaks the `add_noise` function, and therefore inpainting, img2img, and training.
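The `+` line in the hunk is truncated here, so the following is only an illustration under an assumption: suppose the new branch turns `self.sigmas` into a plain Python list (the earlier commit "turn sigma a list" points in that direction). Any code that later treats the sigmas as a tensor, as a noise-adding step does when it moves them to the samples' device and indexes them with a tensor of step indices, then fails. `add_noise_sketch` is a hypothetical, simplified stand-in, not the scheduler's actual `add_noise`.

```python
import torch


def add_noise_sketch(sigmas, original_samples, noise, step_indices):
    # Hypothetical simplification: move the sigma schedule to the samples'
    # device/dtype, pick one sigma per sample, and scale the noise by it.
    sigmas = sigmas.to(device=original_samples.device, dtype=original_samples.dtype)
    sigma = sigmas[step_indices].flatten()
    while sigma.dim() < original_samples.dim():
        sigma = sigma.unsqueeze(-1)  # broadcast over channel/height/width
    return original_samples + noise * sigma


samples = torch.zeros(2, 4, 8, 8)
noise = torch.ones_like(samples)
step_indices = torch.tensor([0, 3])

# Works while the schedule is a tensor.
tensor_sigmas = torch.linspace(10.0, 0.0, 5)
noisy = add_noise_sketch(tensor_sigmas, samples, noise, step_indices)

# Fails once the schedule is a Python list: lists have no `.to()` method.
list_failed = False
try:
    add_noise_sketch(tensor_sigmas.tolist(), samples, noise, step_indices)
except AttributeError:
    list_failed = True
```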

…face#6030) * debug * from step * print * turn sigma a list * make str * init_noise_sigma * comment * remove prints * feat: introduce fused projections * change to a better name * no grad * device. * device * dtype * okay * print * more print * fix: unbind -> split * fix: qkv >-> k * enable disable * apply attention processor within the method * attn processors * _enable_fused_qkv_projections * remove print * add fused projection to vae * add todos. * add: documentation and cleanups. * add: test for qkv projection fusion. * relax assertions. * relax further * fix: docs * fix-copies * correct error message. * Empty-Commit * better conditioning on disable_fused_qkv_projections * check * check processor * bfloat16 computation. * check latent dtype * style * remove copy temporarily * cast latent to bfloat16 * fix: vae -> self.vae * remove print. * add _change_to_group_norm_32 * comment out stuff that didn't work * Apply suggestions from code review Co-authored-by: Patrick von Platen <[email protected]> * reflect patrick's suggestions. * fix imports * fix: disable call. * fix more * fix device and dtype * fix conditions. * fix more * Apply suggestions from code review Co-authored-by: Patrick von Platen <[email protected]> --------- Co-authored-by: Patrick von Platen <[email protected]>

sayakpaul added 30 commits December 1, 2023 09:14

bf4e645 debug
afb517a from step
55f1842 print
215bf3b turn sigma a list
75ae3df make str
ff04934 init_noise_sigma
096fffb comment
bd855d7 remove prints
88c7e16 feat: introduce fused projections
f5b091d change to a better name
a4da76b no grad
c5a5f85 device.
4e556a9 device
86027e5 dtype
a030797 okay
01c6038 print
c4eaec3 more print
a7da467 fix: unbind -> split
94fb74a fix: qkv >-> k
678577b enable disable
580a1c2 apply attention processor within the method
06bb65b attn processors
a0b9066 _enable_fused_qkv_projections
32012ce remove print
5175b91 add fused projection to vae
7b16888 add todos.
ba14a08 merge main and resolve conflicts
23f8404 add: documentation and cleanups.
e51bc7e add: test for qkv projection fusion.
b64e533 relax assertions.

patrickvonplaten reviewed Dec 4, 2023

src/diffusers/models/attention_processor.py (outdated, resolved)

patrickvonplaten reviewed Dec 4, 2023

src/diffusers/models/attention_processor.py (outdated, resolved)

patrickvonplaten reviewed Dec 4, 2023

sayakpaul and others added 9 commits December 4, 2023 16:10

7d8b913 Apply suggestions from code review
        Co-authored-by: Patrick von Platen <[email protected]>
ff28fdd reflect patrick's suggestions.
93b5f92 fix imports
a7a952d Merge branch 'main' into sdxl/feat
8d17831 fix: disable call.
d17bbbd fix more
a5fb4d7 fix device and dtype
8fadb14 fix conditions.
c6d5e86 fix more

patrickvonplaten reviewed Dec 4, 2023

src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py (outdated, resolved)

patrickvonplaten reviewed Dec 4, 2023

src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py (outdated, resolved)

patrickvonplaten approved these changes Dec 4, 2023

sayakpaul and others added 2 commits December 4, 2023 18:25

abf9ebc Apply suggestions from code review
        Co-authored-by: Patrick von Platen <[email protected]>
d485abd Merge branch 'main' into sdxl/feat
e65ddcd Merge branch 'main' into sdxl/feat

patrickvonplaten approved these changes Dec 5, 2023

sayakpaul merged commit a2bc2e1 into main Dec 6, 2023

sayakpaul deleted the sdxl/feat branch December 6, 2023 02:03

sayakpaul restored the sdxl/feat branch December 6, 2023 02:12

patrickvonplaten reviewed Dec 6, 2023

sayakpaul mentioned this pull request Dec 15, 2023

[Core] feat: enable fused attention projections for other SD and SDXL pipelines #6179

Merged
