[SD-XL] Add new pipelines #3859

Merged 51 commits from the sd_xl branch into main on Jul 6, 2023

Conversation

@patrickvonplaten (Contributor) commented Jun 23, 2023

Usage for "stabilityai/stable-diffusion-xl-base-0.9":

pip install git+https://github.com/huggingface/diffusers.git@sd_xl

In addition, make sure to install transformers, safetensors, and accelerate, as well as invisible-watermark and its dependencies:

pip install transformers accelerate safetensors

pip install "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25"
pip install --no-deps invisible-watermark

You can then use the model as follows:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda")

# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()

prompt = "An astronaut riding a green horse"

image = pipe(prompt=prompt).images[0]

When using torch >= 2.0, you can improve the inference speed by 20-30% with torch.compile. Simply wrap the UNet with torch.compile before running the pipeline:

pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

If you are limited by GPU VRAM, you can enable cpu offloading by calling pipe.enable_model_cpu_offload
instead of .to("cuda"):

- pipe.to("cuda")
+ pipe.enable_model_cpu_offload()

Usage for "stabilityai/stable-diffusion-xl-refiner-0.9"

from diffusers import DiffusionPipeline
import torch

# first generate latents with the base model
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda")

# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()

prompt = "An astronaut riding a green horse"

images = pipe(prompt=prompt, output_type="latent").images

# then refine the latents with the refiner model
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.enable_model_cpu_offload()

# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()

images = pipe(prompt=prompt, image=images).images

When using torch >= 2.0, you can improve the inference speed by 20-30% with torch.compile. Simply wrap the UNet with torch.compile before running the pipeline:

pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

If you are limited by GPU VRAM, you can enable cpu offloading by calling pipe.enable_model_cpu_offload
instead of .to("cuda"):

- pipe.to("cuda")
+ pipe.enable_model_cpu_offload()

@HuggingFaceDocBuilderDev commented Jun 23, 2023

The documentation is not available anymore as the PR was closed or merged.

if self.config.timestep_spacing == "linspace":
timesteps = np.linspace(0, self.config.num_train_timesteps - 1, num_inference_steps, dtype=float)[::-1].copy()
elif self.config.timestep_spacing == "leading":
step_ratio = self.config.num_train_timesteps // self.num_inference_steps
Contributor Author:

This new spacing doesn't give drastically better results, but better results nevertheless IMO. It's also needed to get 1-to-1 the same results as the original code.
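
For context, a minimal NumPy sketch of what the two spacing schemes produce for 1000 training timesteps and 10 inference steps (my reading of the snippet above, not the scheduler code itself):

import numpy as np

num_train_timesteps = 1000
num_inference_steps = 10

# "linspace": evenly spaced floats across the full [0, 999] range, reversed for sampling
linspace = np.linspace(0, num_train_timesteps - 1, num_inference_steps, dtype=float)[::-1].copy()

# "leading": integer multiples of the step ratio, reversed
step_ratio = num_train_timesteps // num_inference_steps
leading = (np.arange(num_inference_steps) * step_ratio).round()[::-1].copy().astype(float)

print(linspace)  # [999. 888. 777. 666. 555. 444. 333. 222. 111.   0.]
print(leading)   # [900. 800. 700. 600. 500. 400. 300. 200. 100.   0.]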

Member:

Does the original code (XL) use this new spacing scheme, though?

Comment on lines 99 to 100
num_transformer_blocks (`int` or `Tuple[int]`, *optional*, defaults to 1):
The number of transformer blocks of type [`~models.attention.BasicTransformerBlock`]. Only relevant for [`~models.unet_2d_blocks.CrossAttnDownBlock2D`], [`~models.unet_2d_blocks.CrossAttnUpBlock2D`], [`~models.unet_2d_blocks.UNetMidBlock2DCrossAttn`].
Member:

So a Transformer block can be a UNet block? I don't find the num_transformer_blocks name a good one to encompass all the blocks we're supporting here, but I cannot think of a better one either. So, okay to ignore, I guess.

Contributor Author:

Yeah good point, maybe transformer_layers_per_block is better?

Member:

Better for sure.
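
For illustration, a hedged sketch of how the renamed argument could be passed per block when building a small UNet; the block types and values below are made up for the example and are not SD-XL's actual configuration:

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel(
    sample_size=32,
    cross_attention_dim=256,
    block_out_channels=(64, 128, 256),
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D", "CrossAttnDownBlock2D"),
    up_block_types=("CrossAttnUpBlock2D", "CrossAttnUpBlock2D", "UpBlock2D"),
    # one entry per down block: the plain DownBlock2D has no transformer, while the
    # cross-attention blocks stack that many transformer layers per attention block
    transformer_layers_per_block=(1, 2, 4),
)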

def convert_open_clip_checkpoint(checkpoint):
text_model = CLIPTextModel.from_pretrained("stabilityai/stable-diffusion-2", subfolder="text_encoder")
def convert_open_clip_checkpoint(checkpoint, prefix="cond_stage_model.model."):
# text_model = CLIPTextModel.from_pretrained("stabilityai/stable-diffusion-2", subfolder="text_encoder")
Member:

Are we not affecting the SD 2 conversion process with this one?

Contributor Author:

Need to double check!
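
One way to double-check would be to re-run the conversion on a plain SD 2.x checkpoint and diff the result against the existing Hub weights; a hedged sketch, assuming the usual script location and flags, with placeholder paths:

python scripts/convert_original_stable_diffusion_to_diffusers.py \
    --checkpoint_path ./v2-1_768-ema-pruned.safetensors --from_safetensors \
    --original_config_file ./v2-inference-v.yaml \
    --dump_path ./sd-2-1-converted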

Comment on lines +1218 to +1220
num_train_timesteps = original_config.model.params.timesteps or 1000
beta_start = original_config.model.params.linear_start or 0.02
beta_end = original_config.model.params.linear_end or 0.085
Member:

Where are these numbers coming from? I'd make a note for our future reference.

Contributor Author:

Ah this is hacky for now and shouldn't be this way

@patrickvonplaten changed the title from "[SD-XL, WIP] Add new text encoder" to "[SD-XL] Add new pipelines" on Jun 27, 2023
text_encoder_lora_scale = (
cross_attention_kwargs.get("scale", None) if cross_attention_kwargs is not None else None
)
(
Contributor Author:

4 tensors are returned instead of just one.
The first 2 tensors are the normal positive and negative prompt embeddings that are passed into cross-attention. The last 2 "pooled" embeds are used to additionally condition the time embedding.
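
To make that concrete, a hedged sketch of the unpacking this introduces; the variable names follow the description above, and the exact encode_prompt signature in the pipeline may differ:

(
    prompt_embeds,                  # positive prompt embeddings -> cross-attention
    negative_prompt_embeds,         # negative prompt embeddings -> cross-attention
    pooled_prompt_embeds,           # pooled positive embeddings -> extra time-embedding conditioning
    negative_pooled_prompt_embeds,  # pooled negative embeddings -> extra time-embedding conditioning
) = self.encode_prompt(
    prompt,
    device,
    num_images_per_prompt,
    do_classifier_free_guidance,
    negative_prompt,
    lora_scale=text_encoder_lora_scale,
)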

Fix embeddings for classic SD models.
@@ -107,6 +107,13 @@ class EulerDiscreteScheduler(SchedulerMixin, ConfigMixin):
This parameter controls whether to use Karras sigmas (Karras et al. (2022) scheme) for step sizes in the
noise schedule during the sampling process. If True, the sigmas will be determined according to a sequence
of noise levels {σi} as defined in Equation (5) of the paper https://arxiv.org/pdf/2206.00364.pdf.
timestep_spacing (`str`, default `"linspace"`):
Contributor Author:

Those changes should also work well for other schedulers
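
If other schedulers pick up the same option, switching the spacing on an existing pipeline should follow the usual from_config pattern; a sketch using EulerDiscreteScheduler and the "leading" value from this diff:

import torch
from diffusers import DiffusionPipeline, EulerDiscreteScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16"
)
# rebuild the scheduler from its own config, overriding only the spacing
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config, timestep_spacing="leading")
pipe.to("cuda")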

A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined under
`self.processor` in
[diffusers.cross_attention](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/cross_attention.py).
guidance_rescale (`float`, *optional*, defaults to 0.7):
Contributor:

defaults to 0.0*

Contributor Author:

Ah yes, we should probably fix this in a follow-up PR! Sorry, I just noticed the comment here. Would you like to open a PR for it maybe, @bghira? :-)
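
For reference, guidance_rescale is opted into at call time; a small sketch reusing the pipe loaded in the usage example above (0.7 is just the value named in the docstring, while the actual default is 0.0):

image = pipe(prompt="An astronaut riding a green horse", guidance_rescale=0.7).images[0]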

@ValMystletainn

When installing, I got this error:

Collecting git+https://github.com/huggingface/diffusers.git@sd_xl
  Cloning https://github.com/huggingface/diffusers.git (to revision sd_xl) to /tmp/pip-req-build-5g66rp30
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/diffusers.git /tmp/pip-req-build-5g66rp30
  WARNING: Did not find branch or tag 'sd_xl', assuming revision or ref.
  Running command git checkout -q sd_xl
  error: pathspec 'sd_xl' did not match any file(s) known to git
  error: subprocess-exited-with-error

  × git checkout -q sd_xl did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git checkout -q sd_xl did not run successfully.
│ exit code: 1
╰─> See above for output.

They just merged the feature branch into main and deleted it, and the documentation in the model space hasn't been updated yet. Just install diffusers from the main branch rather than the sd_xl branch and keep everything else the same.
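
In other words, now that the sd_xl branch is gone, the install command from the PR description simply becomes:

pip install git+https://github.com/huggingface/diffusers.git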

@kaddly commented Jul 7, 2023


thanks

yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
* Add new text encoder

* add transformers depth

* More

* Correct conversion script

* Fix more

* Fix more

* Correct more

* correct text encoder

* Finish all

* proof that in works in run local xl

* clean up

* Get refiner to work

* Add red castle

* Fix batch size

* Improve pipelines more

* Finish text2image tests

* Add img2img test

* Fix more

* fix import

* Fix embeddings for classic models (huggingface#3888)

Fix embeddings for classic SD models.

* Allow multiple prompts to be passed to the refiner (huggingface#3895)

* finish more

* Apply suggestions from code review

* add watermarker

* Model offload (huggingface#3889)

* Model offload.

* Model offload for refiner / img2img

* Hardcode encoder offload on img2img vae encode

Saves some GPU RAM in img2img / refiner tasks so it remains below 8 GB.

---------

Co-authored-by: Patrick von Platen <[email protected]>

* correct

* fix

* clean print

* Update install warning for `invisible-watermark`

* add: missing docstrings.

* fix and simplify the usage example in img2img.

* fix setup for watermarking.

* Revert "fix setup for watermarking."

This reverts commit 491bc9f.

* fix: watermarking setup.

* fix: op.

* run make fix-copies.

* make sure tests pass

* improve convert

* make tests pass

* make tests pass

* better error message

* fiinsh

* finish

* Fix final test

---------

Co-authored-by: Pedro Cuenca <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024