
Add SDXL long weighted prompt pipeline (replace pr:4629) #4661


Merged
5 commits merged into huggingface:main on Aug 18, 2023

Conversation

xhinker
Contributor

@xhinker xhinker commented Aug 17, 2023

What does this PR do?

Replaces PR #4629.

This PR adds a pipeline that accepts prompt and negative prompt strings of unlimited length, compatible with the A1111 prompt-weighting format.

Fixes: #4559

Before submitting

Who can review?

@sayakpaul I recreated a completely new fork and added the new code in this PR; hopefully no unrelated commits come along with it. I will update the documentation and provide sample code once this PR is done. Thanks

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"
    , torch_dtype       = torch.float16
    , use_safetensors   = True
    , variant           = "fp16"
    , custom_pipeline   = "lpw_stable_diffusion_xl",
)

# Build a prompt far longer than the 77-token CLIP limit, using A1111 weighting syntax.
prompt = "a (white) cat running on the grass"*20
prompt2 = "play a (football:1.3)"*20
prompt = f"{prompt},{prompt2}"
neg_prompt = "blur, low quality"

pipe.to("cuda")
image = pipe(
    prompt                  = prompt
    , negative_prompt       = neg_prompt
).images[0]

# Free GPU memory and display the result (in a notebook).
pipe.to("cpu")
torch.cuda.empty_cache()
image

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@sayakpaul
Member

Thanks for this. I think you forgot to add the example in the README. Once that's done I think we should be ready to merge and ship 🚀

@xhinker
Contributor Author

xhinker commented Aug 18, 2023

Thanks for this. I think you forgot to add the example in the README. Once that's done I think we should be ready to merge and ship 🚀

Sample added. Thanks @sayakpaul

@sayakpaul
Member

I saw a PR on our documentation-images repo from you. I just merged it. Do you want to include a link to that sample somewhere in the README?

@xhinker
Contributor Author

xhinker commented Aug 18, 2023

I saw a PR on our documentation-images repo from you. I just merged it. Do you want to include a link to that sample somewhere in the README?

Great, let me add the image to the README doc. Give me a second :)

@xhinker
Contributor Author

xhinker commented Aug 18, 2023

I saw a PR on our documentation-images repo from you. I just merged it. Do you want to include a link to that sample somewhere in the README?

Image added to the README doc.

@sayakpaul sayakpaul merged commit d7c4ae6 into huggingface:main Aug 18, 2023
@sayakpaul
Member

sayakpaul commented Aug 18, 2023

Thanks for your valuable contribution!

@Skquark

Skquark commented Aug 19, 2023

This is great, I'm planning on integrating it, but I'm slightly confused about a couple of things. In the example, what's going on with prompt = "text"*20 and then combining it with prompt2 = "text continuation"*20 to pass to the pipeline? Why are we multiplying, and how would I treat a normal positive prompt, whether it's short or long?
Also, will there be img2img or inpainting added to it? How would we handle the refiner step here? Thanks, good stuff...

@xhinker
Contributor Author

xhinker commented Aug 19, 2023

This is great, I'm planning on integrating it, but I'm slightly confused about a couple of things. In the example, what's going on with prompt = "text"*20 and then combining it with prompt2 = "text continuation"*20 to pass to the pipeline? Why are we multiplying, and how would I treat a normal positive prompt, whether it's short or long? Also, will there be img2img or inpainting added to it? How would we handle the refiner step here? Thanks, good stuff...

prompt = "text"*20 simply makes the string longer, and prompt2 = "play a (football:1.3)"*20 is there to test that keywords added after the 77-token point are still reflected in the result. You can simply write an ordinary long prompt such as prompt = "loooooooong (weighted) prompt".
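
For example, a more natural long weighted prompt could look something like this (illustrative only, reusing the pipe from the example above):

prompt = (
    "a photo of a (white:1.2) cat running on the (green grass:1.1), "
    "golden hour lighting, (sharp focus), highly detailed, "
    "wide angle shot, 8k, award winning photography"
)
neg_prompt = "blur, (low quality:1.4), watermark, text"
image = pipe(prompt=prompt, negative_prompt=neg_prompt).images[0]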

In terms of img2img and inpainting, I am thinking of providing an embedding function so that long weighted prompts can be used with any SDXL-based pipeline.
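
Roughly, the idea would be an API along these lines (just a sketch; get_weighted_text_embeddings_sdxl, img2img_pipe and init_image are placeholder names, not a final API):

# hypothetical helper that turns long weighted prompts into SDXL embeddings
prompt_embeds, neg_prompt_embeds, pooled, neg_pooled = get_weighted_text_embeddings_sdxl(
    pipe, prompt=prompt, neg_prompt=neg_prompt
)

# the embeddings could then be fed to any SDXL pipeline, e.g. img2img
image = img2img_pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=neg_prompt_embeds,
    pooled_prompt_embeds=pooled,
    negative_pooled_prompt_embeds=neg_pooled,
    image=init_image,
    strength=0.6,
).images[0]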

I will look into the internals of the refiner to see how it works, and may build one for the refiner as well.

@Skquark

Skquark commented Aug 19, 2023

Ah, that makes more sense; it was just a little confusing. Maybe the example should be replaced with an actual long prompt using multiple syntaxes, including ((double positive)), [negatives] and such.
The SDXL differences are starting to make more sense. I finally got most of it working in my DiffusionDeluxe.com app and was using LPW as the default mega pipeline because I didn't have to reload pipelines when switching between txt2img and img2img mode, so it would be nice to continue using this in addition to Compel as an option.
For the refiner, it'd just need the img2img function like the original LPW. No rush, but it looks doable after looking through the code. Thanks.

@duongnv0499

duongnv0499 commented Aug 23, 2023

Hi @xhinker, thanks for the great contribution, but I got an error when using your example:

Downloading (…)ain/model_index.json: 100%
609/609 [00:00<00:00, 27.6kB/s]
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:649: FutureWarning: 'cached_download' is the legacy way to download files from the HF hub, please consider upgrading to 'hf_hub_download'
warnings.warn(
Could not locate the pipeline.py inside lpw_stable_diffusion_xl.

HTTPError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
260 try:
--> 261 response.raise_for_status()
262 except HTTPError as e:

9 frames
HTTPError: 404 Client Error: Not Found for url: https://raw.githubusercontent.com/huggingface/diffusers/v0.20.0/examples/community/lpw_stable_diffusion_xl.py

The above exception was the direct cause of the following exception:

HfHubHTTPError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
301 # Convert HTTPError into a HfHubHTTPError to display request information
302 # as well (request id and/or server error message)
--> 303 raise HfHubHTTPError(str(e), response=response) from e
304
305

HfHubHTTPError: 404 Client Error: Not Found for url: https://raw.githubusercontent.com/huggingface/diffusers/v0.20.0/examples/community/lpw_stable_diffusion_xl.py

With the latest diffusers running on Colab, the error shows that the auto-download didn't work: instead of https://raw.githubusercontent.com/huggingface/diffusers/main/examples/community/lpw_stable_diffusion_xl.py, it uses https://raw.githubusercontent.com/huggingface/diffusers/v0.20.0/examples/community/lpw_stable_diffusion_xl.py. Can you fix that? Thank you so much.

@xhinker
Contributor Author

xhinker commented Aug 23, 2023

Hi @xhinker, thanks for the great contribution, but I got an error when using your example: […] HfHubHTTPError: 404 Client Error: Not Found for url: https://raw.githubusercontent.com/huggingface/diffusers/v0.20.0/examples/community/lpw_stable_diffusion_xl.py […] Can you fix that? Thank you so much.

@sayakpaul do you have any idea what causes this error?

@sayakpaul
Member

Is there a reproducible Colab notebook? Could you reproduce it in one?

@xhinker

@duongnv0499

duongnv0499 commented Aug 23, 2023

@sayakpaul do you have any idea what causes this error?

It worked if I added:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"
    , torch_dtype       = torch.float16
    , use_safetensors   = True
    , variant           = "fp16"
    , custom_pipeline   = "lpw_stable_diffusion_xl"
    , custom_revision   = "main"
)

But it causes another error when I run:

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt)
~/.cache/huggingface/modules/diffusers_modules/git/lpw_stable_diffusion_xl.py in parse_prompt_attention(text)
     98             res[p][1] *= multiplier
     99 
--> 100     for m in re_attention.finditer(text):
    101         text = m.group(0)
    102         weight = m.group(1)

TypeError: expected string or bytes-like object

Update: you can check my Colab here: colab notebook

@xhinker
Contributor Author

xhinker commented Aug 23, 2023

You can work around this issue by providing an empty negative prompt like this:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"
    , torch_dtype       = torch.float16
    , use_safetensors   = True
    , variant           = "fp16"
    , custom_pipeline   = "lpw_stable_diffusion_xl"
    , custom_revision   = "main"
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
neg_prompt = ""
image = pipe(prompt=prompt, negative_prompt = neg_prompt)
image[0][0]

@duongnv0499

You can work around this issue by providing an empty negative prompt like this: […]

Thank you so much ☺️

@xhinker
Contributor Author

xhinker commented Aug 25, 2023

You can work around this issue by providing an empty negative prompt like this: […] Thank you so much ☺️

A new PR fixes this empty negative prompt error; with the newest code you no longer need to provide an empty neg prompt.
#4743 (comment)

@adhikjoshi

It doesn't work when there is more than 1 sample:

#5081

@xhinker
Contributor Author

xhinker commented Sep 18, 2023

@adhikjoshi, I will add it when I find time, thanks.

@jrabek

jrabek commented Oct 6, 2023

@xhinker thanks for the amazing contribution! 🙏

Is it possible to directly use the class rather than indirectly loading it via DiffusionPipeline with custom_pipeline?

I am using some other pipeline mixins that I can only use if I directly reference the class.

But basically should the following work? Sorry if this is more of a diffusers question. It wasn't obvious to me.

self.pipe = StableDiffusionLongPromptWeightingPipeline.from_pretrained(
            "stabilityai/stable-diffusion-xl-base-1.0",
            vae=vae,
            torch_dtype=torch.float16,
            variant="fp16",
            use_safetensors=True,
        )

The reason I am asking is that I still see the following warning message, which seems unexpected. Though in the final image I do see the effects of the part of the prompt that is supposedly being removed.

The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens

@xhinker
Contributor Author

xhinker commented Oct 6, 2023

Is it possible to directly use the class rather than indirectly loading it via DiffusionPipeline with custom_pipeline? […] I still see the following warning message, which seems unexpected:

The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens

Yes, of course you can use the class directly. Just ignore the 77-token warning; it comes from the SDXL tokenizer.
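
For example, something like this should work (a rough sketch: download lpw_stable_diffusion_xl.py from examples/community and import the pipeline class it defines; the class name below is assumed, so check the file):

import torch
from lpw_stable_diffusion_xl import SDXLLongPromptWeightingPipeline  # local copy of the community file; adjust the name if it differs

pipe = SDXLLongPromptWeightingPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")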

@wddwzwhhxx

wddwzwhhxx commented Dec 28, 2023

pooled_prompt_embeds = prompt_embeds_2[0]

Code:

    for i in range(len(prompt_token_groups)):
        # get positive prompt embeddings with weights
        token_tensor = torch.tensor([prompt_token_groups[i]], dtype=torch.long, device=self.device)
        weight_tensor = torch.tensor(prompt_weight_groups[i], dtype=torch.float16, device=self.device)

        token_tensor_2 = torch.tensor([prompt_token_groups_2[i]], dtype=torch.long, device=self.device)

        # use first text encoder
        prompt_embeds_1 = self.text_encoder(token_tensor.to(self.device), output_hidden_states=True)
        prompt_embeds_1_hidden_states = prompt_embeds_1.hidden_states[-2]

        # use second text encoder
        prompt_embeds_2 = self.text_encoder_2(token_tensor_2.to(self.device), output_hidden_states=True)
        prompt_embeds_2_hidden_states = prompt_embeds_2.hidden_states[-2]
        pooled_prompt_embeds = prompt_embeds_2[0]

@xhinker
In this code snippet, I noticed that pooled_prompt_embeds is repeatedly reassigned within the loop, and there are no other variables to store the replaced values. Could this potentially cause issues? For instance, in the case of a long prompt being segmented into several parts, pooled_prompt_embeds would only retain the content of the last segment. Looking forward to your response!

@xhinker
Contributor Author

xhinker commented Dec 28, 2023

@xhinker In this code snippet, I noticed that pooled_prompt_embeds is repeatedly reassigned within the loop, and there are no other variables to store the replaced values. Could this potentially cause issues? For instance, in the case of a long prompt being segmented into several parts, pooled_prompt_embeds would only retain the content of the last segment. Looking forward to your response!

While we can stack the embeddings to work around the 77-token limitation, it seems we can't apply the same strategy to the pooled embedding. You are right: for now, only the last segment's pooled embedding is output. I have thought about this problem before but don't have a good way to address it yet; I would be happy to hear any suggestions you have. Thanks for reading through the code so carefully.

@wddwzwhhxx

While we can stack the embeddings to work around the 77-token limitation, it seems we can't apply the same strategy to the pooled embedding. You are right: for now, only the last segment's pooled embedding is output. […] I would be happy to hear any suggestions you have.

I haven't found a suitable solution for now, and changing the shape of pooled_prompt_embeds would lead to errors in subsequent steps… However, perhaps using the pooled_prompt_embeds from the first segment instead of the last one would be better, because in a prompt exceeding 77 tokens the content of the first segment is often the most crucial.
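
For illustration, that would roughly amount to this change inside the loop quoted above (untested sketch):

    # use second text encoder
    prompt_embeds_2 = self.text_encoder_2(token_tensor_2.to(self.device), output_hidden_states=True)
    prompt_embeds_2_hidden_states = prompt_embeds_2.hidden_states[-2]

    # keep the pooled embedding from the first segment only,
    # instead of overwriting it with the last segment's on every iteration
    if i == 0:
        pooled_prompt_embeds = prompt_embeds_2[0]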

@panxiaoguang

When using playground-v2.5 and this long weighted pipeline together, I get an image with extremely bad quality.

AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
…#4661)

* Add SDXL long weighted prompt pipeline

* Add SDXL long weighted prompt pipeline usage sample in the readme document

* Add SDXL long weighted prompt pipeline usage sample in the readme document, add result image