
Add SDXL long weighted prompt pipeline (replace pr:4629) #4661


Merged
5 commits merged into huggingface:main on Aug 18, 2023

Conversation

xhinker
Contributor

@xhinker xhinker commented Aug 17, 2023

What does this PR do?

Replaces PR #4629.

This PR adds a pipeline that accepts prompt and negative prompt strings of unlimited length, compatible with the A1111 prompt-weighting format.

Fixes: #4559

Before submitting

Who can review?

@sayakpaul I recreated a completely new fork and added the new code in this PR; hopefully no unrelated commits come along with it. I will update the documentation and provide sample code once this PR is done. Thanks

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"
    , torch_dtype       = torch.float16
    , use_safetensors   = True
    , variant           = "fp16"
    , custom_pipeline   = "lpw_stable_diffusion_xl",
)

# Build a prompt far longer than the 77-token CLIP limit, using A1111 weighting syntax.
prompt = "a (white) cat running on the grass"*20
prompt2 = "play a (football:1.3)"*20
prompt = f"{prompt},{prompt2}"
neg_prompt = "blur, low quality"

pipe.to("cuda")
image = pipe(
    prompt                  = prompt
    , negative_prompt       = neg_prompt
).images[0]

# Free GPU memory and display the result (in a notebook).
pipe.to("cpu")
torch.cuda.empty_cache()
image

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@sayakpaul
Member

Thanks for this. I think you forgot to add the example in the README. Once that's done I think we should be ready to merge and ship 🚀

@xhinker
Contributor Author

xhinker commented Aug 18, 2023

Thanks for this. I think you forgot to add the example in the README. Once that's done I think we should be ready to merge and ship 🚀

Sample added. Thanks @sayakpaul

@sayakpaul
Member

I saw a PR on our documentation-images repo from you. I just merged it. Do you want to include a link to that sample somewhere in the README?

@xhinker
Contributor Author

xhinker commented Aug 18, 2023

I saw a PR on our documentation-images repo from you. I just merged it. Do you want to include a link to that sample somewhere in the README?

Great, let me add the image to the README doc. Give me a second :)

@xhinker
Contributor Author

xhinker commented Aug 18, 2023

I saw a PR on our documentation-images repo from you. I just merged it. Do you want to include a link to that sample somewhere in the README?

Image added to the README doc.

@sayakpaul sayakpaul merged commit d7c4ae6 into huggingface:main Aug 18, 2023
@sayakpaul
Member

sayakpaul commented Aug 18, 2023

Thanks for your valuable contribution!

@Skquark

Skquark commented Aug 19, 2023

This is great, I'm planning on integrating it, but I'm slightly confused about a couple of things. In the example, what's going on with prompt = "text"*20 and then combining it with prompt2 = "text continuation"*20 to pass to the pipeline? Why are we multiplying, and how would I treat a normal positive prompt, whether it's short or long?
Also, will there be img2img or inpainting added to it? How would we handle the refiner step here? Thanks, good stuff...

@xhinker
Contributor Author

xhinker commented Aug 19, 2023

This is great, I'm planning on integrating it, but I'm slightly confused about a couple of things. In the example, what's going on with prompt = "text"*20 and then combining it with prompt2 = "text continuation"*20 to pass to the pipeline? Why are we multiplying, and how would I treat a normal positive prompt, whether it's short or long? Also, will there be img2img or inpainting added to it? How would we handle the refiner step here? Thanks, good stuff...

prompt = "text"*20 simply makes the string longer, and prompt2 = "play a (football:1.3)"*20 is there to test that keywords added after the 77-token point are still reflected in the result. You can simply write an ordinary long prompt such as prompt = "loooooooong (weighted) prompt".
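
For example, a more natural long weighted prompt could look something like this (illustrative only, reusing the pipe from the example above):

prompt = (
    "a photo of a (white:1.2) cat running on the (green grass:1.1), "
    "golden hour lighting, (sharp focus), highly detailed, "
    "wide angle shot, 8k, award winning photography"
)
neg_prompt = "blur, (low quality:1.4), watermark, text"
image = pipe(prompt=prompt, negative_prompt=neg_prompt).images[0]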

In terms of img2img and inpainting, I am thinking of providing an embedding function so that long weighted prompts can be used with any SDXL-based pipeline.
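
Roughly, the idea would be an API along these lines (just a sketch; get_weighted_text_embeddings_sdxl, img2img_pipe and init_image are placeholder names, not a final API):

# hypothetical helper that turns long weighted prompts into SDXL embeddings
prompt_embeds, neg_prompt_embeds, pooled, neg_pooled = get_weighted_text_embeddings_sdxl(
    pipe, prompt=prompt, neg_prompt=neg_prompt
)

# the embeddings could then be fed to any SDXL pipeline, e.g. img2img
image = img2img_pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=neg_prompt_embeds,
    pooled_prompt_embeds=pooled,
    negative_pooled_prompt_embeds=neg_pooled,
    image=init_image,
    strength=0.6,
).images[0]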

I will look into the internals of the refiner to see how it works, and may build one for the refiner as well.

@Skquark

Skquark commented Aug 19, 2023

Ah, that makes more sense; it was just a little confusing. Maybe the example should be replaced with an actual long prompt using multiple syntaxes, including ((double positive)), [negatives] and such.
The SDXL differences are starting to make more sense. I finally got most of it working in my DiffusionDeluxe.com app and was using LPW as the default mega pipeline because I didn't have to reload pipelines when switching between txt2img and img2img mode, so it would be nice to continue using this in addition to Compel as an option.
For the refiner, it'd just need the img2img function like the original LPW. No rush, but it looks doable after looking through the code. Thanks.

@duongnv0499

duongnv0499 commented Aug 23, 2023

Hi @xhinker, thanks for the great contribution, but I got an error when using your example:

Downloading (…)ain/model_index.json: 100%
609/609 [00:00<00:00, 27.6kB/s]
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:649: FutureWarning: 'cached_download' is the legacy way to download files from the HF hub, please consider upgrading to 'hf_hub_download'
warnings.warn(
Could not locate the pipeline.py inside lpw_stable_diffusion_xl.

HTTPError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
260 try:
--> 261 response.raise_for_status()
262 except HTTPError as e:

9 frames
HTTPError: 404 Client Error: Not Found for url: https://raw.githubusercontent.com/huggingface/diffusers/v0.20.0/examples/community/lpw_stable_diffusion_xl.py

The above exception was the direct cause of the following exception:

HfHubHTTPError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
301 # Convert HTTPError into a HfHubHTTPError to display request information
302 # as well (request id and/or server error message)
--> 303 raise HfHubHTTPError(str(e), response=response) from e
304
305

HfHubHTTPError: 404 Client Error: Not Found for url: https://raw.githubusercontent.com/huggingface/diffusers/v0.20.0/examples/community/lpw_stable_diffusion_xl.py

With the latest diffusers running on Colab, the error shows that the auto-download didn't work: instead of https://raw.githubusercontent.com/huggingface/diffusers/main/examples/community/lpw_stable_diffusion_xl.py, it uses https://raw.githubusercontent.com/huggingface/diffusers/v0.20.0/examples/community/lpw_stable_diffusion_xl.py. Can you fix that? Thank you so much.

@xhinker
Contributor Author

xhinker commented Aug 23, 2023

Hi @xhinker, thanks for the great contribution, but I got an error when using your example: […] HfHubHTTPError: 404 Client Error: Not Found for url: https://raw.githubusercontent.com/huggingface/diffusers/v0.20.0/examples/community/lpw_stable_diffusion_xl.py […] Can you fix that? Thank you so much.

@sayakpaul do you have any idea what causes this error?

@sayakpaul
Member

Is there a reproducible Colab notebook? Could you reproduce it in one?

@xhinker

@duongnv0499

duongnv0499 commented Aug 23, 2023

@sayakpaul do you have any idea what causes this error?

It worked if I added:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"
    , torch_dtype       = torch.float16
    , use_safetensors   = True
    , variant           = "fp16"
    , custom_pipeline   = "lpw_stable_diffusion_xl"
    , custom_revision   = "main"
)

But it causes another error when I run:

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt)
~/.cache/huggingface/modules/diffusers_modules/git/lpw_stable_diffusion_xl.py in parse_prompt_attention(text)
     98             res[p][1] *= multiplier
     99 
--> 100     for m in re_attention.finditer(text):
    101         text = m.group(0)
    102         weight = m.group(1)

TypeError: expected string or bytes-like object

Update: you can check my Colab here: colab notebook

@xhinker
Contributor Author

xhinker commented Aug 23, 2023

You can work around this issue by providing an empty negative prompt like this:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"
    , torch_dtype       = torch.float16
    , use_safetensors   = True
    , variant           = "fp16"
    , custom_pipeline   = "lpw_stable_diffusion_xl"
    , custom_revision   = "main"
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
neg_prompt = ""
image = pipe(prompt=prompt, negative_prompt = neg_prompt)
image[0][0]

@duongnv0499

You can work around this issue by providing an empty negative prompt like this: […]

Thank you so much ☺️

@xhinker
Contributor Author

xhinker commented Aug 25, 2023

You can work around this issue by providing an empty negative prompt like this: […] Thank you so much ☺️

A new PR fixes this empty negative prompt error; with the newest code you no longer need to provide an empty neg prompt.
#4743 (comment)

@adhikjoshi

It doesn't work when there is more than 1 sample:

#5081

@xhinker
Contributor Author

xhinker commented Sep 18, 2023

@adhikjoshi, I will add it when I find time, thanks.

@jrabek

jrabek commented Oct 6, 2023

@xhinker thanks for the amazing contribution! 🙏

Is it possible to directly use the class rather than indirectly loading it via DiffusionPipeline with custom_pipeline?

I am using some other pipeline mixins that I can only use if I directly reference the class.

But basically should the following work? Sorry if this is more of a diffusers question. It wasn't obvious to me.

self.pipe = StableDiffusionLongPromptWeightingPipeline.from_pretrained(
            "stabilityai/stable-diffusion-xl-base-1.0",
            vae=vae,
            torch_dtype=torch.float16,
            variant="fp16",
            use_safetensors=True,
        )

The reason I am asking is that I still see the following warning message, which seems unexpected. Though in the final image I do see the effects of the part of the prompt that is supposedly being removed.

The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens

@xhinker
Contributor Author

xhinker commented Oct 6, 2023

Is it possible to directly use the class rather than indirectly loading it via DiffusionPipeline with custom_pipeline? […] I still see the following warning message, which seems unexpected:

The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens

Yes, of course you can use the class directly. Just ignore the 77-token warning; it comes from the SDXL tokenizer.
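
For example, something like this should work (a rough sketch: download lpw_stable_diffusion_xl.py from examples/community and import the pipeline class it defines; the class name below is assumed, so check the file):

import torch
from lpw_stable_diffusion_xl import SDXLLongPromptWeightingPipeline  # local copy of the community file; adjust the name if it differs

pipe = SDXLLongPromptWeightingPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")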

@wddwzwhhxx

wddwzwhhxx commented Dec 28, 2023

pooled_prompt_embeds = prompt_embeds_2[0]

Code:

    for i in range(len(prompt_token_groups)):
        # get positive prompt embeddings with weights
        token_tensor = torch.tensor([prompt_token_groups[i]], dtype=torch.long, device=self.device)
        weight_tensor = torch.tensor(prompt_weight_groups[i], dtype=torch.float16, device=self.device)

        token_tensor_2 = torch.tensor([prompt_token_groups_2[i]], dtype=torch.long, device=self.device)

        # use first text encoder
        prompt_embeds_1 = self.text_encoder(token_tensor.to(self.device), output_hidden_states=True)
        prompt_embeds_1_hidden_states = prompt_embeds_1.hidden_states[-2]

        # use second text encoder
        prompt_embeds_2 = self.text_encoder_2(token_tensor_2.to(self.device), output_hidden_states=True)
        prompt_embeds_2_hidden_states = prompt_embeds_2.hidden_states[-2]
        pooled_prompt_embeds = prompt_embeds_2[0]

@xhinker
In this code snippet, I noticed that pooled_prompt_embeds is repeatedly reassigned within the loop, and there are no other variables to store the replaced values. Could this potentially cause issues? For instance, in the case of a long prompt being segmented into several parts, pooled_prompt_embeds would only retain the content of the last segment. Looking forward to your response!

@xhinker
Contributor Author

xhinker commented Dec 28, 2023

@xhinker In this code snippet, I noticed that pooled_prompt_embeds is repeatedly reassigned within the loop, and there are no other variables to store the replaced values. Could this potentially cause issues? For instance, in the case of a long prompt being segmented into several parts, pooled_prompt_embeds would only retain the content of the last segment. Looking forward to your response!

While we can stack the embeddings to work around the 77-token limitation, it seems we can't apply the same strategy to the pooled embedding. You are right: for now, only the last segment's pooled embedding is output. I have thought about this problem before but don't have a good way to address it yet; I would be happy to hear any suggestions you have. Thanks for reading through the code so carefully.

@wddwzwhhxx

While we can stack the embeddings to work around the 77-token limitation, it seems we can't apply the same strategy to the pooled embedding. You are right: for now, only the last segment's pooled embedding is output. […] I would be happy to hear any suggestions you have.

I haven't found a suitable solution for now, and changing the shape of pooled_prompt_embeds would lead to errors in subsequent steps… However, perhaps using the pooled_prompt_embeds from the first segment instead of the last one would be better, because in a prompt exceeding 77 tokens the content of the first segment is often the most crucial.
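
For illustration, that would roughly amount to this change inside the loop quoted above (untested sketch):

    # use second text encoder
    prompt_embeds_2 = self.text_encoder_2(token_tensor_2.to(self.device), output_hidden_states=True)
    prompt_embeds_2_hidden_states = prompt_embeds_2.hidden_states[-2]

    # keep the pooled embedding from the first segment only,
    # instead of overwriting it with the last segment's on every iteration
    if i == 0:
        pooled_prompt_embeds = prompt_embeds_2[0]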

@panxiaoguang

When using playground-v2.5 and this long weighted pipeline together, I get an image with extremely bad quality.

AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
…#4661)

* Add SDXL long weighted prompt pipeline

* Add SDXL long weighted prompt pipeline usage sample in the readme document

* Add SDXL long weighted prompt pipeline usage sample in the readme document, add result image