[SD-XL] Ability to easily split prompt over the two text encoders #4004


Closed
bghira opened this issue Jul 9, 2023 · 10 comments · Fixed by #4156

Comments

@bghira
Contributor

bghira commented Jul 9, 2023

Is your feature request related to a problem? Please describe.
SDXL 0.9 comes with a new dual text encoder pipeline.

OpenCLIP ViT-bigG/14 and CLIP-L are both paired up in this pipeline. When running through ComfyUI, the CLIP nodes allow for inputting different pieces of the prompt to different encoders. The default configuration is like ours, and the same prompt is handed to both encoders.

However, the additional flexibility of treating the entire embed space as a single concat over the whole prompt's context drastically alters the creative results.
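For context, here is a minimal dependency-free sketch of the concat described above: SDXL joins the two encoders' per-token hidden states along the feature axis, so each token position carries both encodings. The function name and the plain-list representation are illustrative, not the diffusers API; only the dimensions (77 tokens, 768 for CLIP-L, 1280 for OpenCLIP bigG) are SDXL's actual sizes.

```python
SEQ_LEN = 77         # CLIP context length
DIM_CLIP_L = 768     # CLIP ViT-L/14 hidden size
DIM_OPENCLIP = 1280  # OpenCLIP ViT-bigG/14 hidden size

def concat_hidden_states(h_clip_l, h_openclip):
    """Join the two encoders' outputs token-by-token along the feature axis.

    Illustrative helper (not part of diffusers): each token position ends up
    with a 768 + 1280 = 2048-dim feature vector.
    """
    assert len(h_clip_l) == len(h_openclip) == SEQ_LEN
    return [a + b for a, b in zip(h_clip_l, h_openclip)]  # per-token list concat

# dummy hidden states, one zero-vector per token position
h_l = [[0.0] * DIM_CLIP_L for _ in range(SEQ_LEN)]
h_g = [[0.0] * DIM_OPENCLIP for _ in range(SEQ_LEN)]
joint = concat_hidden_states(h_l, h_g)
print(len(joint), len(joint[0]))  # 77 2048
```

Because the concat happens per token, feeding each encoder a different sub-prompt gives every token position a mixed context from both prompts, rather than two redundant encodings of the same text.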

Describe the solution you'd like
We are interested in adding optional parameters to the SDXL Base and Img2Img pipelines to allow this flexibility.

  • prompt_2 and negative_prompt_2 would be great names, as they match the naming convention of text_encoder_2/tokenizer_2
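A rough sketch of how the proposed interface might be used (these parameters were not in the pipeline at the time of this issue; the `split_prompt` helper and the `BREAK` separator are hypothetical conventions for this example, and the pipeline call is left commented out since it needs model weights and a GPU):

```python
def split_prompt(prompt, sep=" BREAK "):
    """Hypothetical helper: split 'subject BREAK style' into two sub-prompts.

    Falls back to handing the same text to both encoders when no separator
    is present, matching the pipeline's default behaviour.
    """
    subject, _, style = prompt.partition(sep)
    return subject, (style or subject)

prompt, prompt_2 = split_prompt(
    "a portrait of an astronaut BREAK oil painting, heavy impasto")

# from diffusers import StableDiffusionXLPipeline
# pipe = StableDiffusionXLPipeline.from_pretrained(
#     "stabilityai/stable-diffusion-xl-base-0.9")
# image = pipe(
#     prompt=prompt,          # fed to text_encoder (CLIP-L)
#     prompt_2=prompt_2,      # fed to text_encoder_2 (OpenCLIP ViT-bigG/14)
# ).images[0]
```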

Describe alternatives you've considered

  • Creating a custom pipeline, which does not force-multiply our efforts.
@patrickvonplaten
Contributor

patrickvonplaten commented Jul 9, 2023

Hey @bghira,

Can you show me an example of where providing different text prompts for each text encoder gives much better results? Also, we allow the user to directly provide prompt embeds, so I wonder if this is not enough to cover this use case? #3995
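For readers following along, the `prompt_embeds` route mentioned above means precomputing the text embeddings yourself and passing them to the pipeline. A minimal shape sketch, using plain lists as stand-ins for tensors (the shapes shown — 77 tokens, 768 + 1280 = 2048 features, with the pooled embedding taken from the bigG encoder at 1280 — are what the SDXL pipeline expects; the pipeline call itself is illustrative and commented out):

```python
BATCH, SEQ_LEN = 1, 77

# Per-token embeddings: concat of both encoders' hidden states (768 + 1280).
prompt_embeds = [
    [[0.0] * (768 + 1280) for _ in range(SEQ_LEN)] for _ in range(BATCH)
]
# Pooled embedding: taken from the OpenCLIP ViT-bigG/14 encoder only.
pooled_prompt_embeds = [[0.0] * 1280 for _ in range(BATCH)]

print(len(prompt_embeds[0]), len(prompt_embeds[0][0]),
      len(pooled_prompt_embeds[0]))  # 77 2048 1280

# image = pipe(
#     prompt_embeds=torch.tensor(prompt_embeds),
#     pooled_prompt_embeds=torch.tensor(pooled_prompt_embeds),
# ).images[0]
```

Getting these shapes wrong is the usual source of the dimensionality errors discussed later in this thread.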

@bghira
Contributor Author

bghira commented Jul 9, 2023

Neither I nor anyone I've asked has been able to get the prompt embeds working, and IMO having a built-in way of doing this would be really beneficial, without requiring users to pull in Compel, which they may not be comfortable with.

The subject portion of the prompt in OpenCLIP, and the style in CLIP-L:
[image]

The subject portion of the prompt in CLIP-L, and the style in OpenCLIP:
[image]

The subject and style prompt in CLIP-L, with OpenCLIP as unconditional guidance:
[image]

The subject and style prompt in OpenCLIP, with CLIP-L as unconditional guidance:
[image]

Both encoders have both portions of the prompt:
[image]

@sayakpaul
Member

Spectacular results!

Neither I nor anyone I've asked has been able to get the prompt embeds working

Could you expand on this further? Do you mean that passing prompt_embeds doesn't work with our SDXL pipeline?

@patrickvonplaten
Contributor

Ok, this seems to make a lot of sense then, thanks for the results! I think it shouldn't be too difficult to support with, as you say, prompt_2 and negative_prompt_2 inputs; fine by me to add this! @bghira would you like to give the PR a try? :-)

@bghira
Contributor Author

bghira commented Jul 11, 2023

@patrickvonplaten I know my limits, and text embeds seem to be one :D I simply propose the idea for others who are willing to take it up and understand these components better.

@bghira
Contributor Author

bghira commented Jul 11, 2023

Could you expand on this further? Do you mean that passing prompt_embeds doesn't work with our SDXL pipeline?

The Compel pull request wasn't yet available when I was messing around with them. I tried extracting the relevant bits from the XL pipeline and just wasn't able to figure it out. There's not a lot of documentation at this level that makes sense to less-informed individuals like myself, so I'm never sure why I'm getting this or that dimensionality error. It's just guessing, digging in with print(f'') statements, and spending an inordinate amount of time looking at things I don't understand.

I haven't gone on to try the Compel PR yet because yesterday I was stuck on 4003 issues before I realised that the whole pipeline architecture of Diffusers has off-by-one errors. I feel like this kind of subtle behaviour is really going to bite me again when I go back into text embeds.

Ergo, it is not something I feel I can accomplish.

@patrickvonplaten
Contributor

Actually, I think we can have prompt_embeds with Compel as well as a very easy user interface with prompt_2 and negative_prompt_2 :-) So if you'd like to add this, I'm more than happy to review a PR!

@MercuryOoO

Off-topic, since I didn't find a way to send you a private message, I took the liberty of asking you here. Can you tell me what model and prompt you used to generate these images?

@bghira
Contributor Author

bghira commented Jan 29, 2024

That is my model, Terminus, though it is a much earlier version. I don't have the prompt anymore.

@MercuryOoO

Terminus

Thanks!
