[SD-XL] Ability to easily split prompt over the two text encoders #4004
Spectacular results!
Could you expand on this further? Do you mean passing
OK, this seems to make a lot of sense then, thanks for the results! I think it shouldn't be too difficult to support it with, as you say, a
@patrickvonplaten I know my limits, and text embeds seem to be one :D I'm simply proposing the idea for others who are willing to take it up and who understand these components better.
The Compel pull request wasn't available yet when I was experimenting with them. I tried extracting the relevant bits from the XL pipeline and just couldn't figure it out. There isn't much documentation at this level that makes sense to less-informed people like me, so I'm never sure why I'm getting one dimensionality error or another. It's just guessing, digging in with print(f'') statements, and spending an inordinate amount of time looking at things I don't understand. I haven't tried the Compel PR yet, because yesterday I was stuck on the 4003 issues before I realised that the whole pipeline architecture of Diffusers has off-by-one errors. I feel like this kind of subtle behaviour is really going to bite me again when I go back into text embeds, so it is not something I feel I can accomplish.
Actually, I think we can have `prompt_embeds` with Compel as well as a very easy user interface with
That is my model, Terminus, though it is a much earlier version. I don't have the prompt anymore.
Thanks!
Is your feature request related to a problem? Please describe.
SDXL 0.9 comes with a new dual text encoder pipeline.
OpenCLIP ViT-bigG/14 and CLIP-L are both paired up in this pipeline. When running through ComfyUI, the CLIP nodes allow for inputting different pieces of the prompt, to different encoders. The default configuration is like ours, and the same prompt is handed to both encoders.
However, having this additional flexibility, instead of treating the entire embedding space as a single concatenation over the whole prompt's context, drastically alters the creative results.
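To make the dimensionality side of this concrete, here is a minimal toy sketch of how a dual-encoder embedding is assembled when each encoder receives its own prompt. It assumes the SDXL layout of per-token hidden states of width 768 (CLIP ViT-L) and 1280 (OpenCLIP ViT-bigG/14) concatenated along the feature axis into 2048, over 77 token positions. `toy_encode` and `dual_encode` are hypothetical stand-ins (deterministic pseudo-random vectors, not real models) and are not part of the Diffusers API.

```python
import random
from typing import List

# Assumed SDXL widths: CLIP ViT-L (768) and OpenCLIP ViT-bigG/14 (1280).
DIM_1 = 768
DIM_2 = 1280
MAX_TOKENS = 77  # CLIP context length


def toy_encode(prompt: str, dim: int) -> List[List[float]]:
    """Hypothetical stand-in for a text encoder (NOT a real model).

    Maps each of MAX_TOKENS positions to a deterministic pseudo-random
    vector of width `dim`, seeded by (dim, position, word)."""
    words = prompt.split()
    out = []
    for pos in range(MAX_TOKENS):
        token = words[pos] if pos < len(words) else "<pad>"
        rng = random.Random(f"{dim}:{pos}:{token}")  # str seeds are deterministic
        out.append([rng.random() for _ in range(dim)])
    return out


def dual_encode(prompt: str, prompt_2: str) -> List[List[float]]:
    """Encode each prompt with its own encoder, then concatenate per token."""
    emb_1 = toy_encode(prompt, DIM_1)    # MAX_TOKENS x 768
    emb_2 = toy_encode(prompt_2, DIM_2)  # MAX_TOKENS x 1280
    # Feature-axis concatenation: each token row becomes 768 + 1280 = 2048 wide.
    return [row_1 + row_2 for row_1, row_2 in zip(emb_1, emb_2)]
```

Passing the same text as both arguments reproduces the current default. With different prompts, the first 768 features of every token depend only on `prompt` and the remaining 1280 only on `prompt_2`, which is why splitting the prompt changes the conditioning without changing its shape.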
Describe the solution you'd like
We are interested in adding optional parameters to the SDXL Base and Img2Img pipelines to allow this flexibility.
`prompt_2` and `negative_prompt_2` would be great names, as they match the naming convention of `text_encoder_2`/`tokenizer_2`.
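One way the optional parameters could behave is sketched below: each `*_2` argument falls back to its counterpart when omitted, so existing callers keep sending the same text to both encoders. `resolve_prompts` is a hypothetical helper, not an existing Diffusers function, and treating a missing negative prompt as an empty string is an assumption made only for this sketch.

```python
from typing import Optional, Tuple


def resolve_prompts(
    prompt: str,
    prompt_2: Optional[str] = None,
    negative_prompt: Optional[str] = None,
    negative_prompt_2: Optional[str] = None,
) -> Tuple[Tuple[str, str], Tuple[str, str]]:
    """Hypothetical helper: return (text for encoder 1, text for encoder 2)
    for the positive and the negative conditioning.

    Any `*_2` argument left as None falls back to its counterpart; a missing
    negative prompt is assumed to be "" for illustration only."""
    positive = (prompt, prompt_2 if prompt_2 is not None else prompt)
    negative_1 = negative_prompt if negative_prompt is not None else ""
    negative = (
        negative_1,
        negative_prompt_2 if negative_prompt_2 is not None else negative_1,
    )
    return positive, negative
```

For example, `resolve_prompts("a cat", "oil painting")` sends "a cat" to the first encoder and "oil painting" to the second, while `resolve_prompts("a cat")` matches today's behaviour of handing the same prompt to both.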
Describe alternatives you've considered