Support for manual CLIP loading in StableDiffusionPipeline - txt2img. #3832


Merged: 6 commits into huggingface:main on Jun 28, 2023

Conversation

@WadRex (Contributor) commented Jun 20, 2023:

What does this PR do?

Fixes #3822

This pull request improves how the CLIP model is loaded when used with StableDiffusionPipeline.from_ckpt(); it affects only the txt2img part of that pipeline. Users can now load the CLIP model and tokenizer manually, bypassing the forced loading behavior. Previously, the CLIP model and tokenizer always ended up in the Hugging Face cache, which made a fully portable setup impossible; this PR resolves that.
With this enhancement, users can now specify their desired CLIP model and tokenizer location as follows:

# Users can either choose the official Hugging Face repository `openai/clip-vit-large-patch14`
# or provide a local path to load CLIP from.
from transformers import CLIPTextModel, CLIPTokenizer

from diffusers import StableDiffusionPipeline

clip_text_model = CLIPTextModel.from_pretrained("repo/id/or/path/to/local/folder")
clip_tokenizer = CLIPTokenizer.from_pretrained("repo/id/or/path/to/local/folder")

# These instances are then passed to `StableDiffusionPipeline.from_ckpt`.
pipeline = StableDiffusionPipeline.from_ckpt("path/to/single/safetensors/or/bin/file",
                                             clip_text_model=clip_text_model,
                                             clip_tokenizer=clip_tokenizer)

Note that this pull request does not change behavior when users provide no clip_text_model or clip_tokenizer parameters; in that case, the code behaves exactly as it did before the PR.
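As a rough illustration of the fallback behavior described above, the logic amounts to the sketch below. The helper names (`load_default_clip`, `build_pipeline`) are hypothetical stand-ins, not the actual diffusers internals; real code would call `CLIPTextModel.from_pretrained` / `CLIPTokenizer.from_pretrained` where the stand-in returns a tag string.

```python
DEFAULT_CLIP_REPO = "openai/clip-vit-large-patch14"

def load_default_clip(repo_id):
    # Stand-in for CLIPTextModel/CLIPTokenizer.from_pretrained(repo_id);
    # returns a tag string so the sketch stays runnable offline.
    return f"clip-from:{repo_id}"

def build_pipeline(checkpoint_path, clip_text_model=None, clip_tokenizer=None):
    # Force-load CLIP only when the caller did not supply it, mirroring
    # the PR's rule that omitting both parameters behaves as before.
    if clip_text_model is None:
        clip_text_model = load_default_clip(DEFAULT_CLIP_REPO)
    if clip_tokenizer is None:
        clip_tokenizer = load_default_clip(DEFAULT_CLIP_REPO)
    return {"ckpt": checkpoint_path,
            "text_model": clip_text_model,
            "tokenizer": clip_tokenizer}
```

With no CLIP arguments, the default repository is loaded; with user-supplied instances, they are used as-is.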

Who can review?

@patrickvonplaten
@sayakpaul

@patrickvonplaten (Contributor) left a comment:

Looks great to me! @sayakpaul wdyt?

@HuggingFaceDocBuilderDev commented Jun 21, 2023:

The documentation is not available anymore as the PR was closed or merged.

Comment on lines 1342 to 1345
clip_text_model (`transformers.models.clip.modeling_clip.CLIPTextModel`, *optional*, defaults to `None`):
An instance of `CLIPTextModel` to use. If this parameter is `None`, the function will load a new instance of `CLIPTextModel`, if needed.
clip_tokenizer (`transformers.models.clip.tokenization_clip.CLIPTokenizer`, *optional*, defaults to `None`):
An instance of `CLIPTokenizer` to use. If this parameter is `None`, the function will load a new instance of `CLIPTokenizer`, if needed.
A reviewer (Member) commented:

Could we maybe follow something similar for the docstrings here?
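For illustration only, a docstring following the indented argument style used elsewhere in diffusers might look like the sketch below; the function name is a placeholder, not part of the actual API.

```python
def from_ckpt_sketch(clip_text_model=None, clip_tokenizer=None):
    """Placeholder function used only to illustrate the docstring style.

    Args:
        clip_text_model (`CLIPTextModel`, *optional*, defaults to `None`):
            An instance of `CLIPTextModel` to use. If `None`, a new
            instance is loaded when needed.
        clip_tokenizer (`CLIPTokenizer`, *optional*, defaults to `None`):
            An instance of `CLIPTokenizer` to use. If `None`, a new
            instance is loaded when needed.
    """
    return clip_text_model, clip_tokenizer
```

The key convention is the short type line followed by an indented description, rather than a single long line per argument.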

Comment on lines 1076 to 1079
clip_text_model (`transformers.models.clip.modeling_clip.CLIPTextModel`, *optional*, defaults to `None`):
An instance of `CLIPTextModel` to use. If this parameter is `None`, the function will load a new instance of `CLIPTextModel`, if needed.
clip_tokenizer (`transformers.models.clip.tokenization_clip.CLIPTokenizer`, *optional*, defaults to `None`):
An instance of `CLIPTokenizer` to use. If this parameter is `None`, the function will load a new instance of `CLIPTokenizer`, if needed.
A reviewer (Member) commented:

Same as above.

@sayakpaul (Member) left a comment:

Thanks a lot for this important PR! Left some nits.

@sayakpaul (Member) commented:

Regarding the failing test here, could you run `make style && make quality` from your environment? More info is available at https://github.com/huggingface/diffusers/blob/main/CONTRIBUTING.md

@WadRex (Contributor, author) commented Jun 24, 2023:

@patrickvonplaten
@sayakpaul
Are there any more requirements or changes needed to successfully merge this PR?

@patrickvonplaten patrickvonplaten merged commit 1500130 into huggingface:main Jun 28, 2023
@WadRex WadRex deleted the fix-clip-forceload-txt2img branch July 8, 2023 19:45
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
…huggingface#3832)

* Support for manual CLIP loading in StableDiffusionPipeline - txt2img.

* Update src/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py

* Update variables & according docs to match previous style.

* Updated to match style & quality of 'diffusers'

---------

Co-authored-by: Patrick von Platen <[email protected]>
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
…huggingface#3832)

* Support for manual CLIP loading in StableDiffusionPipeline - txt2img.

* Update src/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py

* Update variables & according docs to match previous style.

* Updated to match style & quality of 'diffusers'

---------

Co-authored-by: Patrick von Platen <[email protected]>
Successfully merging this pull request may close these issues.

StableDiffusionPipeline() and CLIP "cooperation"
4 participants