add models for T2I-Adapter-XL #4696

MC-E · 2023-08-21T14:19:52Z

MC-E · 2023-08-21T14:24:11Z

This PR adds the T2I-Adapter-XL models to support adapter-based control on SDXL.

sayakpaul · 2023-08-21T14:43:16Z

@MC-E this looks good to me! Do we have any checkpoints that we can test this with?

HuggingFaceDocBuilderDev · 2023-08-21T14:50:57Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

patil-suraj

Very cool and minimal. Just left two small comments.

src/diffusers/models/adapter.py

sayakpaul · 2023-08-22T03:59:58Z

@MC-E I tried loading an existing checkpoint here: https://colab.research.google.com/gist/sayakpaul/e9ed999df5714a6c9bb9c0a06cc9922a/scratchpad.ipynb. But I am currently facing issues. Could you look into it?

MC-E · 2023-08-22T04:03:32Z

It seems that all the key prefixes have an additional "adapter."?
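A quick way to confirm a prefix mismatch like this is to compare the two key sets directly. The sketch below is only an illustration: the helper name diff_state_dict_keys and the toy key names are hypothetical, and in practice the key lists would come from the checkpoint's state dict and from the model's state_dict().

```python
def diff_state_dict_keys(checkpoint_keys, model_keys, prefix="adapter."):
    """Compare key sets and report whether adding `prefix` reconciles them.

    Hypothetical helper for illustration; not part of diffusers.
    """
    checkpoint_keys = set(checkpoint_keys)
    model_keys = set(model_keys)
    # Keys the model expects but the checkpoint does not provide.
    missing = sorted(model_keys - checkpoint_keys)
    # Keys the checkpoint provides but the model does not know.
    unexpected = sorted(checkpoint_keys - model_keys)
    # Would prepending the prefix to every checkpoint key give an exact match?
    prefixed = {f"{prefix}{key}" for key in checkpoint_keys}
    return {
        "missing": missing,
        "unexpected": unexpected,
        "prefix_fixes_it": prefixed == model_keys,
    }


# Toy key names for illustration only; real checkpoints have many more entries.
ckpt = ["conv_in.weight", "conv_in.bias"]
model = ["adapter.conv_in.weight", "adapter.conv_in.bias"]
report = diff_state_dict_keys(ckpt, model)
print(report["prefix_fixes_it"])
```

If prefix_fixes_it is true, re-keying the checkpoint with the prefix should be enough for the names to line up; shape mismatches are a separate check.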

sayakpaul · 2023-08-22T04:15:23Z

@MC-E I did the following:

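from diffusers import T2IAdapter

# state_dict_new is the checkpoint state dict loaded earlier in the notebook linked above.
final_state_dict = {f"adapter.{k}": v for k, v in state_dict_new.items()}

# in_channels is left at its default here.
xl_adapter = T2IAdapter(adapter_type="full_adapter_xl", downscale_factor=16)
xl_adapter.load_state_dict(final_state_dict)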

It leads to:

RuntimeError: Error(s) in loading state_dict for T2IAdapter:
	size mismatch for adapter.conv_in.weight: copying a param with shape torch.Size([320, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 768, 3, 3]).

Anything I am missing out on?

MC-E · 2023-08-22T04:19:29Z

Please follow the configs here: https://github.com/TencentARC/T2I-Adapter/tree/XL/configs/inference. The number of input channels for OpenPose is three times that of the others.

sayakpaul · 2023-08-22T04:20:59Z

I am using canny here:

from huggingface_hub import hf_hub_download

repo_id = "TencentARC/T2I-Adapter"
filepath = "models_XL/adapter-xl-canny.pth"

state_dict_path = hf_hub_download(repo_id=repo_id, filename=filepath)

MC-E · 2023-08-22T04:23:24Z

@sayakpaul The number of input channels needs to be 1. It seems that you set it to 3.

MC-E · 2023-08-22T04:24:03Z

Happy to assist you. Feel free to ask any questions:)

sayakpaul · 2023-08-22T04:26:54Z

Yup, that fixed it :)
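For reference, the shapes in the error above are consistent with the adapter applying a pixel-unshuffle before conv_in, which multiplies the image channels by downscale_factor squared: 1 x 16^2 = 256 and 3 x 16^2 = 768. Assuming that layout, the expected in_channels can be read off a checkpoint's conv_in.weight shape. The helper name below is hypothetical and this is only a sketch:

```python
def infer_adapter_in_channels(conv_in_weight_shape, downscale_factor):
    """Infer image channels from a conv_in.weight shape (out, in, kh, kw).

    Assumes conv_in sees in_channels * downscale_factor**2 channels,
    i.e. a pixel-unshuffle runs before it. Hypothetical helper.
    """
    unshuffled_channels = conv_in_weight_shape[1]
    factor_squared = downscale_factor ** 2
    if unshuffled_channels % factor_squared != 0:
        raise ValueError(
            f"{unshuffled_channels} input channels is not a multiple of "
            f"{factor_squared}; check downscale_factor."
        )
    return unshuffled_channels // factor_squared


# Shape reported for the canny checkpoint in the error message.
canny_channels = infer_adapter_in_channels((320, 256, 3, 3), downscale_factor=16)
# Shape the freshly constructed model expected.
model_channels = infer_adapter_in_channels((320, 768, 3, 3), downscale_factor=16)
print(canny_channels, model_channels)
```

Under that assumption the canny checkpoint corresponds to in_channels=1, while the model built in the snippet above corresponds to in_channels=3.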

sayakpaul · 2023-08-22T04:37:06Z

@MC-E I created this conversion script for easy sharing and distribution: https://gist.github.com/sayakpaul/9a888466865bc0844d8979a2b2821c63. Let's add this maybe to the scripts directory. This will give a config file and the state dict file in safetensors resembling what we have here: https://huggingface.co/TencentARC/t2iadapter_seg_sd14v1/tree/main.

Also, since you have already incorporated the feedback provided by @patil-suraj, I think for newer checkpoints (assuming we would use these blocks) we won't require any conversion.
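As a rough idea of the config half of such a conversion (this is not the gist's code), the sketch below writes a config.json next to where the weights would go. The function name is hypothetical, only the settings mentioned in this thread are included, and a real diffusers config carries more fields; saving the weights themselves in safetensors is left out.

```python
import json
import tempfile
from pathlib import Path


def write_adapter_config(output_dir, in_channels, downscale_factor, adapter_type):
    """Write a minimal config.json for an adapter checkpoint.

    Hypothetical helper; the field set is a subset chosen for illustration.
    """
    config = {
        "adapter_type": adapter_type,
        "downscale_factor": downscale_factor,
        "in_channels": in_channels,
    }
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    config_path = output_dir / "config.json"
    config_path.write_text(json.dumps(config, indent=2, sort_keys=True))
    return config_path


# Values taken from the canny discussion above: 1 input channel, factor 16.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = write_adapter_config(
        tmp_dir, in_channels=1, downscale_factor=16, adapter_type="full_adapter_xl"
    )
    loaded = json.loads(path.read_text())
print(loaded)
```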

sayakpaul · 2023-08-22T04:42:03Z

I think the next step would be to:

1. Generate a diffusers variant of the checkpoints existing here: https://huggingface.co/TencentARC/T2I-Adapter/tree/main/models_XL.
2. Make changes to https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/t2i_adapter/pipeline_stable_diffusion_adapter.py for SDXL-specific changes.
3. Run inference with the converted checkpoint (from pt. 1) to ensure everything is working as expected.

Pt 2 mostly involves:

Writing a new encode_prompt() method. We can readily use the one in diffusers/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py (line 219 in 8d30d25): def encode_prompt(
Changing how the unet is called, following diffusers/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py (line 812 in 8d30d25): noise_pred = self.unet(

We can add a new pipeline for this: pipeline_stable_diffusion_xl_adapter.py.

Let me know if you face any difficulties here.

Then in a follow-up PR, we can add the training script to start our full-length experiments.
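To make the SDXL-specific UNet call a bit more concrete, here is a minimal sketch of the extra arguments involved, assuming the new pipeline mirrors the SDXL pipeline (added_cond_kwargs with text_embeds and time_ids) and the existing adapter pipeline (down_block_additional_residuals). The helper names are hypothetical and strings stand in for tensors:

```python
def build_add_time_ids(original_size, crops_coords_top_left, target_size):
    """Concatenate SDXL's size/crop conditioning into one flat list of ints."""
    return list(original_size + crops_coords_top_left + target_size)


def build_unet_kwargs(prompt_embeds, pooled_prompt_embeds, add_time_ids, adapter_state):
    """Collect the keyword arguments for the UNet call (sketch only).

    In a real pipeline all values are tensors; here they are placeholders.
    """
    return {
        "encoder_hidden_states": prompt_embeds,
        # SDXL micro-conditioning: pooled text embedding plus size/crop ids.
        "added_cond_kwargs": {
            "text_embeds": pooled_prompt_embeds,
            "time_ids": add_time_ids,
        },
        # Adapter features, one entry per UNet down block.
        "down_block_additional_residuals": list(adapter_state),
    }


time_ids = build_add_time_ids((1024, 1024), (0, 0), (1024, 1024))
unet_kwargs = build_unet_kwargs(
    prompt_embeds="prompt_embeds",
    pooled_prompt_embeds="pooled_prompt_embeds",
    add_time_ids=time_ids,
    adapter_state=["feat0", "feat1", "feat2", "feat3"],
)
print(time_ids)
```

The call itself would then look roughly like self.unet(latent_model_input, t, **unet_kwargs), with the exact signature to be checked against the pipeline code.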

MC-E · 2023-08-23T03:48:07Z

@sayakpaul The XL pipeline is added. The hf models are uploaded at: https://huggingface.co/Adapter/t2iadapter/tree/main

MC-E · 2023-08-23T03:50:14Z

@sayakpaul The test code is:

import torch

from diffusers import (
    T2IAdapter,
    StableDiffusionXLAdapterPipeline,
    DDPMScheduler,
)
from diffusers.utils import load_image

sketch_image = load_image('https://huggingface.co/Adapter/t2iadapter/resolve/main/sketch.png')
model_id = 'stabilityai/stable-diffusion-xl-base-1.0'

adapter = T2IAdapter.from_pretrained(
    "Adapter/t2iadapter", subfolder='sketch_sdxl_1.0', torch_dtype=torch.float16, adapter_type="full_adapter_xl"
)
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, adapter=adapter, torch_dtype=torch.float16, variant="fp16", scheduler=scheduler
)

pipe.to('cuda')
generator = torch.manual_seed(42)
sketch_image_out = pipe(
    prompt='a photo of a dog in real world, high quality',
    negative_prompt='extra digit, fewer digits, cropped, worst quality, low quality',
    image=sketch_image,
    generator=generator,
    guidance_scale=7.5,
).images[0]

sketch_image_out.save('sketch_image_out.png')

sayakpaul · 2023-08-23T04:05:58Z

@MC-E thanks for providing the code snippet and for your contributions!

A couple of things:

I simplified your snippet so that it's easier to follow. I hope that's okay. Could you please also leave a link to edge_dog.png so that we can try it out?
Do we need to ensure the input images are at (1024, 1024) resolution, since the SDXL base model typically works best at that resolution?
The default Euler Discrete scheduler of SDXL works well with a reduced guidance_scale (5.0 for example). So, I would try to reduce the num_inference_steps and guidance_scale and see the results.

src/diffusers/pipelines/t2i_adapter/pipeline_stable_diffusion_xl_adapter.py

src/diffusers/pipelines/__init__.py

sayakpaul · 2023-08-29T04:41:46Z

@MC-E I think you accidentally borked the commit history. It shouldn't be like that.

MC-E · 2023-08-29T04:50:51Z

In order to resolve conflicts, I performed a rebase on this PR :(

sayakpaul · 2023-08-29T05:05:44Z

Thanks for your amazing contribution!

MC-E · 2023-08-29T05:09:52Z

Thank you for your help:)

sayakpaul · 2023-08-29T05:14:33Z

The documentation is updated: https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/adapter#usage-example-with-the-base-model-of-stablediffusionxl.

MC-E · 2023-08-29T05:20:28Z

Okay, thanks:)

* T2I-Adapter-XL * update * update * add pipeline * modify pipeline * modify pipeline * modify pipeline * modify pipeline * modify pipeline * modify modeling_text_unet * fix styling. * fix: copies. * adapter settings * new test case * new test case * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * revert prints. * new test case * remove print * org test case * add test_pipeline * styling. * fix copies. * modify test parameter * style. * add adapter-xl doc * double quotes in docs * Fix potential type mismatch * style. --------- Co-authored-by: sayakpaul <[email protected]>
