
Commit aedd787

[docs] ControlNet guide (huggingface#4640)
* first draft
* finish first draft
* feedback and remove sections from API pages
* clean docstrings
* add full code example
1 parent 7caa368 commit aedd787

File tree

9 files changed, +878 -774 lines changed


docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -64,6 +64,8 @@
     title: Overview
   - local: using-diffusers/sdxl
     title: Stable Diffusion XL
+  - local: using-diffusers/controlnet
+    title: ControlNet
   - local: using-diffusers/distilled_sd
     title: Distilled Stable Diffusion inference
   - local: using-diffusers/reproducibility

docs/source/en/api/pipelines/controlnet.md

Lines changed: 13 additions & 283 deletions
Large diffs are not rendered by default.

docs/source/en/api/pipelines/controlnet_sdxl.md

Lines changed: 15 additions & 131 deletions
@@ -12,151 +12,35 @@ specific language governing permissions and limitations under the License.

# ControlNet with Stable Diffusion XL

-[Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang and Maneesh Agrawala.
+ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang and Maneesh Agrawala.

-Using a pretrained model, we can provide control images (for example, a depth map) to control Stable Diffusion text-to-image generation so that it follows the structure of the depth image and fills in the details.
+With a ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
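
To make the depth-map example in the added paragraph above concrete, here is a minimal illustrative sketch of depth-conditioned SDXL generation. It reuses checkpoints that appear elsewhere in this diff; the control-image URL is a placeholder, and the prompt and parameter values are arbitrary choices, not something this commit prescribes.

```py
import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, AutoencoderKL
from diffusers.utils import load_image

# load a depth ControlNet and plug it into the SDXL base pipeline
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0-small", torch_dtype=torch.float16, use_safetensors=True
)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16, use_safetensors=True)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
)
pipe.enable_model_cpu_offload()

# placeholder URL - substitute your own depth map here
depth_image = load_image("https://example.com/path/to/depth_map.png").resize((1024, 1024))

# the generated image follows the spatial layout encoded in the depth map
image = pipe(
    "a cozy reading nook with warm lighting, best quality",
    image=depth_image,
    num_inference_steps=30,
    controlnet_conditioning_scale=0.7,
).images[0]
```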

The abstract from the paper is:

*We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal device. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.*

-We provide support using ControlNets with [Stable Diffusion XL](./stable_diffusion/stable_diffusion_xl.md) (SDXL).
+You can find additional smaller Stable Diffusion XL (SDXL) ControlNet checkpoints from the 🤗 [Diffusers](https://huggingface.co/diffusers) Hub organization, and browse [community-trained](https://huggingface.co/models?other=stable-diffusion-xl&other=controlnet) checkpoints on the Hub.

-You can find numerous SDXL ControlNet checkpoints from [this link](https://huggingface.co/models?other=stable-diffusion-xl&other=controlnet). There are some smaller ControlNet checkpoints too:
+<Tip warning={true}>

-* [controlnet-canny-sdxl-1.0-small](https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0-small)
-* [controlnet-canny-sdxl-1.0-mid](https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0-mid)
-* [controlnet-depth-sdxl-1.0-small](https://huggingface.co/diffusers/controlnet-depth-sdxl-1.0-small)
-* [controlnet-depth-sdxl-1.0-mid](https://huggingface.co/diffusers/controlnet-depth-sdxl-1.0-mid)
+🧪 Many of the SDXL ControlNet checkpoints are experimental, and there is a lot of room for improvement. Feel free to open an [Issue](https://github.com/huggingface/diffusers/issues/new/choose) and leave us feedback on how we can improve!

-We also encourage you to train custom ControlNets; we provide a [training script](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md) for this.
+</Tip>

-You can find some results below:
+If you don't see a checkpoint you're interested in, you can train your own SDXL ControlNet with our [training script](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md).

-<img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/sd_xl/sdxl_controlnet_canny_grid.png" width=600/>
+<Tip>

-🚨 At the time of this writing, many of these SDXL ControlNet checkpoints are experimental and there is a lot of room for improvement. We encourage our users to provide feedback. 🚨
+Make sure to check out the Schedulers [guide](/using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](/using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
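
The Tip added above points to two patterns covered in the linked guides: swapping schedulers and reusing already-loaded components across pipelines. As a rough sketch only (the checkpoints and the component-reuse pattern below are assumptions based on those guides, not something introduced by this commit):

```py
import torch
from diffusers import (
    ControlNetModel,
    StableDiffusionXLControlNetPipeline,
    StableDiffusionXLPipeline,
    UniPCMultistepScheduler,
)

# a plain SDXL pipeline that is already loaded
base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, use_safetensors=True
)

# trade output quality against speed by swapping the scheduler; no weights are reloaded
base.scheduler = UniPCMultistepScheduler.from_config(base.scheduler.config)

# reuse the base pipeline's components (VAE, text encoders, UNet, scheduler) in a ControlNet
# pipeline instead of downloading and loading them a second time
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16, use_safetensors=True
)
pipe = StableDiffusionXLControlNetPipeline(**base.components, controlnet=controlnet)
```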

-## MultiControlNet
-
-You can compose multiple ControlNet conditionings from different image inputs to create a *MultiControlNet*. To get better results, it is often helpful to:
-
-1. mask conditionings such that they don't overlap (for example, mask the area of a canny image where the pose conditioning is located)
-2. experiment with the [`controlnet_conditioning_scale`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet#diffusers.StableDiffusionControlNetPipeline.__call__.controlnet_conditioning_scale) parameter to determine how much weight to assign to each conditioning input
-
-In this example, you'll combine a canny image and a human pose estimation image to generate a new image.
-
-Prepare the canny image conditioning:
-
-```py
-from diffusers.utils import load_image
-from PIL import Image
-import numpy as np
-import cv2
-
-canny_image = load_image(
-    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/landscape.png"
-)
-canny_image = np.array(canny_image)
-
-low_threshold = 100
-high_threshold = 200
-
-canny_image = cv2.Canny(canny_image, low_threshold, high_threshold)
-
-# zero out middle columns of image where pose will be overlaid
-zero_start = canny_image.shape[1] // 4
-zero_end = zero_start + canny_image.shape[1] // 2
-canny_image[:, zero_start:zero_end] = 0
-
-canny_image = canny_image[:, :, None]
-canny_image = np.concatenate([canny_image, canny_image, canny_image], axis=2)
-canny_image = Image.fromarray(canny_image).resize((1024, 1024))
-```
-
-<div class="flex gap-4">
-  <div>
-    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/landscape.png"/>
-    <figcaption class="mt-2 text-center text-sm text-gray-500">original image</figcaption>
-  </div>
-  <div>
-    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/controlnet/landscape_canny_masked.png"/>
-    <figcaption class="mt-2 text-center text-sm text-gray-500">canny image</figcaption>
-  </div>
-</div>
-
-Prepare the human pose estimation conditioning:
-
-```py
-from controlnet_aux import OpenposeDetector
-from diffusers.utils import load_image
-
-openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
-
-openpose_image = load_image(
-    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/person.png"
-)
-openpose_image = openpose(openpose_image).resize((1024, 1024))
-```
-
-<div class="flex gap-4">
-  <div>
-    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/person.png"/>
-    <figcaption class="mt-2 text-center text-sm text-gray-500">original image</figcaption>
-  </div>
-  <div>
-    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/controlnet/person_pose.png"/>
-    <figcaption class="mt-2 text-center text-sm text-gray-500">human pose image</figcaption>
-  </div>
-</div>
-
-Load a list of ControlNet models that correspond to each conditioning, and pass them to the [`StableDiffusionXLControlNetPipeline`]. Use the faster [`UniPCMultistepScheduler`] and enable model offloading to reduce memory usage.
-
-```py
-from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, AutoencoderKL, UniPCMultistepScheduler
-import torch
-
-controlnets = [
-    ControlNetModel.from_pretrained(
-        "thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16, use_safetensors=True
-    ),
-    ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16, use_safetensors=True),
-]
-
-vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16, use_safetensors=True)
-pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnets, vae=vae, torch_dtype=torch.float16, use_safetensors=True
-)
-pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
-pipe.enable_model_cpu_offload()
-```
-
-Now you can pass your prompt (and a negative prompt if you're using one), canny image, and pose image to the pipeline:
-
-```py
-prompt = "a giant standing in a fantasy landscape, best quality"
-negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality"
-
-generator = torch.manual_seed(1)
-
-images = [openpose_image, canny_image]
-
-images = pipe(
-    prompt,
-    image=images,
-    num_inference_steps=25,
-    generator=generator,
-    negative_prompt=negative_prompt,
-    num_images_per_prompt=3,
-    controlnet_conditioning_scale=[1.0, 0.8],
-).images[0]
-```
-
-<div class="flex justify-center">
-    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/multicontrolnet.png"/>
-</div>
+</Tip>

## StableDiffusionXLControlNetPipeline
[[autodoc]] StableDiffusionXLControlNetPipeline
  - all
-  - __call__
+  - __call__
+
+## StableDiffusionPipelineOutput
+
+[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
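
The `StableDiffusionPipelineOutput` entry added above documents the small output dataclass that Stable Diffusion pipelines return from their `__call__`. As a minimal sketch of the data structure only (the dummy image below is just a placeholder so the snippet runs standalone):

```py
from PIL import Image
from diffusers.pipelines.stable_diffusion import StableDiffusionPipelineOutput

# the output holds the list of generated images plus an optional per-image NSFW flag
output = StableDiffusionPipelineOutput(images=[Image.new("RGB", (64, 64))], nsfw_content_detected=None)

# in real use, `output` comes back from a pipeline call and you read the images off it
first_image = output.images[0]
```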
