
Commit 7942bb8

[Docs] Fix typos, improve, update at Using Diffusers' Task page (huggingface#5611)
* Fix typos, improve, update; kandinsky doesn't want fp16 due to deprecation; ogkalu and kohbanye don't have safetensors; add make_image_grid for better visualization
* Update inpaint.md
* Remove erroneous Space
* Update docs/source/en/using-diffusers/conditional_image_generation.md
  Co-authored-by: Steven Liu <[email protected]>
* Update img2img.md
* load_image() already converts to RGB
* Update depth2img.md
* Update img2img.md
* Update inpaint.md

---------

Co-authored-by: Steven Liu <[email protected]>
1 parent aab6de2 commit 7942bb8

5 files changed: 187 additions, 162 deletions

docs/source/en/using-diffusers/conditional_image_generation.md

Lines changed: 19 additions & 9 deletions
@@ -30,6 +30,7 @@ You can generate images from a prompt in 🤗 Diffusers in two steps:

```py
from diffusers import AutoPipelineForText2Image
+import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
"runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
@@ -42,6 +43,7 @@ pipeline = AutoPipelineForText2Image.from_pretrained(
image = pipeline(
"stained glass of darth vader, backlight, centered composition, masterpiece, photorealistic, 8k"
).images[0]
+image
```

<div class="flex justify-center">
@@ -65,6 +67,7 @@ pipeline = AutoPipelineForText2Image.from_pretrained(
).to("cuda")
generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", generator=generator).images[0]
+image
```

### Stable Diffusion XL
@@ -80,6 +83,7 @@ pipeline = AutoPipelineForText2Image.from_pretrained(
).to("cuda")
generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", generator=generator).images[0]
+image
```

### Kandinsky 2.2
@@ -93,15 +97,16 @@ from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
-"kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16, variant="fp16"
+"kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")
generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", generator=generator).images[0]
+image
```

### ControlNet

-ControlNet are auxiliary models or adapters that are finetuned on top of text-to-image models, such as [Stable Diffusion V1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5). Using ControlNet models in combination with text-to-image models offers diverse options for more explicit control over how to generate an image. With ControlNet's, you add an additional conditioning input image to the model. For example, if you provide an image of a human pose (usually represented as multiple keypoints that are connected into a skeleton) as a conditioning input, the model generates an image that follows the pose of the image. Check out the more in-depth [ControlNet](controlnet) guide to learn more about other conditioning inputs and how to use them.
+ControlNet models are auxiliary models or adapters that are finetuned on top of text-to-image models, such as [Stable Diffusion v1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5). Using ControlNet models in combination with text-to-image models offers diverse options for more explicit control over how to generate an image. With ControlNet, you add an additional conditioning input image to the model. For example, if you provide an image of a human pose (usually represented as multiple keypoints that are connected into a skeleton) as a conditioning input, the model generates an image that follows the pose of the image. Check out the more in-depth [ControlNet](controlnet) guide to learn more about other conditioning inputs and how to use them.

In this example, let's condition the ControlNet with a human pose estimation image. Load the ControlNet model pretrained on human pose estimations:

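The checkpoint-loading step that this sentence refers to sits outside the hunk shown here. A minimal sketch of what it could look like, assuming the commonly used `lllyasviel/control_v11p_sd15_openpose` checkpoint and a placeholder path for the pose image (neither comes from this commit):

```py
from diffusers import AutoPipelineForText2Image, ControlNetModel
from diffusers.utils import load_image
import torch

# Assumed checkpoint: an SD v1.5 ControlNet trained on OpenPose keypoints
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
)

# Hypothetical pose image; any keypoint/skeleton image can serve as the conditioning input
pose_image = load_image("path/to/pose_image.png")

# Passing `controlnet=` routes AutoPipelineForText2Image to the ControlNet text-to-image pipeline
pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16"
).to("cuda")
```

The resulting `pose_image` is what the next hunk passes as `image=pose_image`.
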
@@ -124,6 +129,7 @@ pipeline = AutoPipelineForText2Image.from_pretrained(
).to("cuda")
generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", image=pose_image, generator=generator).images[0]
+image
```

<div class="flex flex-row gap-4">
@@ -163,6 +169,7 @@ pipeline = AutoPipelineForText2Image.from_pretrained(
image = pipeline(
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", height=768, width=512
).images[0]
+image
```

<div class="flex justify-center">
@@ -171,7 +178,7 @@ image = pipeline(

<Tip warning={true}>

-Other models may have different default image sizes depending on the image size's in the training dataset. For example, SDXL's default image size is 1024x1024 and using lower `height` and `width` values may result in lower quality images. Make sure you check the model's API reference first!
+Other models may have different default image sizes depending on the image sizes in the training dataset. For example, SDXL's default image size is 1024x1024 and using lower `height` and `width` values may result in lower quality images. Make sure you check the model's API reference first!

</Tip>

@@ -189,6 +196,7 @@ pipeline = AutoPipelineForText2Image.from_pretrained(
image = pipeline(
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", guidance_scale=3.5
).images[0]
+image
```

<div class="flex flex-row gap-4">
@@ -221,16 +229,17 @@ image = pipeline(
prompt="Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
negative_prompt="ugly, deformed, disfigured, poor details, bad anatomy",
).images[0]
+image
```

<div class="flex flex-row gap-4">
<div class="flex-1">
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-neg-prompt-1.png"/>
-<figcaption class="mt-2 text-center text-sm text-gray-500">negative prompt = "ugly, deformed, disfigured, poor details, bad anatomy"</figcaption>
+<figcaption class="mt-2 text-center text-sm text-gray-500">negative_prompt = "ugly, deformed, disfigured, poor details, bad anatomy"</figcaption>
</div>
<div class="flex-1">
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-neg-prompt-2.png"/>
-<figcaption class="mt-2 text-center text-sm text-gray-500">negative prompt = "astronaut"</figcaption>
+<figcaption class="mt-2 text-center text-sm text-gray-500">negative_prompt = "astronaut"</figcaption>
</div>
</div>

@@ -252,6 +261,7 @@ image = pipeline(
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
generator=generator,
).images[0]
+image
```

## Control image generation
@@ -278,14 +288,14 @@ pipeline = AutoPipelineForText2Image.from_pretrained(
"runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipeline(
-prompt_emebds=prompt_embeds, # generated from Compel
+prompt_embeds=prompt_embeds, # generated from Compel
negative_prompt_embeds=negative_prompt_embeds, # generated from Compel
).images[0]
```
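
The hunk only shows the embeddings being consumed; producing them happens elsewhere in the guide. A rough sketch of one way to build them with the Compel library, reusing the `pipeline` created above (the prompt strings and the `++` up-weighting are illustrative assumptions):

```py
from compel import Compel

# Reuse the tokenizer and text encoder from the pipeline created above
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# "++" is Compel syntax for increasing a token's weight
prompt_embeds = compel("Astronaut in a jungle++, cold color palette, muted colors, detailed, 8k")
negative_prompt_embeds = compel("ugly, deformed, disfigured, poor details, bad anatomy")

# Pad both embeddings to the same length so they can be passed to the pipeline together
[prompt_embeds, negative_prompt_embeds] = compel.pad_conditioning_tensors_to_same_length(
    [prompt_embeds, negative_prompt_embeds]
)
```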

### ControlNet

-As you saw in the [ControlNet](#controlnet) section, these models offer a more flexible and accurate way to generate images by incorporating an additional conditioning image input. Each ControlNet model is pretrained on a particular type of conditioning image to generate new images that resemble it. For example, if you take a ControlNet pretrained on depth maps, you can give the model a depth map as a conditioning input and it'll generate an image that preserves the spatial information in it. This is quicker and easier than specifying the depth information in a prompt. You can even combine multiple conditioning inputs with a [MultiControlNet](controlnet#multicontrolnet)!
+As you saw in the [ControlNet](#controlnet) section, these models offer a more flexible and accurate way to generate images by incorporating an additional conditioning image input. Each ControlNet model is pretrained on a particular type of conditioning image to generate new images that resemble it. For example, if you take a ControlNet model pretrained on depth maps, you can give the model a depth map as a conditioning input and it'll generate an image that preserves the spatial information in it. This is quicker and easier than specifying the depth information in a prompt. You can even combine multiple conditioning inputs with a [MultiControlNet](controlnet#multicontrolnet)!

There are many types of conditioning inputs you can use, and 🤗 Diffusers supports ControlNet for Stable Diffusion and SDXL models. Take a look at the more comprehensive [ControlNet](controlnet) guide to learn how you can use these models.

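A hedged sketch of the depth-map workflow described above, assuming the `lllyasviel/control_v11f1p_sd15_depth` checkpoint and a placeholder depth-map path (both assumptions rather than part of this commit):

```py
from diffusers import AutoPipelineForText2Image, ControlNetModel
from diffusers.utils import load_image
import torch

# Assumed checkpoint: an SD v1.5 ControlNet conditioned on depth maps
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16
)
pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Hypothetical depth map; in this convention brighter pixels are usually closer to the camera
depth_map = load_image("path/to/depth_map.png")
image = pipeline(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", image=depth_map
).images[0]
```

The same pattern extends to other conditioning types (canny edges, segmentation maps, and so on) by swapping the checkpoint and the conditioning image.
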
@@ -300,7 +310,7 @@ from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16").to("cuda")
-pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overheard", fullgraph=True)
+pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)
```

-For more tips on how to optimize your code to save memory and speed up inference, read the [Memory and speed](../optimization/fp16) and [Torch 2.0](../optimization/torch2.0) guides.
+For more tips on how to optimize your code to save memory and speed up inference, read the [Memory and speed](../optimization/fp16) and [Torch 2.0](../optimization/torch2.0) guides.

docs/source/en/using-diffusers/depth2img.md

Lines changed: 6 additions & 17 deletions
@@ -20,12 +20,10 @@ Start by creating an instance of the [`StableDiffusionDepth2ImgPipeline`]:

```python
import torch
-import requests
-from PIL import Image
-
from diffusers import StableDiffusionDepth2ImgPipeline
+from diffusers.utils import load_image, make_image_grid

-pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
+pipeline = StableDiffusionDepth2ImgPipeline.from_pretrained(
"stabilityai/stable-diffusion-2-depth",
torch_dtype=torch.float16,
use_safetensors=True,
@@ -36,22 +34,13 @@ Now pass your prompt to the pipeline. You can also pass a `negative_prompt` to p

```python
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
-init_image = Image.open(requests.get(url, stream=True).raw)
+init_image = load_image(url)
prompt = "two tigers"
-n_prompt = "bad, deformed, ugly, bad anatomy"
-image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
-image
+negative_prompt = "bad, deformed, ugly, bad anatomy"
+image = pipeline(prompt=prompt, image=init_image, negative_prompt=negative_prompt, strength=0.7).images[0]
+make_image_grid([init_image, image], rows=1, cols=2)
```

| Input | Output |
|---------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/coco-cats.png" width="500"/> | <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/depth2img-tigers.png" width="500"/> |
-
-Play around with the Spaces below and see if you notice a difference between generated images with and without a depth map!
-
-<iframe
-src="https://radames-stable-diffusion-depth2img.hf.space"
-frameborder="0"
-width="850"
-height="500"
-></iframe>

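Two `diffusers.utils` helpers carry the updated snippet above: `load_image` fetches a URL or local path and returns a PIL image already converted to RGB, and `make_image_grid` stitches several PIL images into one grid for side-by-side comparison. A small standalone sketch reusing the cat image linked in the table above:

```py
from diffusers.utils import load_image, make_image_grid

# load_image accepts a URL or local path and returns an RGB PIL.Image
cats = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/coco-cats.png"
)

# rows * cols must match the number of images passed in
grid = make_image_grid([cats, cats], rows=1, cols=2)
grid.save("grid.png")
```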