
Commit 013edb6

Update main docs (huggingface#1706)
* Remove bogus file
* [Docs] Remove mention of gated access since it no longer exists
* Add docs to index
* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <[email protected]>
1 parent 86ac3ea commit 013edb6

Showing 5 changed files with 66 additions and 67 deletions.

README.md

Lines changed: 11 additions & 33 deletions
@@ -79,19 +79,13 @@ In order to get started, we recommend taking a look at two notebooks:
 Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), [LAION](https://laion.ai/) and [RunwayML](https://runwayml.com/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 4GB VRAM.
 See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information.

-You need to accept the model license before downloading or using the Stable Diffusion weights. Please, visit the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license carefully and tick the checkbox if you agree. You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section](https://huggingface.co/docs/hub/security-tokens) of the documentation.
-

 ### Text-to-Image generation with Stable Diffusion

 First let's install
-```bash
-pip install --upgrade diffusers transformers scipy
-```

-Run this command to log in with your HF Hub token if you haven't before (you can skip this step if you prefer to run the model locally, follow [this](#running-the-model-locally) instead)
 ```bash
-huggingface-cli login
+pip install --upgrade diffusers transformers accelerate
 ```

 We recommend using the model in [half-precision (`fp16`)](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/) as it gives almost always the same results as full
@@ -101,25 +95,24 @@ precision while being roughly twice as fast and requiring half the amount of GPU
 import torch
 from diffusers import StableDiffusionPipeline

-pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, revision="fp16")
+pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
 pipe = pipe.to("cuda")

 prompt = "a photo of an astronaut riding a horse on mars"
 image = pipe(prompt).images[0]
 ```

 #### Running the model locally
-If you don't want to login to Hugging Face, you can also simply download the model folder
-(after having [accepted the license](https://huggingface.co/runwayml/stable-diffusion-v1-5)) and pass
-the path to the local folder to the `StableDiffusionPipeline`.
+
+You can also simply download the model folder and pass the path to the local folder to the `StableDiffusionPipeline`.

 ```
 git lfs install
 git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
 ```

-Assuming the folder is stored locally under `./stable-diffusion-v1-5`, you can also run stable diffusion
-without requiring an authentication token:
+Assuming the folder is stored locally under `./stable-diffusion-v1-5`, you can run stable diffusion
+as follows:

 ```python
 pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
@@ -134,11 +127,7 @@ to using `fp16`.
 The following snippet should result in less than 4GB VRAM.

 ```python
-pipe = StableDiffusionPipeline.from_pretrained(
-    "runwayml/stable-diffusion-v1-5",
-    revision="fp16",
-    torch_dtype=torch.float16,
-)
+pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
 pipe = pipe.to("cuda")

 prompt = "a photo of an astronaut riding a horse on mars"
@@ -164,7 +153,6 @@ If you want to run Stable Diffusion on CPU or you want to have maximum precision
 please run the model in the default *full-precision* setting:

 ```python
-# make sure you're logged in with `huggingface-cli login`
 from diffusers import StableDiffusionPipeline

 pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
@@ -262,11 +250,8 @@ from diffusers import StableDiffusionImg2ImgPipeline
 # load the pipeline
 device = "cuda"
 model_id_or_path = "runwayml/stable-diffusion-v1-5"
-pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
-    model_id_or_path,
-    revision="fp16",
-    torch_dtype=torch.float16,
-)
+pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
+
 # or download via git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
 # and pass `model_id_or_path="./stable-diffusion-v1-5"`.
 pipe = pipe.to(device)
@@ -288,10 +273,7 @@ You can also run this example on colab [![Open In Colab](https://colab.research.

 ### In-painting using Stable Diffusion

-The `StableDiffusionInpaintPipeline` lets you edit specific parts of an image by providing a mask and a text prompt. It uses a model optimized for this particular task, whose license you need to accept before use.
-
-Please, visit the [model card](https://huggingface.co/runwayml/stable-diffusion-inpainting), read the license carefully and tick the checkbox if you agree. Note that this is an additional license, you need to accept it even if you accepted the text-to-image Stable Diffusion license in the past. You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section](https://huggingface.co/docs/hub/security-tokens) of the documentation.
-
+The `StableDiffusionInpaintPipeline` lets you edit specific parts of an image by providing a mask and a text prompt.

 ```python
 import PIL
@@ -311,11 +293,7 @@ mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data
 init_image = download_image(img_url).resize((512, 512))
 mask_image = download_image(mask_url).resize((512, 512))

-pipe = StableDiffusionInpaintPipeline.from_pretrained(
-    "runwayml/stable-diffusion-inpainting",
-    revision="fp16",
-    torch_dtype=torch.float16,
-)
+pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16)
 pipe = pipe.to("cuda")

 prompt = "Face of a yellow cat, high resolution, sitting on a park bench"

docs/source/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -26,6 +26,8 @@
     title: "Text-Guided Image-to-Image"
   - local: using-diffusers/inpaint
     title: "Text-Guided Image-Inpainting"
+  - local: using-diffusers/depth2img
+    title: "Text-Guided Depth-to-Image"
   - local: using-diffusers/custom_pipeline_examples
     title: "Community Pipelines"
   - local: using-diffusers/contribute_pipeline

docs/source/index.mdx

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ specific language governing permissions and limitations under the License.

 # 🧨 Diffusers

-🤗 Diffusers provides pretrained vision diffusion models, and serves as a modular toolbox for inference and training.
+🤗 Diffusers provides pretrained vision and audio diffusion models, and serves as a modular toolbox for inference and training.

 More precisely, 🤗 Diffusers offers:

docs/source/quicktour.mdx

Lines changed: 17 additions & 33 deletions
@@ -18,9 +18,12 @@ Whether you're a developer or an everyday user, this quick tour will help you ge
 Before you begin, make sure you have all the necessary libraries installed:

 ```bash
-pip install --upgrade diffusers
+pip install --upgrade diffusers accelerate transformers
 ```

+- [`accelerate`](https://huggingface.co/docs/accelerate/index) speeds up model loading for inference and training
+- [`transformers`](https://huggingface.co/docs/transformers/index) is required to run the most popular diffusion models, such as [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion)
+
 ## DiffusionPipeline

 The [`DiffusionPipeline`] is the easiest way to use a pre-trained diffusion system for inference. You can use the [`DiffusionPipeline`] out-of-the-box for many tasks across different modalities. Take a look at the table below for some supported tasks:
@@ -29,19 +32,26 @@ The [`DiffusionPipeline`] is the easiest way to use a pre-trained diffusion syst
 |------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------|
 | Unconditional Image Generation | generate an image from gaussian noise | [unconditional_image_generation](./using-diffusers/unconditional_image_generation`) |
 | Text-Guided Image Generation | generate an image given a text prompt | [conditional_image_generation](./using-diffusers/conditional_image_generation) |
-| Text-Guided Image-to-Image Translation | generate an image given an original image and a text prompt | [img2img](./using-diffusers/img2img) |
+| Text-Guided Image-to-Image Translation | adapt an image guided by a text prompt | [img2img](./using-diffusers/img2img) |
 | Text-Guided Image-Inpainting | fill the masked part of an image given the image, the mask and a text prompt | [inpaint](./using-diffusers/inpaint) |
+| Text-Guided Depth-to-Image Translation | adapt parts of an image guided by a text prompt while preserving structure via depth estimation | [depth2image](./using-diffusers/depth2image) |

 For more in-detail information on how diffusion pipelines function for the different tasks, please have a look at the [**Using Diffusers**](./using-diffusers/overview) section.

 As an example, start by creating an instance of [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download.
 You can use the [`DiffusionPipeline`] for any [Diffusers' checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads).
-In this guide though, you'll use [`DiffusionPipeline`] for text-to-image generation with [Latent Diffusion](https://huggingface.co/CompVis/ldm-text2im-large-256):
+In this guide though, you'll use [`DiffusionPipeline`] for text-to-image generation with [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion).
+
+For [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion), please carefully read its [license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) before running the model.
+This is due to the improved image generation capabilities of the model and the potentially harmful content that could be produced with it.
+Please, head over to your stable diffusion model of choice, *e.g.* [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5), and read the license.
+
+You can load the model as follows:

 ```python
 >>> from diffusers import DiffusionPipeline

->>> pipeline = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")
+>>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
 ```

 The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components.
@@ -66,40 +76,14 @@ You can save the image by simply calling:
 >>> image.save("image_of_squirrel_painting.png")
 ```

-More advanced models, like [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion) require you to accept a [license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) before running the model.
-This is due to the improved image generation capabilities of the model and the potentially harmful content that could be produced with it.
-Please, head over to your stable diffusion model of choice, *e.g.* [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license carefully and tick the checkbox if you agree.
-You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
-Having "click-accepted" the license, you can save your token:
-
-```python
-AUTH_TOKEN = "<please-fill-with-your-token>"
-```
-
-You can then load [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5)
-just like we did before only that now you need to pass your `AUTH_TOKEN`:
-
-```python
->>> from diffusers import DiffusionPipeline
-
->>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_auth_token=AUTH_TOKEN)
-```
-
-If you do not pass your authentication token you will see that the diffusion system will not be correctly
-downloaded. Forcing the user to pass an authentication token ensures that it can be verified that the
-user has indeed read and accepted the license, which also means that an internet connection is required.
-
-**Note**: If you do not want to be forced to pass an authentication token, you can also simply download
-the weights locally via:
+**Note**: You can also use the pipeline locally by downloading the weights via:

 ```
 git lfs install
 git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
 ```

-and then load locally saved weights into the pipeline. This way, you do not need to pass an authentication
-token. Assuming that `"./stable-diffusion-v1-5"` is the local path to the cloned stable-diffusion-v1-5 repo,
-you can also load the pipeline as follows:
+and then loading the saved weights into the pipeline.

 ```python
 >>> pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
@@ -121,7 +105,7 @@ you could use it as follows:
 ```python
 >>> from diffusers import EulerDiscreteScheduler

->>> pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_auth_token=AUTH_TOKEN)
+>>> pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

 >>> # change scheduler to Euler
 >>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
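
Pieced together, the updated quicktour reads roughly as the sketch below. The prompt string and the `.to("cuda")` placement are illustrative assumptions; only the checkpoint id, the scheduler swap, and the output filename come from the snippets above.

```python
# Sketch of the post-change quicktour: load Stable Diffusion without an auth
# token, optionally swap in the Euler scheduler, then generate and save an image.
from diffusers import DiffusionPipeline, EulerDiscreteScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
pipeline = pipeline.to("cuda")  # assumes a CUDA GPU is available

image = pipeline("An image of a squirrel in Picasso style").images[0]
image.save("image_of_squirrel_painting.png")
```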
docs/source/using-diffusers/depth2img.mdx

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Text-Guided Image-to-Image Generation
+
+The [`StableDiffusionDepth2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images as well as a `depth_map` to preserve the images' structure. If no `depth_map` is provided, the pipeline will automatically predict the depth via an integrated depth-estimation model.
+
+```python
+import torch
+import requests
+from PIL import Image
+
+from diffusers import StableDiffusionDepth2ImgPipeline
+
+pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-2-depth",
+    torch_dtype=torch.float16,
+).to("cuda")
+
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+init_image = Image.open(requests.get(url, stream=True).raw)
+prompt = "two tigers"
+n_prompt = "bad, deformed, ugly, bad anatomy"
+image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
+```
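
The new page notes that an explicit `depth_map` can be passed to preserve structure. A hedged sketch of doing so, continuing from the example above, follows; the DPT model id, the preprocessing, and the assumption that the pipeline resizes and normalizes the map internally are not part of the committed file.

```python
# Hypothetical sketch: supply a precomputed depth map instead of relying on the
# pipeline's built-in depth estimator. Model id and preprocessing are assumptions.
from transformers import DPTForDepthEstimation, DPTImageProcessor

processor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
depth_model = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas")

inputs = processor(images=init_image, return_tensors="pt")
with torch.no_grad():
    depth_map = depth_model(**inputs).predicted_depth  # float tensor, shape (1, H, W)

image = pipe(
    prompt=prompt,
    image=init_image,
    negative_prompt=n_prompt,
    depth_map=depth_map,
    strength=0.7,
).images[0]
```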
