Commit 8aac1f9

apolinario, anton-l, patrickvonplaten, and pcuenca authored
v1-5 docs updates (huggingface#921)
* Update README.md: Additionally add FLAX so the model card can be slimmer and point to this page
* Find and replace all
* v-1-5 -> v1-5
* revert test changes
* Update README.md (Co-authored-by: Patrick von Platen <[email protected]>)
* Update docs/source/quicktour.mdx (Co-authored-by: Pedro Cuenca <[email protected]>)
* Update README.md (Co-authored-by: Pedro Cuenca <[email protected]>)
* Update docs/source/quicktour.mdx (Co-authored-by: Pedro Cuenca <[email protected]>)
* Update README.md (Co-authored-by: Suraj Patil <[email protected]>)
* Revert certain references to v1-5
* Docs changes
* Apply suggestions from code review

Co-authored-by: apolinario <[email protected]>
Co-authored-by: anton-l <[email protected]>
Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
Co-authored-by: Suraj Patil <[email protected]>
1 parent 2c82e0c commit 8aac1f9

25 files changed: +156, -78 lines changed

README.md

Lines changed: 96 additions & 17 deletions
@@ -64,44 +64,54 @@ In order to get started, we recommend taking a look at two notebooks:
 - The [Training a diffusers model](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) notebook summarizes diffusion models training methods. This notebook takes a step-by-step approach to training your
 diffusion models on an image dataset, with explanatory graphics.

-## **New** Stable Diffusion is now fully compatible with `diffusers`!
+## Stable Diffusion is fully compatible with `diffusers`!

-Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
+Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), [LAION](https://laion.ai/) and [RunwayML](https://runwayml.com/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 4GB VRAM.
 See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information.

-You need to accept the model license before downloading or using the Stable Diffusion weights. Please, visit the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree. You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section](https://huggingface.co/docs/hub/security-tokens) of the documentation.
+You need to accept the model license before downloading or using the Stable Diffusion weights. Please, visit the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license carefully and tick the checkbox if you agree. You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section](https://huggingface.co/docs/hub/security-tokens) of the documentation.


 ### Text-to-Image generation with Stable Diffusion

+First let's install
+```bash
+pip install --upgrade diffusers transformers scipy
+```
+
+Run this command to log in with your HF Hub token if you haven't before (you can skip this step if you prefer to run the model locally, follow [this](#running-the-model-locally) instead)
+```bash
+huggingface-cli login
+```
+
 We recommend using the model in [half-precision (`fp16`)](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/) as it gives almost always the same results as full
 precision while being roughly twice as fast and requiring half the amount of GPU RAM.

 ```python
-# make sure you're logged in with `huggingface-cli login`
 from diffusers import StableDiffusionPipeline

-pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16, revision="fp16")
+pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, revision="fp16")
 pipe = pipe.to("cuda")

 prompt = "a photo of an astronaut riding a horse on mars"
 image = pipe(prompt).images[0]
 ```

-**Note**: If you don't want to use the token, you can also simply download the model weights
-(after having [accepted the license](https://huggingface.co/CompVis/stable-diffusion-v1-4)) and pass
+#### Running the model locally
+If you don't want to login to Hugging Face, you can also simply download the model folder
+(after having [accepted the license](https://huggingface.co/runwayml/stable-diffusion-v1-5)) and pass
 the path to the local folder to the `StableDiffusionPipeline`.

 ```
 git lfs install
-git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
+git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
 ```

-Assuming the folder is stored locally under `./stable-diffusion-v1-4`, you can also run stable diffusion
+Assuming the folder is stored locally under `./stable-diffusion-v1-5`, you can also run stable diffusion
 without requiring an authentication token:

 ```python
-pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4")
+pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
 pipe = pipe.to("cuda")

 prompt = "a photo of an astronaut riding a horse on mars"
@@ -114,7 +124,7 @@ The following snippet should result in less than 4GB VRAM.

 ```python
 pipe = StableDiffusionPipeline.from_pretrained(
-    "CompVis/stable-diffusion-v1-4",
+    "runwayml/stable-diffusion-v1-5",
     revision="fp16",
     torch_dtype=torch.float16,
 )
@@ -125,7 +135,7 @@ pipe.enable_attention_slicing()
 image = pipe(prompt).images[0]
 ```

-If you wish to use a different scheduler, you can simply instantiate
+If you wish to use a different scheduler (e.g.: DDIM, LMS, PNDM/PLMS), you can instantiate
 it before the pipeline and pass it to `from_pretrained`.

 ```python
@@ -138,7 +148,7 @@ lms = LMSDiscreteScheduler(
 )

 pipe = StableDiffusionPipeline.from_pretrained(
-    "CompVis/stable-diffusion-v1-4",
+    "runwayml/stable-diffusion-v1-5",
     revision="fp16",
     torch_dtype=torch.float16,
     scheduler=lms,
@@ -158,7 +168,7 @@ please run the model in the default *full-precision* setting:
 # make sure you're logged in with `huggingface-cli login`
 from diffusers import StableDiffusionPipeline

-pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
+pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

 # disable the following line if you run on CPU
 pipe = pipe.to("cuda")
@@ -169,6 +179,75 @@ image = pipe(prompt).images[0]
 image.save("astronaut_rides_horse.png")
 ```

+### JAX/Flax
+
+To use StableDiffusion on TPUs and GPUs for faster inference you can leverage JAX/Flax.
+
+Running the pipeline with default PNDMScheduler
+
+```python
+import jax
+import numpy as np
+from flax.jax_utils import replicate
+from flax.training.common_utils import shard
+
+from diffusers import FlaxStableDiffusionPipeline
+
+pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
+    "runwayml/stable-diffusion-v1-5", revision="flax", dtype=jax.numpy.bfloat16
+)
+
+prompt = "a photo of an astronaut riding a horse on mars"
+
+prng_seed = jax.random.PRNGKey(0)
+num_inference_steps = 50
+
+num_samples = jax.device_count()
+prompt = num_samples * [prompt]
+prompt_ids = pipeline.prepare_inputs(prompt)
+
+# shard inputs and rng
+params = replicate(params)
+prng_seed = jax.random.split(prng_seed, jax.device_count())
+prompt_ids = shard(prompt_ids)
+
+images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
+images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
+```
+
+**Note**:
+If you are limited by TPU memory, please make sure to load the `FlaxStableDiffusionPipeline` in `bfloat16` precision instead of the default `float32` precision as done above. You can do so by telling diffusers to load the weights from "bf16" branch.
+
+```python
+import jax
+import numpy as np
+from flax.jax_utils import replicate
+from flax.training.common_utils import shard
+
+from diffusers import FlaxStableDiffusionPipeline
+
+pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
+    "runwayml/stable-diffusion-v1-5", revision="bf16", dtype=jax.numpy.bfloat16
+)
+
+prompt = "a photo of an astronaut riding a horse on mars"
+
+prng_seed = jax.random.PRNGKey(0)
+num_inference_steps = 50
+
+num_samples = jax.device_count()
+prompt = num_samples * [prompt]
+prompt_ids = pipeline.prepare_inputs(prompt)
+
+# shard inputs and rng
+params = replicate(params)
+prng_seed = jax.random.split(prng_seed, jax.device_count())
+prompt_ids = shard(prompt_ids)
+
+images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
+images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
+```
+
 ### Image-to-Image text-guided generation with Stable Diffusion

 The `StableDiffusionImg2ImgPipeline` lets you pass a text prompt and an initial image to condition the generation of new images.
@@ -183,14 +262,14 @@ from diffusers import StableDiffusionImg2ImgPipeline

 # load the pipeline
 device = "cuda"
-model_id_or_path = "CompVis/stable-diffusion-v1-4"
+model_id_or_path = "runwayml/stable-diffusion-v1-5"
 pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
     model_id_or_path,
     revision="fp16",
     torch_dtype=torch.float16,
 )
-# or download via git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
-# and pass `model_id_or_path="./stable-diffusion-v1-4"`.
+# or download via git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
+# and pass `model_id_or_path="./stable-diffusion-v1-5"`.
 pipe = pipe.to(device)

 # let's download an initial image
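
The hunk above ends at the comment that fetches the initial image. For orientation, a minimal sketch of how such an image-to-image call is typically completed; the example image URL, the `strength` and `guidance_scale` values, and the output filename are illustrative assumptions rather than content of this commit (`init_image` is the keyword this generation of the pipeline used; later releases renamed it to `image`).

```python
import torch
import requests
from io import BytesIO
from PIL import Image

from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
).to(device)

# fetch an initial image and resize it to a resolution the model handles well
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
init_image = Image.open(BytesIO(requests.get(url).content)).convert("RGB")
init_image = init_image.resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"

# strength controls how strongly the init image is noised before denoising:
# higher values deviate more from the input
image = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images[0]
image.save("fantasy_landscape.png")
```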

docs/source/api/pipelines/overview.mdx

Lines changed: 4 additions & 4 deletions
@@ -67,8 +67,8 @@ Diffusion models often consist of multiple independently-trained models or other
 Each model has been trained independently on a different task and the scheduler can easily be swapped out and replaced with a different one.
 During inference, we however want to be able to easily load all components and use them in inference - even if one component, *e.g.* CLIP's text encoder, originates from a different library, such as [Transformers](https://github.com/huggingface/transformers). To that end, all pipelines provide the following functionality:

-- [`from_pretrained` method](../diffusion_pipeline) that accepts a Hugging Face Hub repository id, *e.g.* [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) or a path to a local directory, *e.g.*
-"./stable-diffusion". To correctly retrieve which models and components should be loaded, one has to provide a `model_index.json` file, *e.g.* [CompVis/stable-diffusion-v1-4/model_index.json](https://huggingface.co/CompVis/stable-diffusion-v1-4/blob/main/model_index.json), which defines all components that should be
+- [`from_pretrained` method](../diffusion_pipeline) that accepts a Hugging Face Hub repository id, *e.g.* [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) or a path to a local directory, *e.g.*
+"./stable-diffusion". To correctly retrieve which models and components should be loaded, one has to provide a `model_index.json` file, *e.g.* [runwayml/stable-diffusion-v1-5/model_index.json](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), which defines all components that should be
 loaded into the pipelines. More specifically, for each model/component one needs to define the format `<name>: ["<library>", "<class name>"]`. `<name>` is the attribute name given to the loaded instance of `<class name>` which can be found in the library or pipeline folder called `"<library>"`.
 - [`save_pretrained`](../diffusion_pipeline) that accepts a local path, *e.g.* `./stable-diffusion` under which all models/components of the pipeline will be saved. For each component/model a folder is created inside the local path that is named after the given attribute name, *e.g.* `./stable_diffusion/unet`.
 In addition, a `model_index.json` file is created at the root of the local path, *e.g.* `./stable_diffusion/model_index.json` so that the complete pipeline can again be instantiated

@@ -100,7 +100,7 @@ logic including pre-processing, an unrolled diffusion loop, and post-processing
 # make sure you're logged in with `huggingface-cli login`
 from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

-pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
+pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
 pipe = pipe.to("cuda")

 prompt = "a photo of an astronaut riding a horse on mars"

@@ -123,7 +123,7 @@ from diffusers import StableDiffusionImg2ImgPipeline
 # load the pipeline
 device = "cuda"
 pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
-    "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16
+    "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16
 ).to(device)

 # let's download an initial image
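
Because the first hunk in this file describes `model_index.json` and `save_pretrained` only in prose, here is a minimal sketch of the round trip it refers to; the local path and the quoted `model_index.json` entries are illustrative assumptions.

```python
from diffusers import StableDiffusionPipeline

# download from the Hub; every component (unet, vae, text_encoder, tokenizer,
# scheduler, ...) lives in its own sub-folder described by model_index.json
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# write the same layout to disk: one folder per component plus model_index.json
pipe.save_pretrained("./stable-diffusion")

# model_index.json maps each attribute name to ["<library>", "<class name>"], e.g.
#   "unet": ["diffusers", "UNet2DConditionModel"]
#   "text_encoder": ["transformers", "CLIPTextModel"]

# the saved folder loads back without contacting the Hub
pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion")
```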

docs/source/optimization/fp16.mdx

Lines changed: 5 additions & 5 deletions
@@ -56,7 +56,7 @@ If you use a CUDA GPU, you can take advantage of `torch.autocast` to perform inf
 from torch import autocast
 from diffusers import StableDiffusionPipeline

-pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
+pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
 pipe = pipe.to("cuda")

 prompt = "a photo of an astronaut riding a horse on mars"

@@ -72,7 +72,7 @@ To save more GPU memory and get even more speed, you can load and run the model

 ```Python
 pipe = StableDiffusionPipeline.from_pretrained(
-    "CompVis/stable-diffusion-v1-4",
+    "runwayml/stable-diffusion-v1-5",
     revision="fp16",
     torch_dtype=torch.float16,
 )

@@ -97,7 +97,7 @@ import torch
 from diffusers import StableDiffusionPipeline

 pipe = StableDiffusionPipeline.from_pretrained(
-    "CompVis/stable-diffusion-v1-4",
+    "runwayml/stable-diffusion-v1-5",
     revision="fp16",
     torch_dtype=torch.float16,
 )

@@ -152,7 +152,7 @@ def generate_inputs():


 pipe = StableDiffusionPipeline.from_pretrained(
-    "CompVis/stable-diffusion-v1-4",
+    "runwayml/stable-diffusion-v1-5",
     revision="fp16",
     torch_dtype=torch.float16,
 ).to("cuda")

@@ -216,7 +216,7 @@ class UNet2DConditionOutput:


 pipe = StableDiffusionPipeline.from_pretrained(
-    "CompVis/stable-diffusion-v1-4",
+    "runwayml/stable-diffusion-v1-5",
     revision="fp16",
     torch_dtype=torch.float16,
 ).to("cuda")
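
The first hunk in this file only shows the pipeline setup around the `torch.autocast` example; a minimal sketch of the full call, with the prompt and output filename as illustrative assumptions:

```python
from torch import autocast
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"

# run the denoising loop inside autocast so CUDA ops execute in half precision
with autocast("cuda"):
    image = pipe(prompt).images[0]

image.save("astronaut_rides_horse.png")
```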

docs/source/optimization/mps.mdx

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ We recommend to "prime" the pipeline using an additional one-time pass through i
 # make sure you're logged in with `huggingface-cli login`
 from diffusers import StableDiffusionPipeline

-pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
+pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
 pipe = pipe.to("mps")

 prompt = "a photo of an astronaut riding a horse on mars"
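
The hunk header quotes the doc's advice to "prime" the pipeline on `mps`; a minimal sketch of that warm-up, assuming a one-step pass is used for priming:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")

prompt = "a photo of an astronaut riding a horse on mars"

# one-time "priming" pass with a single step to warm up the mps backend
_ = pipe(prompt, num_inference_steps=1)

# subsequent calls run at normal quality and speed
image = pipe(prompt).images[0]
```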

docs/source/optimization/onnx.mdx

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ The snippet below demonstrates how to use the ONNX runtime. You need to use `Sta
 from diffusers import StableDiffusionOnnxPipeline

 pipe = StableDiffusionOnnxPipeline.from_pretrained(
-    "CompVis/stable-diffusion-v1-4",
+    "runwayml/stable-diffusion-v1-5",
     revision="onnx",
     provider="CUDAExecutionProvider",
 )
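
The hunk ends at the constructor; a minimal sketch of the generation call that typically follows, with the prompt and filename as illustrative assumptions:

```python
from diffusers import StableDiffusionOnnxPipeline

pipe = StableDiffusionOnnxPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="onnx",
    provider="CUDAExecutionProvider",
)

# the ONNX pipeline is called like its PyTorch counterpart
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
```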

docs/source/quicktour.mdx

Lines changed: 7 additions & 8 deletions
@@ -68,22 +68,21 @@ You can save the image by simply calling:

 More advanced models, like [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion) require you to accept a [license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) before running the model.
 This is due to the improved image generation capabilities of the model and the potentially harmful content that could be produced with it.
-Long story short: Head over to your stable diffusion model of choice, *e.g.* [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4), read through the license and click-accept to get
-access to the model.
+Please, head over to your stable diffusion model of choice, *e.g.* [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license carefully and tick the checkbox if you agree.
 You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
 Having "click-accepted" the license, you can save your token:

 ```python
 AUTH_TOKEN = "<please-fill-with-your-token>"
 ```

-You can then load [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4)
+You can then load [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5)
 just like we did before only that now you need to pass your `AUTH_TOKEN`:

 ```python
 >>> from diffusers import DiffusionPipeline

->>> generator = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=AUTH_TOKEN)
+>>> generator = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_auth_token=AUTH_TOKEN)
 ```

 If you do not pass your authentication token you will see that the diffusion system will not be correctly

@@ -95,15 +94,15 @@ the weights locally via:

 ```
 git lfs install
-git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
+git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
 ```

 and then load locally saved weights into the pipeline. This way, you do not need to pass an authentication
-token. Assuming that `"./stable-diffusion-v1-4"` is the local path to the cloned stable-diffusion-v1-4 repo,
+token. Assuming that `"./stable-diffusion-v1-5"` is the local path to the cloned stable-diffusion-v1-5 repo,
 you can also load the pipeline as follows:

 ```python
->>> generator = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-4")
+>>> generator = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
 ```

 Running the pipeline is then identical to the code above as it's the same model architecture.

@@ -125,7 +124,7 @@ you could use it as follows:
 >>> scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear")

 >>> generator = StableDiffusionPipeline.from_pretrained(
-...     "CompVis/stable-diffusion-v1-4", scheduler=scheduler, use_auth_token=AUTH_TOKEN
+...     "runwayml/stable-diffusion-v1-5", scheduler=scheduler, use_auth_token=AUTH_TOKEN
 ... )
 ```
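
To round out the scheduler example above, a minimal sketch of running the `generator` once it is loaded; `AUTH_TOKEN` is the variable defined earlier in the quicktour, and the prompt and filename are illustrative assumptions:

```python
>>> from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

>>> scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear")
>>> generator = StableDiffusionPipeline.from_pretrained(
...     "runwayml/stable-diffusion-v1-5", scheduler=scheduler, use_auth_token=AUTH_TOKEN
... )
>>> generator = generator.to("cuda")

>>> # the call is unchanged; the swapped-in LMS scheduler is used under the hood
>>> image = generator("a photo of an astronaut riding a horse on mars").images[0]
>>> image.save("astronaut_rides_horse.png")
```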

docs/source/training/text_inversion.mdx

Lines changed: 2 additions & 2 deletions
@@ -64,7 +64,7 @@ accelerate config

 ### Cat toy example

-You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree.
+You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree.

 You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).

@@ -83,7 +83,7 @@ Now let's get our dataset.Download 3-4 images from [here](https://drive.google.c
 And launch the training using

 ```bash
-export MODEL_NAME="CompVis/stable-diffusion-v1-4"
+export MODEL_NAME="runwayml/stable-diffusion-v1-5"
 export DATA_DIR="path-to-dir-containing-images"

 accelerate launch textual_inversion.py \
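
Since the hunk cuts off at the `accelerate launch` command, here is a minimal sketch of how the trained embedding is typically used afterwards; the output path and the `<cat-toy>` placeholder token are assumptions based on the example's conventions, not content of this commit.

```python
from diffusers import StableDiffusionPipeline

# "path-to-your-trained-model" stands for whatever --output_dir was passed to textual_inversion.py
pipe = StableDiffusionPipeline.from_pretrained("path-to-your-trained-model")
pipe = pipe.to("cuda")

# the placeholder token learned during training can now be used inside prompts
prompt = "A <cat-toy> backpack"
image = pipe(prompt).images[0]
image.save("cat-backpack.png")
```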

docs/source/using-diffusers/custom_pipelines.mdx

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ feature_extractor = CLIPFeatureExtractor.from_pretrained(clip_model_id)
 clip_model = CLIPModel.from_pretrained(clip_model_id)

 pipeline = DiffusionPipeline.from_pretrained(
-    "CompVis/stable-diffusion-v1-4",
+    "runwayml/stable-diffusion-v1-5",
     custom_pipeline="clip_guided_stable_diffusion",
     clip_model=clip_model,
     feature_extractor=feature_extractor,
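
For context on how the community pipeline in this hunk is wired together end to end, a minimal sketch; the CLIP checkpoint id, the prompt, and the filename are illustrative assumptions, and the call shown is the generic diffusers pipeline call rather than anything specific to this commit.

```python
from transformers import CLIPFeatureExtractor, CLIPModel
from diffusers import DiffusionPipeline

# any CLIP checkpoint can be plugged in; this id is only an example
clip_model_id = "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"

feature_extractor = CLIPFeatureExtractor.from_pretrained(clip_model_id)
clip_model = CLIPModel.from_pretrained(clip_model_id)

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="clip_guided_stable_diffusion",
    clip_model=clip_model,
    feature_extractor=feature_extractor,
)
pipeline = pipeline.to("cuda")

# the custom pipeline is invoked like any other diffusers pipeline
image = pipeline("a photo of an astronaut riding a horse on mars").images[0]
image.save("clip_guided_astronaut.png")
```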
