
Commit acd3178

Docs: recommend xformers (huggingface#1724)
* Fix links to flash attention.
* Add xformers installation instructions.
* Make link to xformers install more prominent.
* Link to xformers install from training docs.
1 parent c6d0dff commit acd3178

File tree

- docs/source/_toctree.yml
- docs/source/optimization/fp16.mdx
- docs/source/optimization/xformers.mdx
- docs/source/training/dreambooth.mdx
- docs/source/training/overview.mdx

5 files changed: +39 / -4 lines changed

docs/source/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -45,6 +45,8 @@
 - sections:
   - local: optimization/fp16
     title: "Memory and Speed"
+  - local: optimization/xformers
+    title: "xFormers"
   - local: optimization/onnx
     title: "ONNX"
   - local: optimization/open_vino

docs/source/optimization/fp16.mdx

Lines changed: 7 additions & 3 deletions
@@ -12,7 +12,9 @@ specific language governing permissions and limitations under the License.
 
 # Memory and speed
 
-We present some techniques and ideas to optimize 🤗 Diffusers _inference_ for memory or speed.
+We present some techniques and ideas to optimize 🤗 Diffusers _inference_ for memory or speed. As a general rule, we recommend the use of [xFormers](https://github.com/facebookresearch/xformers) for memory efficient attention; please see the recommended [installation instructions](xformers).
+
+We'll discuss how the following settings impact performance and memory.
 
 | | Latency | Speedup |
 | ---------------- | ------- | ------- |
@@ -322,7 +324,9 @@ with torch.inference_mode():
 
 
 ## Memory Efficient Attention
-Recent work on optimizing the bandwitdh in the attention block have generated huge speed ups and gains in GPU memory usage. The most recent being Flash Attention (from @tridao, [code](https://github.com/HazyResearch/flash-attention), [paper](https://arxiv.org/pdf/2205.14135.pdf)) .
+
+Recent work on optimizing the bandwidth in the attention block has generated huge speedups and gains in GPU memory usage. The most recent is Flash Attention from @tridao: [code](https://github.com/HazyResearch/flash-attention), [paper](https://arxiv.org/pdf/2205.14135.pdf).
+
 Here are the speedups we obtain on a few Nvidia GPUs when running the inference at 512x512 with a batch size of 1 (one prompt):
 
 | GPU | Base Attention FP16 | Memory Efficient Attention FP16 |
@@ -338,7 +342,7 @@ Here are the speedups we obtain on a few Nvidia GPUs when running the inference
 To leverage it just make sure you have:
 - PyTorch > 1.12
 - Cuda available
-- Installed the [xformers](https://github.com/facebookresearch/xformers) library
+- [Installed the xformers library](xformers).
 ```python
 from diffusers import StableDiffusionPipeline
 import torch
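# The diff excerpt cuts the example off here. A hedged sketch of how such a
# snippet can continue (the model id and fp16 dtype are illustrative
# assumptions, not taken from this commit):
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# With xformers installed, opt the pipeline into memory efficient attention.
pipe.enable_xformers_memory_efficient_attention()

prompt = "a photo of an astronaut riding a horse on mars"
with torch.inference_mode():
    image = pipe(prompt).images[0]
```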
docs/source/optimization/xformers.mdx

Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Installing xFormers
+
+We recommend the use of [xFormers](https://github.com/facebookresearch/xformers) for both inference and training. In our tests, the optimizations performed in the attention blocks allow for both faster speed and reduced memory consumption.
+
+Installing xFormers has historically been a bit involved, as binary distributions were not always up to date. Fortunately, the project has [very recently](https://github.com/facebookresearch/xformers/pull/591) integrated a process to build pip wheels as part of the project's continuous integration, so this should improve a lot starting from xFormers version 0.0.16.
+
+Until xFormers 0.0.16 is deployed, you can install pip wheels using [`TestPyPI`](https://test.pypi.org/project/xformers/). These are the steps that worked for us on a Linux machine to install xFormers version 0.0.15:
+
+```bash
+pip install pyre-extensions==0.0.23
+pip install -i https://test.pypi.org/simple/ xformers==0.0.15.dev376
+```
+
+We'll update these instructions when the wheels are published to the official PyPI repository.
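Once installed, a quick sanity check (a minimal sketch, assuming the wheel built correctly for your environment) is to import the library and print its version:

```python
# Minimal check that the xFormers wheel installed above can be imported.
import xformers
import xformers.ops  # the memory efficient attention ops live here

print(xformers.__version__)
```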

docs/source/training/dreambooth.mdx

Lines changed: 3 additions & 1 deletion
@@ -36,7 +36,9 @@ pip install git+https://github.com/huggingface/diffusers
 pip install -U -r diffusers/examples/dreambooth/requirements.txt
 ```
 
-Then initialize and configure a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:
+xFormers is not part of the training requirements, but [we recommend you install it if you can](../optimization/xformers). It could make your training faster and less memory intensive.
+
+After all dependencies have been set up, you can configure a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:
 
 ```bash
 accelerate config
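```

Once xFormers is available, a training setup can switch the UNet to memory efficient attention before fine-tuning. A minimal sketch, assuming a recent diffusers release that exposes `enable_xformers_memory_efficient_attention` on models (the model id is only an example, not taken from this commit):

```python
from diffusers import UNet2DConditionModel

# Load the UNet to be fine-tuned with DreamBooth (example model id).
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# Switch the attention blocks to xFormers memory efficient attention.
unet.enable_xformers_memory_efficient_attention()
```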

docs/source/training/overview.mdx

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie
 - [Text Inversion](./text_inversion)
 - [Dreambooth](./dreambooth)
 
+If possible, please [install xFormers](../optimization/xformers) for memory efficient attention. This could help make your training faster and less memory intensive.
 
 | Task | 🤗 Accelerate | 🤗 Datasets | Colab
 |---|---|:---:|:---:|
