
Commit acd3178

Docs: recommend xformers (huggingface#1724)
* Fix links to flash attention.
* Add xformers installation instructions.
* Make link to xformers install more prominent.
* Link to xformers install from training docs.
1 parent c6d0dff commit acd3178

File tree

- docs/source/_toctree.yml
- docs/source/optimization/fp16.mdx
- docs/source/optimization/xformers.mdx
- docs/source/training/dreambooth.mdx
- docs/source/training/overview.mdx

5 files changed: +39 / -4 lines changed

docs/source/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -45,6 +45,8 @@
 - sections:
   - local: optimization/fp16
     title: "Memory and Speed"
+  - local: optimization/xformers
+    title: "xFormers"
   - local: optimization/onnx
     title: "ONNX"
   - local: optimization/open_vino

docs/source/optimization/fp16.mdx

Lines changed: 7 additions & 3 deletions
@@ -12,7 +12,9 @@ specific language governing permissions and limitations under the License.
 
 # Memory and speed
 
-We present some techniques and ideas to optimize 🤗 Diffusers _inference_ for memory or speed.
+We present some techniques and ideas to optimize 🤗 Diffusers _inference_ for memory or speed. As a general rule, we recommend the use of [xFormers](https://github.com/facebookresearch/xformers) for memory efficient attention; please see the recommended [installation instructions](xformers).
+
+We'll discuss how the following settings impact performance and memory.
 
 | | Latency | Speedup |
 | ---------------- | ------- | ------- |
@@ -322,7 +324,9 @@ with torch.inference_mode():
 
 
 ## Memory Efficient Attention
-Recent work on optimizing the bandwitdh in the attention block have generated huge speed ups and gains in GPU memory usage. The most recent being Flash Attention (from @tridao, [code](https://github.com/HazyResearch/flash-attention), [paper](https://arxiv.org/pdf/2205.14135.pdf)) .
+
+Recent work on optimizing the bandwidth in the attention block has generated huge speedups and gains in GPU memory usage. The most recent is Flash Attention from @tridao: [code](https://github.com/HazyResearch/flash-attention), [paper](https://arxiv.org/pdf/2205.14135.pdf).
+
 Here are the speedups we obtain on a few Nvidia GPUs when running the inference at 512x512 with a batch size of 1 (one prompt):
 
 | GPU | Base Attention FP16 | Memory Efficient Attention FP16 |
@@ -338,7 +342,7 @@ Here are the speedups we obtain on a few Nvidia GPUs when running the inference
 To leverage it just make sure you have:
 - PyTorch > 1.12
 - Cuda available
-- Installed the [xformers](https://github.com/facebookresearch/xformers) library
+- [Installed the xformers library](xformers).
 ```python
 from diffusers import StableDiffusionPipeline
 import torch
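# The diff excerpt cuts the example off here. A hedged sketch of how such a
# snippet can continue (the model id and fp16 dtype are illustrative
# assumptions, not taken from this commit):
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# With xformers installed, opt the pipeline into memory efficient attention.
pipe.enable_xformers_memory_efficient_attention()

prompt = "a photo of an astronaut riding a horse on mars"
with torch.inference_mode():
    image = pipe(prompt).images[0]
```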
docs/source/optimization/xformers.mdx

Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Installing xFormers
+
+We recommend the use of [xFormers](https://github.com/facebookresearch/xformers) for both inference and training. In our tests, the optimizations performed in the attention blocks allow for both faster speed and reduced memory consumption.
+
+Installing xFormers has historically been a bit involved, as binary distributions were not always up to date. Fortunately, the project has [very recently](https://github.com/facebookresearch/xformers/pull/591) integrated a process to build pip wheels as part of the project's continuous integration, so this should improve a lot starting from xFormers version 0.0.16.
+
+Until xFormers 0.0.16 is deployed, you can install pip wheels using [`TestPyPI`](https://test.pypi.org/project/xformers/). These are the steps that worked for us on a Linux machine to install xFormers version 0.0.15:
+
+```bash
+pip install pyre-extensions==0.0.23
+pip install -i https://test.pypi.org/simple/ xformers==0.0.15.dev376
+```
+
+We'll update these instructions when the wheels are published to the official PyPI repository.
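Once installed, a quick sanity check (a minimal sketch, assuming the wheel built correctly for your environment) is to import the library and print its version:

```python
# Minimal check that the xFormers wheel installed above can be imported.
import xformers
import xformers.ops  # the memory efficient attention ops live here

print(xformers.__version__)
```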

docs/source/training/dreambooth.mdx

Lines changed: 3 additions & 1 deletion
@@ -36,7 +36,9 @@ pip install git+https://github.com/huggingface/diffusers
 pip install -U -r diffusers/examples/dreambooth/requirements.txt
 ```
 
-Then initialize and configure a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:
+xFormers is not part of the training requirements, but [we recommend you install it if you can](../optimization/xformers). It could make your training faster and less memory intensive.
+
+After all dependencies have been set up, you can configure a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:
 
 ```bash
 accelerate config
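```

Once xFormers is available, a training setup can switch the UNet to memory efficient attention before fine-tuning. A minimal sketch, assuming a recent diffusers release that exposes `enable_xformers_memory_efficient_attention` on models (the model id is only an example, not taken from this commit):

```python
from diffusers import UNet2DConditionModel

# Load the UNet to be fine-tuned with DreamBooth (example model id).
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# Switch the attention blocks to xFormers memory efficient attention.
unet.enable_xformers_memory_efficient_attention()
```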

docs/source/training/overview.mdx

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie
 - [Text Inversion](./text_inversion)
 - [Dreambooth](./dreambooth)
 
+If possible, please [install xFormers](../optimization/xformers) for memory efficient attention. This could help make your training faster and less memory intensive.
 
 | Task | 🤗 Accelerate | 🤗 Datasets | Colab
 |---|---|:---:|:---:|
