Commit 3ce905c

[docs] Merge LoRAs (huggingface#7213)
* merge loras
* feedback
* torch.compile
* feedback
1 parent f539497 commit 3ce905c

4 files changed

+301
-204
lines changed

docs/source/en/_toctree.yml

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@
1818
- local: tutorials/basic_training
1919
title: Train a diffusion model
2020
- local: tutorials/using_peft_for_inference
21-
title: Inference with PEFT
21+
title: Load LoRAs for inference
2222
- local: tutorials/fast_diffusion
2323
title: Accelerate inference of text-to-image diffusion models
2424
title: Tutorials
@@ -62,6 +62,8 @@
6262
title: Textual inversion
6363
- local: using-diffusers/ip_adapter
6464
title: IP-Adapter
65+
- local: using-diffusers/merge_loras
66+
title: Merge LoRAs
6567
- local: training/distributed_inference
6668
title: Distributed inference with multiple GPUs
6769
- local: using-diffusers/reusing_seeds

docs/source/en/tutorials/using_peft_for_inference.md

Lines changed: 21 additions & 111 deletions
Original file line numberDiff line numberDiff line change
@@ -14,19 +14,17 @@ specific language governing permissions and limitations under the License.
1414

1515
# Load LoRAs for inference
1616

17-
There are many adapters (with LoRAs being the most common type) trained in different styles to achieve different effects. You can even combine multiple adapters to create new and unique images. With the 🤗 [PEFT](https://huggingface.co/docs/peft/index) integration in 🤗 Diffusers, it is really easy to load and manage adapters for inference. In this guide, you'll learn how to use different adapters with [Stable Diffusion XL (SDXL)](../api/pipelines/stable_diffusion/stable_diffusion_xl) for inference.
17+
There are many adapter types (with [LoRAs](https://huggingface.co/docs/peft/conceptual_guides/adapter#low-rank-adaptation-lora) being the most popular) trained in different styles to achieve different effects. You can even combine multiple adapters to create new and unique images.
1818

19-
Throughout this guide, you'll use LoRA as the main adapter technique, so we'll use the terms LoRA and adapter interchangeably. You should have some familiarity with LoRA, and if you don't, we welcome you to check out the [LoRA guide](https://huggingface.co/docs/peft/conceptual_guides/lora).
19+
In this tutorial, you'll learn how to easily load and manage adapters for inference with the 🤗 [PEFT](https://huggingface.co/docs/peft/index) integration in 🤗 Diffusers. You'll use LoRA as the main adapter technique, so you'll see the terms LoRA and adapter used interchangeably.
2020

2121
Let's first install all the required libraries.
2222

2323
```bash
24-
!pip install -q transformers accelerate
25-
!pip install peft
26-
!pip install diffusers
24+
!pip install -q transformers accelerate peft diffusers
2725
```
2826

29-
Now, let's load a pipeline with a SDXL checkpoint:
27+
Now, load a pipeline with a [Stable Diffusion XL (SDXL)](../api/pipelines/stable_diffusion/stable_diffusion_xl) checkpoint:
3028

3129
```python
3230
from diffusers import DiffusionPipeline
@@ -36,16 +34,13 @@ pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
3634
pipe = DiffusionPipeline.from_pretrained(pipe_id, torch_dtype=torch.float16).to("cuda")
3735
```
3836

39-
40-
Next, load a LoRA checkpoint with the [`~diffusers.loaders.StableDiffusionXLLoraLoaderMixin.load_lora_weights`] method.
41-
42-
With the 🤗 PEFT integration, you can assign a specific `adapter_name` to the checkpoint, which lets you easily switch between different LoRA checkpoints. Let's call this adapter `"toy"`.
37+
Next, load a [CiroN2022/toy-face](https://huggingface.co/CiroN2022/toy-face) adapter with the [`~diffusers.loaders.StableDiffusionXLLoraLoaderMixin.load_lora_weights`] method. With the 🤗 PEFT integration, you can assign a specific `adapter_name` to the checkpoint, which lets you easily switch between different LoRA checkpoints. Let's call this adapter `"toy"`.
4338

4439
```python
4540
pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")
4641
```
4742

48-
And then perform inference:
43+
Make sure to include the token `toy_face` in the prompt and then you can perform inference:
4944

5045
```python
5146
prompt = "toy_face of a hacker with a hoodie"
@@ -59,17 +54,16 @@ image
5954

6055
![toy-face](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/peft_integration/diffusers_peft_lora_inference_8_1.png)
6156

57+
With the `adapter_name` parameter, it is really easy to use another adapter for inference! Load the [nerijs/pixel-art-xl](https://huggingface.co/nerijs/pixel-art-xl) adapter that has been fine-tuned to generate pixel art images and call it `"pixel"`.
6258

63-
With the `adapter_name` parameter, it is really easy to use another adapter for inference! Load the [nerijs/pixel-art-xl](https://huggingface.co/nerijs/pixel-art-xl) adapter that has been fine-tuned to generate pixel art images, and let's call it `"pixel"`.
64-
65-
The pipeline automatically sets the first loaded adapter (`"toy"`) as the active adapter. But you can activate the `"pixel"` adapter with the [`~diffusers.loaders.UNet2DConditionLoadersMixin.set_adapters`] method as shown below:
59+
The pipeline automatically sets the first loaded adapter (`"toy"`) as the active adapter, but you can activate the `"pixel"` adapter with the [`~diffusers.loaders.UNet2DConditionLoadersMixin.set_adapters`] method:
6660

6761
```python
6862
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")
6963
pipe.set_adapters("pixel")
7064
```
7165

72-
Let's now generate an image with the second adapter and check the result:
66+
Make sure you include the token `pixel art` in your prompt to generate a pixel art image:
7367

7468
```python
7569
prompt = "a hacker with a hoodie, pixel art"
@@ -81,29 +75,25 @@ image
8175

8276
![pixel-art](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/peft_integration/diffusers_peft_lora_inference_12_1.png)
8377

84-
## Combine multiple adapters
78+
## Merge adapters
8579

86-
You can also perform multi-adapter inference where you combine different adapter checkpoints for inference.
80+
You can also merge different adapter checkpoints for inference to blend their styles together.
8781

88-
Once again, use the [`~diffusers.loaders.UNet2DConditionLoadersMixin.set_adapters`] method to activate two LoRA checkpoints and specify the weight for how the checkpoints should be combined.
82+
Once again, use the [`~diffusers.loaders.UNet2DConditionLoadersMixin.set_adapters`] method to activate the `pixel` and `toy` adapters and specify the weights for how they should be merged.
8983

9084
```python
9185
pipe.set_adapters(["pixel", "toy"], adapter_weights=[0.5, 1.0])
9286
```
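
As a rough illustration of what the `adapter_weights` in the `set_adapters` call are doing, the sketch below models one linear layer with two LoRA updates in plain NumPy. It is a conceptual toy, not the Diffusers or PEFT implementation: the random matrices stand in for trained adapters, the `lora_delta` helper is hypothetical, and the usual alpha/rank scaling is assumed to be folded into each update.

```python
import numpy as np

rng = np.random.default_rng(0)


def lora_delta(rank, d_out, d_in):
    """Hypothetical stand-in for a trained LoRA: a random low-rank update B @ A."""
    B = rng.normal(size=(d_out, rank))
    A = rng.normal(size=(rank, d_in))
    return B @ A


d_out, d_in = 8, 8
W = rng.normal(size=(d_out, d_in))  # frozen base weight of one layer

# Two adapters and their weights, mirroring adapter_weights=[0.5, 1.0]
deltas = {"pixel": lora_delta(2, d_out, d_in), "toy": lora_delta(2, d_out, d_in)}
weights = {"pixel": 0.5, "toy": 1.0}

x = rng.normal(size=(d_in,))

# Each active adapter adds its own scaled contribution on top of the base output.
y = W @ x + sum(weights[name] * (deltas[name] @ x) for name in deltas)
```

Under these assumptions, lowering the `"pixel"` weight shrinks only that adapter's contribution and leaves the `"toy"` contribution and the base output unchanged.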
9387

94-
Now that we have set these two adapters, let's generate an image from the combined adapters!
95-
9688
<Tip>
9789

9890
LoRA checkpoints in the diffusion community are almost always obtained with [DreamBooth](https://huggingface.co/docs/diffusers/main/en/training/dreambooth). DreamBooth training often relies on "trigger" words in the input text prompts in order for the generation results to look as expected. When you combine multiple LoRA checkpoints, it's important to ensure the trigger words for the corresponding LoRA checkpoints are present in the input text prompts.
9991

10092
</Tip>
10193

102-
The trigger words for [CiroN2022/toy-face](https://hf.co/CiroN2022/toy-face) and [nerijs/pixel-art-xl](https://hf.co/nerijs/pixel-art-xl) are found in their repositories.
103-
94+
Remember to use the trigger words for [CiroN2022/toy-face](https://hf.co/CiroN2022/toy-face) and [nerijs/pixel-art-xl](https://hf.co/nerijs/pixel-art-xl) (these are found in their repositories) in the prompt to generate an image.
10495

10596
```python
106-
# Notice how the prompt is constructed.
10797
prompt = "toy_face of a hacker with a hoodie, pixel art"
10898
image = pipe(
10999
prompt, num_inference_steps=30, cross_attention_kwargs={"scale": 1.0}, generator=torch.manual_seed(0)
@@ -113,15 +103,16 @@ image
113103

114104
![toy-face-pixel-art](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/peft_integration/diffusers_peft_lora_inference_16_1.png)
115105

116-
Impressive! As you can see, the model was able to generate an image that mixes the characteristics of both adapters.
106+
Impressive! As you can see, the model generated an image that mixed the characteristics of both adapters.
107+
108+
> [!TIP]
109+
> Through its PEFT integration, Diffusers also offers more efficient merging methods which you can learn about in the [Merge LoRAs](../using-diffusers/merge_loras) guide!
117110
118-
If you want to go back to using only one adapter, use the [`~diffusers.loaders.UNet2DConditionLoadersMixin.set_adapters`] method to activate the `"toy"` adapter:
111+
To return to only using one adapter, use the [`~diffusers.loaders.UNet2DConditionLoadersMixin.set_adapters`] method to activate the `"toy"` adapter:
119112

120113
```python
121-
# First, set the adapter.
122114
pipe.set_adapters("toy")
123115

124-
# Then, run inference.
125116
prompt = "toy_face of a hacker with a hoodie"
126117
lora_scale= 0.9
127118
image = pipe(
@@ -130,11 +121,7 @@ image = pipe(
130121
image
131122
```
132123

133-
![toy-face-again](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/peft_integration/diffusers_peft_lora_inference_18_1.png)
134-
135-
136-
If you want to switch to only the base model, disable all LoRAs with the [`~diffusers.loaders.UNet2DConditionLoadersMixin.disable_lora`] method.
137-
124+
Or to disable all adapters entirely, use the [`~diffusers.loaders.UNet2DConditionLoadersMixin.disable_lora`] method to return to the base model.
138125

139126
```python
140127
pipe.disable_lora()
@@ -145,11 +132,9 @@ image = pipe(prompt, num_inference_steps=30, generator=torch.manual_seed(0)).ima
145132
image
146133
```
147134

148-
![no-lora](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/peft_integration/diffusers_peft_lora_inference_20_1.png)
135+
## Manage active adapters
149136

150-
## Monitoring active adapters
151-
152-
You have attached multiple adapters in this tutorial, and if you're feeling a bit lost on what adapters have been attached to the pipeline's components, you can easily check the list of active adapters using the [`~diffusers.loaders.LoraLoaderMixin.get_active_adapters`] method:
137+
You have attached multiple adapters in this tutorial, and if you're feeling a bit lost on what adapters have been attached to the pipeline's components, use the [`~diffusers.loaders.LoraLoaderMixin.get_active_adapters`] method to check the list of active adapters:
153138

154139
```py
155140
active_adapters = pipe.get_active_adapters()
@@ -164,78 +149,3 @@ list_adapters_component_wise = pipe.get_list_adapters()
164149
list_adapters_component_wise
165150
{"text_encoder": ["toy", "pixel"], "unet": ["toy", "pixel"], "text_encoder_2": ["toy", "pixel"]}
166151
```
167-
168-
## Compatibility with `torch.compile`
169-
170-
If you want to compile your model with `torch.compile` make sure to first fuse the LoRA weights into the base model and unload them.
171-
172-
```diff
173-
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")
174-
pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")
175-
176-
pipe.set_adapters(["pixel", "toy"], adapter_weights=[0.5, 1.0])
177-
# Fuses the LoRAs into the Unet
178-
pipe.fuse_lora()
179-
pipe.unload_lora_weights()
180-
181-
+ pipe.unet.to(memory_format=torch.channels_last)
182-
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
183-
184-
prompt = "toy_face of a hacker with a hoodie, pixel art"
185-
image = pipe(prompt, num_inference_steps=30, generator=torch.manual_seed(0)).images[0]
186-
```
187-
188-
> [!TIP]
189-
> You can refer to the `torch.compile()` section [here](https://huggingface.co/docs/diffusers/main/en/optimization/torch2.0#torchcompile) and [here](https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion#torchcompile) for more elaborate examples.
190-
191-
## Fusing adapters into the model
192-
193-
You can use PEFT to easily fuse/unfuse multiple adapters directly into the model weights (both UNet and text encoder) using the [`~diffusers.loaders.LoraLoaderMixin.fuse_lora`] method, which can lead to a speed-up in inference and lower VRAM usage.
194-
195-
```py
196-
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")
197-
pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")
198-
199-
pipe.set_adapters(["pixel", "toy"], adapter_weights=[0.5, 1.0])
200-
# Fuses the LoRAs into the Unet
201-
pipe.fuse_lora()
202-
203-
prompt = "toy_face of a hacker with a hoodie, pixel art"
204-
image = pipe(prompt, num_inference_steps=30, generator=torch.manual_seed(0)).images[0]
205-
206-
# Gets the Unet back to the original state
207-
pipe.unfuse_lora()
208-
```
209-
210-
You can also fuse some adapters using `adapter_names` for faster generation:
211-
212-
```py
213-
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")
214-
pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")
215-
216-
pipe.set_adapters(["pixel"], adapter_weights=[0.5, 1.0])
217-
# Fuses the LoRAs into the Unet
218-
pipe.fuse_lora(adapter_names=["pixel"])
219-
220-
prompt = "a hacker with a hoodie, pixel art"
221-
image = pipe(prompt, num_inference_steps=30, generator=torch.manual_seed(0)).images[0]
222-
223-
# Gets the Unet back to the original state
224-
pipe.unfuse_lora()
225-
226-
# Fuse all adapters
227-
pipe.fuse_lora(adapter_names=["pixel", "toy"])
228-
229-
prompt = "toy_face of a hacker with a hoodie, pixel art"
230-
image = pipe(prompt, num_inference_steps=30, generator=torch.manual_seed(0)).images[0]
231-
```
232-
233-
## Saving a pipeline after fusing the adapters
234-
235-
To properly save a pipeline after it's been loaded with the adapters, it should be serialized like so:
236-
237-
```python
238-
pipe.fuse_lora(lora_scale=1.0)
239-
pipe.unload_lora_weights()
240-
pipe.save_pretrained("path-to-pipeline")
241-
```
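
The removed sections above rely on `fuse_lora` and `unfuse_lora`. As a hedged sketch of the underlying idea only (not the library code), fusing can be thought of as folding the scaled low-rank update into the base weight so that no separate adapter computation is needed at inference, and unfusing as subtracting it again. The matrices below are random placeholders, and `lora_scale = 0.7` is simply the value used elsewhere in this diff.

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(size=(4, 4))  # base weight of one layer
B = rng.normal(size=(4, 2))  # LoRA "up" matrix (placeholder values)
A = rng.normal(size=(2, 4))  # LoRA "down" matrix (placeholder values)
lora_scale = 0.7

x = rng.normal(size=(4,))

# Unfused: the adapter path is computed next to the base layer.
y_unfused = W @ x + lora_scale * (B @ (A @ x))

# Fused: the update is baked into a single weight matrix.
W_fused = W + lora_scale * (B @ A)
y_fused = W_fused @ x

# Unfusing subtracts the same update to recover the original weight.
W_restored = W_fused - lora_scale * (B @ A)
```

In this toy model the fused and unfused outputs agree up to floating-point error, which is why the scale has to be chosen at fuse time rather than passed later at inference.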

docs/source/en/using-diffusers/loading_adapters.md

Lines changed: 11 additions & 92 deletions
Original file line numberDiff line numberDiff line change
@@ -103,7 +103,7 @@ image
103103

104104
<Tip>
105105

106-
LoRA is a very general training technique that can be used with other training methods. For example, it is common to train a model with DreamBooth and LoRA.
106+
LoRA is a very general training technique that can be used with other training methods. For example, it is common to train a model with DreamBooth and LoRA. It is also increasingly common to load and merge multiple LoRAs to create new and unique images. You can learn more about it in the in-depth [Merge LoRAs](merge_loras) guide since merging is outside the scope of this loading guide.
107107

108108
</Tip>
109109

@@ -165,101 +165,14 @@ To unload the LoRA weights, use the [`~loaders.LoraLoaderMixin.unload_lora_weigh
165165
pipeline.unload_lora_weights()
166166
```
167167

168-
### Load multiple LoRAs
169-
170-
It can be fun to use multiple LoRAs together to create something entirely new and unique. The [`~loaders.LoraLoaderMixin.fuse_lora`] method allows you to fuse the LoRA weights with the original weights of the underlying model.
171-
172-
<Tip>
173-
174-
Fusing the weights can lead to a speedup in inference latency because you don't need to separately load the base model and LoRA! You can save your fused pipeline with [`~DiffusionPipeline.save_pretrained`] to avoid loading and fusing the weights every time you want to use the model.
175-
176-
</Tip>
177-
178-
Load an initial model:
179-
180-
```py
181-
from diffusers import StableDiffusionXLPipeline, AutoencoderKL
182-
import torch
183-
184-
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
185-
pipeline = StableDiffusionXLPipeline.from_pretrained(
186-
"stabilityai/stable-diffusion-xl-base-1.0",
187-
vae=vae,
188-
torch_dtype=torch.float16,
189-
).to("cuda")
190-
```
191-
192-
Next, load the LoRA checkpoint and fuse it with the original weights. The `lora_scale` parameter controls how much to scale the output by with the LoRA weights. It is important to make the `lora_scale` adjustments in the [`~loaders.LoraLoaderMixin.fuse_lora`] method because it won't work if you try to pass `scale` to the `cross_attention_kwargs` in the pipeline.
193-
194-
If you need to reset the original model weights for any reason (use a different `lora_scale`), you should use the [`~loaders.LoraLoaderMixin.unfuse_lora`] method.
195-
196-
```py
197-
pipeline.load_lora_weights("ostris/ikea-instructions-lora-sdxl")
198-
pipeline.fuse_lora(lora_scale=0.7)
199-
200-
# to unfuse the LoRA weights
201-
pipeline.unfuse_lora()
202-
```
203-
204-
Then fuse this pipeline with the next set of LoRA weights:
205-
206-
```py
207-
pipeline.load_lora_weights("ostris/super-cereal-sdxl-lora")
208-
pipeline.fuse_lora(lora_scale=0.7)
209-
```
210-
211-
<Tip warning={true}>
212-
213-
You can't unfuse multiple LoRA checkpoints, so if you need to reset the model to its original weights, you'll need to reload it.
214-
215-
</Tip>
216-
217-
Now you can generate an image that uses the weights from both LoRAs:
218-
219-
```py
220-
prompt = "A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration"
221-
image = pipeline(prompt).images[0]
222-
image
223-
```
224-
225-
### 🤗 PEFT
226-
227-
<Tip>
228-
229-
Read the [Inference with 🤗 PEFT](../tutorials/using_peft_for_inference) tutorial to learn more about its integration with 🤗 Diffusers and how you can easily work with and juggle multiple adapters. You'll need to install 🤗 Diffusers and PEFT from source to run the example in this section.
230-
231-
</Tip>
232-
233-
Another way you can load and use multiple LoRAs is to specify the `adapter_name` parameter in [`~loaders.LoraLoaderMixin.load_lora_weights`]. This method takes advantage of the 🤗 PEFT integration. For example, load and name both LoRA weights:
234-
235-
```py
236-
from diffusers import DiffusionPipeline
237-
import torch
238-
239-
pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
240-
pipeline.load_lora_weights("ostris/ikea-instructions-lora-sdxl", weight_name="ikea_instructions_xl_v1_5.safetensors", adapter_name="ikea")
241-
pipeline.load_lora_weights("ostris/super-cereal-sdxl-lora", weight_name="cereal_box_sdxl_v1.safetensors", adapter_name="cereal")
242-
```
243-
244-
Now use the [`~loaders.UNet2DConditionLoadersMixin.set_adapters`] to activate both LoRAs, and you can configure how much weight each LoRA should have on the output:
245-
246-
```py
247-
pipeline.set_adapters(["ikea", "cereal"], adapter_weights=[0.7, 0.5])
248-
```
249-
250-
Then, generate an image:
251-
252-
```py
253-
prompt = "A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration"
254-
image = pipeline(prompt, num_inference_steps=30, cross_attention_kwargs={"scale": 1.0}).images[0]
255-
image
256-
```
257-
258168
### Kohya and TheLastBen
259169

260170
Other popular LoRA trainers from the community include those by [Kohya](https://github.com/kohya-ss/sd-scripts/) and [TheLastBen](https://github.com/TheLastBen/fast-stable-diffusion). These trainers create different LoRA checkpoints than those trained by 🤗 Diffusers, but they can still be loaded in the same way.
261171

262-
Let's download the [Blueprintify SD XL 1.0](https://civitai.com/models/150986/blueprintify-sd-xl-10) checkpoint from [Civitai](https://civitai.com/):
172+
<hfoptions id="other-trainers">
173+
<hfoption id="Kohya">
174+
175+
To load a Kohya LoRA, let's download the [Blueprintify SD XL 1.0](https://civitai.com/models/150986/blueprintify-sd-xl-10) checkpoint from [Civitai](https://civitai.com/) as an example:
263176

264177
```sh
265178
!wget https://civitai.com/api/download/models/168776 -O blueprintify-sd-xl-10.safetensors
@@ -293,6 +206,9 @@ Some limitations of using Kohya LoRAs with 🤗 Diffusers include:
293206

294207
</Tip>
295208

209+
</hfoption>
210+
<hfoption id="TheLastBen">
211+
296212
Loading a checkpoint from TheLastBen is very similar. For example, to load the [TheLastBen/William_Eggleston_Style_SDXL](https://huggingface.co/TheLastBen/William_Eggleston_Style_SDXL) checkpoint:
297213

298214
```py
@@ -308,6 +224,9 @@ image = pipeline(prompt=prompt).images[0]
308224
image
309225
```
310226

227+
</hfoption>
228+
</hfoptions>
229+
311230
## IP-Adapter
312231

313232
[IP-Adapter](https://ip-adapter.github.io/) is a lightweight adapter that enables image prompting for any diffusion model. This adapter works by decoupling the cross-attention layers of the image and text features. All the other model components are frozen and only the embedded image features in the UNet are trained. As a result, IP-Adapter files are typically only ~100MBs.
