
Commit c891330

Add examples with Intel optimizations (huggingface#1579)
* Add examples with Intel optimizations (BF16 fine-tuning and inference)
* Remove unused package
* Add README for intel_opts and refine the description for research projects
* Add notes of intel opts for diffusers

6 files changed, +790 −0 lines changed

examples/README.md

Lines changed: 4 additions & 0 deletions

```diff
@@ -52,6 +52,10 @@ For such examples, we are more lenient regarding the philosophy defined above an
 Examples that are useful for the community, but are either not yet deemed popular or not yet following our above philosophy should go into the [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community) folder. The community folder therefore includes training examples and inference pipelines.
 **Note**: Community examples can be a [great first contribution](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) to show to the community how you like to use `diffusers` 🪄.
 
+## Research Projects
+
+We also provide **research_projects** examples that are maintained by the community as defined in the respective research project folders. These examples are useful and offer extended capabilities that complement the official examples. See [research_projects](https://github.com/huggingface/diffusers/tree/main/examples/research_projects) for details.
+
 ## Important note
 
 To make sure you can successfully run the latest versions of the example scripts, you have to **install the library from source** and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:
```
Lines changed: 17 additions & 0 deletions
## Diffusers examples with Intel optimizations

**This research project is not actively maintained by the diffusers team. For any questions or comments, please make sure to tag @hshen14.**

This project provides diffusers examples with Intel optimizations, such as Bfloat16 (BF16) for training/fine-tuning acceleration and 8-bit integer (INT8) for inference acceleration, on Intel platforms.

## Accelerating the fine-tuning for textual inversion

We accelerate the fine-tuning for textual inversion with Intel Extension for PyTorch. The [examples](textual_inversion) enable both single-node and multi-node distributed training with Bfloat16 support on Intel Xeon Scalable Processors.
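The core pattern is to hand the model and optimizer to `ipex.optimize` with `dtype=torch.bfloat16` and run the forward pass under `torch.cpu.amp.autocast`. The following is a minimal sketch of that wrapping, using a toy model in place of the actual textual-inversion pipeline; it is illustrative only, not the training script shipped in this commit.

```python
# Minimal sketch of BF16 training with Intel Extension for PyTorch.
# A toy MLP stands in for the model that textual inversion fine-tunes;
# the ipex.optimize + autocast pattern is the part that carries over.
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
optimizer = torch.optim.AdamW(model.parameters(), lr=2.5e-3)

# In training mode, ipex.optimize returns an optimized (model, optimizer) pair.
model.train()
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

x, y = torch.randn(8, 16), torch.randn(8, 1)
for _ in range(3):
    optimizer.zero_grad()
    # Run the forward pass and loss in BF16 autocast on CPU.
    with torch.cpu.amp.autocast(dtype=torch.bfloat16):
        loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
```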
## Accelerating the inference for Stable Diffusion using Bfloat16

We accelerate the inference with Bfloat16 using Intel Extension for PyTorch. The [script](inference_bf16.py), included in this commit, is designed to support standard Stable Diffusion models with Bfloat16.

## Accelerating the inference for Stable Diffusion using INT8

Coming soon ...
Lines changed: 49 additions & 0 deletions
```python
import torch

import intel_extension_for_pytorch as ipex
from diffusers import StableDiffusionPipeline
from PIL import Image


def image_grid(imgs, rows, cols):
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))
    grid_w, grid_h = grid.size

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid


prompt = ["a lovely <dicoo> in red dress and hat, in the snowly and brightly night, with many brighly buildings"]
batch_size = 8
prompt = prompt * batch_size

device = "cpu"
model_id = "path-to-your-trained-model"
model = StableDiffusionPipeline.from_pretrained(model_id)
model = model.to(device)

# to channels last
model.unet = model.unet.to(memory_format=torch.channels_last)
model.vae = model.vae.to(memory_format=torch.channels_last)
model.text_encoder = model.text_encoder.to(memory_format=torch.channels_last)
model.safety_checker = model.safety_checker.to(memory_format=torch.channels_last)

# optimize with ipex
model.unet = ipex.optimize(model.unet.eval(), dtype=torch.bfloat16, inplace=True)
model.vae = ipex.optimize(model.vae.eval(), dtype=torch.bfloat16, inplace=True)
model.text_encoder = ipex.optimize(model.text_encoder.eval(), dtype=torch.bfloat16, inplace=True)
model.safety_checker = ipex.optimize(model.safety_checker.eval(), dtype=torch.bfloat16, inplace=True)

# compute
seed = 666
generator = torch.Generator(device).manual_seed(seed)
with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    images = model(prompt, guidance_scale=7.5, num_inference_steps=50, generator=generator).images

# save image
grid = image_grid(images, rows=2, cols=4)
grid.save(model_id + ".png")
```
Lines changed: 68 additions & 0 deletions
## Textual Inversion fine-tuning example

[Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like Stable Diffusion on your own images using just 3-5 examples.
The `textual_inversion.py` script shows how to implement the training procedure and adapt it for Stable Diffusion.

## Training with Intel Extension for PyTorch

Intel Extension for PyTorch provides optimizations for faster training and inference on CPUs. You can leverage the training example `textual_inversion.py`. Follow the [instructions](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion) to get the model and the [dataset](https://huggingface.co/sd-concepts-library/dicoo2) before running the script.

The example supports both single-node and multi-node distributed training:

### Single node training
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATA_DIR="path-to-dir-containing-dicoo-images"

python textual_inversion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$DATA_DIR \
  --learnable_property="object" \
  --placeholder_token="<dicoo>" --initializer_token="toy" \
  --seed=7 \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --max_train_steps=3000 \
  --learning_rate=2.5e-03 --scale_lr \
  --output_dir="textual_inversion_dicoo"
```

Note: Bfloat16 is available on Intel Xeon Scalable Processors Cooper Lake or Sapphire Rapids. You may not get a performance speedup without Bfloat16 hardware support.
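To check up front whether your machine exposes the required instructions, you can inspect the CPU flags. The following is a small, Linux-only sketch (it reads `/proc/cpuinfo`); `avx512_bf16` is reported on Cooper Lake and later, and `amx_bf16` on Sapphire Rapids.

```python
# Quick, Linux-only check for BF16-capable CPU instructions.
with open("/proc/cpuinfo") as f:
    flags = f.read()

has_bf16 = any(flag in flags for flag in ("avx512_bf16", "amx_bf16"))
print("CPU BF16 instructions available:", has_bf16)
```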
### Multi-node distributed training

Before running the scripts, make sure to install oneCCL Bindings for PyTorch, which is required for distributed training:

```bash
python -m pip install oneccl_bind_pt==1.13 -f https://developer.intel.com/ipex-whl-stable-cpu
```

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATA_DIR="path-to-dir-containing-dicoo-images"

oneccl_bindings_for_pytorch_path=$(python -c "from oneccl_bindings_for_pytorch import cwd; print(cwd)")
source $oneccl_bindings_for_pytorch_path/env/setvars.sh

python -m intel_extension_for_pytorch.cpu.launch --distributed \
  --hostfile hostfile --nnodes 2 --nproc_per_node 2 textual_inversion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$DATA_DIR \
  --learnable_property="object" \
  --placeholder_token="<dicoo>" --initializer_token="toy" \
  --seed=7 \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --max_train_steps=750 \
  --learning_rate=2.5e-03 --scale_lr \
  --output_dir="textual_inversion_dicoo"
```

The above runs distributed training on 2 nodes with 2 processes per node. List the hostnames or IP addresses of the nodes (one per line) in the `hostfile` and make sure the 2 nodes are reachable from each other. For more details, please refer to the [user guide](https://github.com/intel/torch-ccl).

### Reference

We published a [Medium blog](https://medium.com/intel-analytics-software/personalized-stable-diffusion-with-few-shot-fine-tuning-on-a-single-cpu-f01a3316b13) on how to create your own Stable Diffusion model on CPUs using textual inversion. Try it out if you are interested.
Lines changed: 7 additions & 0 deletions
```
accelerate
torchvision
transformers>=4.21.0
ftfy
tensorboard
modelcards
intel_extension_for_pytorch>=1.13
```
