Commit d0d3e24

Textual inversion (huggingface#266)
* add textual inversion script * make the loop work * make coarse_loss optional * save pipeline after training * add arg pretrained_model_name_or_path * fix saving * fix gradient_accumulation_steps * style * fix progress bar steps * scale lr * add argument to accept style * remove unused args * scale lr using num gpus * load tokenizer using args * add checks when converting init token to id * improve commnets and style * document args * more cleanup * fix default adamw arsg * TextualInversionWrapper -> CLIPTextualInversionWrapper * fix tokenizer loading * Use the CLIPTextModel instead of wrapper * clean dataset * remove commented code * fix accessing grads for multi-gpu * more cleanup * fix saving on multi-GPU * init_placeholder_token_embeds * add seed * fix flip * fix multi-gpu * add utility methods in wrapper * remove ipynb * don't use wrapper * dont pass vae an dunet to accelerate prepare * bring back accelerator.accumulate * scale latents * use only one progress bar for steps * push_to_hub at the end of training * remove unused args * log some important stats * store args in tensorboard * pretty comments * save the trained embeddings * mobe the script up * add requirements file * more cleanup * fux typo * begin readme * style -> learnable_property * keep vae and unet in eval mode * address review comments * address more comments * removed unused args * add train command in readme * update readme
1 parent 5164c9f commit d0d3e24

File tree

3 files changed: +661 -0 lines changed

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
## Textual Inversion fine-tuning example

[Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like Stable Diffusion on your own images using just 3-5 examples.

The `textual_inversion.py` script shows how to implement the training procedure and adapt it for Stable Diffusion.
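
Under the hood, the script registers a new placeholder token (e.g. `<cat-toy>`) in the CLIP tokenizer, initializes its embedding from an existing initializer token (e.g. `toy`), and then optimizes only that single embedding row while the VAE and UNet stay frozen in eval mode. The snippet below is a minimal, illustrative sketch of that setup; the names and exact calls are assumptions, and the authoritative implementation is `textual_inversion.py`.

```python
# Rough sketch of the token/embedding setup performed before training (not the full script).
from transformers import CLIPTextModel, CLIPTokenizer

pretrained_model = "CompVis/stable-diffusion-v1-4"
placeholder_token = "<cat-toy>"   # new "word" we want to teach the model
initializer_token = "toy"         # existing word used to seed the new embedding

tokenizer = CLIPTokenizer.from_pretrained(pretrained_model, subfolder="tokenizer", use_auth_token=True)
text_encoder = CLIPTextModel.from_pretrained(pretrained_model, subfolder="text_encoder", use_auth_token=True)

# Register the placeholder token and grow the embedding matrix to make room for it.
num_added_tokens = tokenizer.add_tokens(placeholder_token)
assert num_added_tokens > 0, "The placeholder token already exists in the tokenizer."
text_encoder.resize_token_embeddings(len(tokenizer))

# Start the new embedding from the initializer token so training begins near a related concept.
placeholder_token_id = tokenizer.convert_tokens_to_ids(placeholder_token)
initializer_token_id = tokenizer.convert_tokens_to_ids(initializer_token)
token_embeds = text_encoder.get_input_embeddings().weight.data
token_embeds[placeholder_token_id] = token_embeds[initializer_token_id].clone()

# During training, only this one embedding row receives gradient updates;
# the VAE and UNet are kept frozen.
```
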
### Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

```bash
pip install diffusers[training] accelerate transformers
```

And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```
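
If you prefer to skip the interactive prompts and start from a sensible single-machine default, recent versions of 🤗 Accelerate can also write a basic config from Python (availability depends on your `accelerate` version):

```python
# Writes a default accelerate config file for this machine; roughly equivalent to
# accepting the defaults in `accelerate config`. Requires a recent accelerate release.
from accelerate.utils import write_basic_config

write_basic_config()
```
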

### Cat toy example

You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree.

You have to be a registered user on the 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).

Run the following command to authenticate your token:

```bash
huggingface-cli login
```
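
If you are working from a notebook rather than a terminal, the same login can be done in Python via `huggingface_hub` (an equivalent alternative, not something the training script requires):

```python
# Opens an interactive login prompt and stores the token locally,
# just like `huggingface-cli login`.
from huggingface_hub import notebook_login

notebook_login()
```
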

If you have already cloned the repo, then you won't need to go through these steps. You can simply remove the `--use_auth_token` arg from the following command.

<br>

Now let's get our dataset. Download 3-4 images from [here](https://drive.google.com/drive/folders/1fmJMs25nxS_rSNqS5hTcRdLem_YQXbq5) and save them in a directory. This will be our training data.
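
As an optional sanity check, you can make sure every file in that directory opens as an image before launching training (the directory name below is just the placeholder used further down):

```python
# Verify the training images load as RGB, which is how the script will consume them.
from pathlib import Path

from PIL import Image

data_dir = Path("path-to-dir-containing-images")
for image_path in sorted(data_dir.iterdir()):
    image = Image.open(image_path).convert("RGB")
    print(image_path.name, image.size)
```
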

And launch the training using:

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATA_DIR="path-to-dir-containing-images"

accelerate launch textual_inversion.py \
  --pretrained_model_name_or_path=$MODEL_NAME --use_auth_token \
  --train_data_dir=$DATA_DIR \
  --learnable_property="object" \
  --placeholder_token="<cat-toy>" --initializer_token="toy" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 \
  --max_train_steps=3000 \
  --learning_rate=5.0e-04 --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir="textual_inversion_cat"
```
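
Note that `--scale_lr` rescales the learning rate passed on the command line by the gradient accumulation steps, the batch size, and the number of processes, so the value above is not the rate the optimizer actually sees. A rough back-of-the-envelope check for the command above (the exact formula lives in `textual_inversion.py`):

```python
# Effective learning rate with --scale_lr for the command above (assumed scaling; see the script).
learning_rate = 5.0e-04
gradient_accumulation_steps = 2
train_batch_size = 1
num_processes = 1  # number of GPUs configured via `accelerate config`

effective_lr = learning_rate * gradient_accumulation_steps * train_batch_size * num_processes
print(effective_lr)  # 0.001 on a single GPU
```
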
A full training run takes ~1 hour on one V100 GPU.

### Inference

Once you have trained a model using the above command, inference can be done simply using the `StableDiffusionPipeline`. Make sure to include the `placeholder_token` in your prompt.

```python
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

model_id = "path-to-your-trained-model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A <cat-toy> backpack"

with autocast("cuda"):
    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5)["sample"][0]

image.save("cat-backpack.png")
```
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
accelerate
torchvision
transformers

0 commit comments
