
Commit 80c10d8

yiyixuxu authored
update Kandinsky doc (huggingface#4301)
* update doc
* fix an error in autopipe doc

Co-authored-by: yiyixuxu <yixu310@gmail,com>
1 parent 20e9258 commit 80c10d8

File tree

3 files changed: +97 −3 lines changed

docs/source/en/api/pipelines/auto_pipeline.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -39,8 +39,8 @@ Currently AutoPipeline support the Text-to-Image, Image-to-Image, and Inpainting
 - [Stable Diffusion Controlnet](./api/pipelines/controlnet)
 - [Stable Diffusion XL](./stable_diffusion/stable_diffusion_xl)
 - [IF](./if)
-- [Kandinsky](./kandinsky)
-- [Kandinsky 2.2]()
+- [Kandinsky](./kandinsky)
+- [Kandinsky 2.2](./kandinsky)

 ## AutoPipelineForText2Image
```

docs/source/en/api/pipelines/kandinsky.md

Lines changed: 79 additions & 0 deletions
@@ -105,6 +105,30 @@ One cheeseburger monster coming up! Enjoy!

![img](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/kandinsky-docs/cheeseburger.png)

<Tip>

We also provide an end-to-end Kandinsky pipeline, [`KandinskyCombinedPipeline`], which combines the prior pipeline and the text-to-image pipeline and lets you perform inference in a single step. You can create the combined pipeline with the [`~AutoPipelineForText2Image.from_pretrained`] method:

```python
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()
```

Under the hood, it automatically loads both [`KandinskyPriorPipeline`] and [`KandinskyPipeline`]. To generate images, you no longer need to call both pipelines and pass the output of one to the other; you only need to call the combined pipeline once. You can set a different `guidance_scale` and `num_inference_steps` for the prior pipeline with the `prior_guidance_scale` and `prior_num_inference_steps` arguments.

```python
prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality"

image = pipe(prompt=prompt, negative_prompt=negative_prompt, prior_guidance_scale=1.0, guidance_scale=4.0, height=768, width=768).images[0]
```

</Tip>

The Kandinsky model works extremely well with creative prompts. Here is some of the amazing art that can be created using the exact same process but with different prompts.
@@ -187,6 +211,34 @@ out.images[0].save("fantasy_land.png")

![img](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/kandinsky-docs/img2img_fantasyland.png)

<Tip>

You can also use the [`KandinskyImg2ImgCombinedPipeline`] for end-to-end image-to-image generation with Kandinsky 2.1:

```python
from diffusers import AutoPipelineForImage2Image
import torch
import requests
from io import BytesIO
from PIL import Image

pipe = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "A fantasy landscape, Cinematic lighting"
negative_prompt = "low quality, bad quality"

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
original_image = Image.open(BytesIO(response.content)).convert("RGB")
original_image.thumbnail((768, 768))

image = pipe(prompt=prompt, image=original_image, strength=0.3).images[0]
```

</Tip>
### Text Guided Inpainting Generation

You can use [`KandinskyInpaintPipeline`] to edit images. In this example, we will add a hat to the portrait of a cat.
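A minimal sketch of this two-stage inpainting flow (the cat image URL and the exact mask region below are illustrative assumptions, not necessarily the values used in the full example):

```python
from diffusers import KandinskyPriorPipeline, KandinskyInpaintPipeline
from diffusers.utils import load_image
import numpy as np
import torch

# Prior: turn the prompt into image embeddings
pipe_prior = KandinskyPriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16
)
pipe_prior.enable_model_cpu_offload()

prompt = "a hat"
image_emb, zero_image_emb = pipe_prior(prompt, return_dict=False)

# Inpainting decoder
pipe = KandinskyInpaintPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1-inpaint", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

# Illustrative input image of a cat
original_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png"
)

# White (1.0) pixels mark the region to repaint -- here, a band at the top of the image
mask = np.zeros((768, 768), dtype=np.float32)
mask[:250, 250:-250] = 1

image = pipe(
    prompt,
    image=original_image,
    mask_image=mask,
    image_embeds=image_emb,
    negative_image_embeds=zero_image_emb,
    height=768,
    width=768,
    num_inference_steps=150,
).images[0]
image.save("cat_with_hat.png")
```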
@@ -231,6 +283,33 @@ image.save("cat_with_hat.png")

![img](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/kandinsky-docs/inpaint_cat_hat.png)

<Tip>

To use the [`KandinskyInpaintCombinedPipeline`] to perform end-to-end image inpainting generation, you can run the code below instead:

```python
from diffusers import AutoPipelineForInpainting
import torch

pipe = AutoPipelineForInpainting.from_pretrained("kandinsky-community/kandinsky-2-1-inpaint", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
image = pipe(prompt=prompt, image=original_image, mask_image=mask).images[0]
```

</Tip>

🚨🚨🚨 __Breaking change for Kandinsky Mask Inpainting__ 🚨🚨🚨

We introduced a breaking change for the Kandinsky inpainting pipeline in the following pull request: https://github.com/huggingface/diffusers/pull/4207. Previously, we accepted a mask format where black pixels represented the masked-out area. This was inconsistent with all other pipelines in diffusers, so we changed the mask format in Kandinsky to use white pixels instead.
Please upgrade your inpainting code to follow the change above: if you are using Kandinsky Inpaint in production, you now need to invert the mask:

```python
# For PIL input
import PIL.ImageOps
mask = PIL.ImageOps.invert(mask)

# For PyTorch and NumPy input
mask = 1 - mask
```
### Interpolate

The [`KandinskyPriorPipeline`] also comes with a cool utility function that will allow you to interpolate the latent space of different images and texts super easily. Here is an example of how you can create an Impressionist-style portrait for your pet based on "The Starry Night".
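A minimal sketch of how the prior's `interpolate` utility can be used here (the image URLs and blend weights below are illustrative assumptions):

```python
from diffusers import KandinskyPriorPipeline, KandinskyPipeline
from diffusers.utils import load_image
import torch

pipe_prior = KandinskyPriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16
)
pipe_prior.enable_model_cpu_offload()

# Illustrative inputs: a pet photo and "The Starry Night"
img1 = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png"
)
img2 = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/starry_night.jpeg"
)

# Each entry is a PIL image or a text prompt; the weights control the blend
images_texts = ["a cat", img1, img2]
weights = [0.3, 0.3, 0.4]
image_emb, zero_image_emb = pipe_prior.interpolate(images_texts, weights).to_tuple()

pipe = KandinskyPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

# An empty prompt works here since the interpolated embeddings carry the conditioning
image = pipe(
    "", image_embeds=image_emb, negative_image_embeds=zero_image_emb, height=768, width=768
).images[0]
image.save("starry_cat.png")
```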

docs/source/en/api/pipelines/kandinsky_v22.md

Lines changed: 16 additions & 1 deletion
@@ -11,7 +11,22 @@ specific language governing permissions and limitations under the License.

The Kandinsky 2.2 release includes robust new text-to-image models that support text-to-image generation, image-to-image generation, image interpolation, and text-guided image inpainting. The general workflow to perform these tasks using Kandinsky 2.2 is the same as in Kandinsky 2.1. First, you will need to use a prior pipeline to generate image embeddings based on your text prompt, and then use one of the image decoding pipelines to generate the output image. The only difference is that in Kandinsky 2.2, all of the decoding pipelines no longer accept the `prompt` input, and the image generation process is conditioned with only `image_embeds` and `negative_image_embeds`.

- Let's look at an example of how to perform text-to-image generation using Kandinsky 2.2.
+ As with Kandinsky 2.1, the easiest way to perform text-to-image generation is to use the combined Kandinsky pipeline. The process is exactly the same; all you need to do is replace the Kandinsky 2.1 checkpoint with a 2.2 one.

```python
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality"

image = pipe(prompt=prompt, negative_prompt=negative_prompt, prior_guidance_scale=1.0, height=768, width=768).images[0]
```

Now, let's look at an example where we take separate steps to run the prior pipeline and the text-to-image pipeline. This way, we can understand what's happening under the hood and how Kandinsky 2.2 differs from Kandinsky 2.1.

First, let's create the prior pipeline and text-to-image pipeline with Kandinsky 2.2 checkpoints.
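As a minimal sketch of those two steps (assuming the `kandinsky-2-2-prior` and `kandinsky-2-2-decoder` checkpoints; note that the decoder is called without a `prompt`):

```python
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline
import torch

# Prior pipeline: turns the text prompt into image embeddings
pipe_prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16
)
pipe_prior.enable_model_cpu_offload()

# Decoder pipeline: generates the image from the embeddings
pipe = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality"

image_embeds, negative_image_embeds = pipe_prior(prompt, negative_prompt, guidance_scale=1.0).to_tuple()

# Unlike Kandinsky 2.1, the decoder is conditioned only on the embeddings -- no `prompt` argument
image = pipe(
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    height=768,
    width=768,
).images[0]
```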