
[BUG] fixes in Kandinsky pipeline #11080


Merged
merged 8 commits into main on Apr 21, 2025

Conversation

ishan-modi
Contributor

What does this PR do?

Fixes #11060

Who can review?

@DN6

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu
Collaborator

yiyixuxu commented Apr 8, 2025

thanks @ishan-modi !
are you able to run the docstring examples for these pipelines and check whether the outputs are the same on this branch vs on main?

@ishan-modi
Contributor Author

ishan-modi commented Apr 8, 2025

I tried running both branches for Kandinsky 3, and the results are below.

They are slightly different, potentially due to the use of quantization and the balanced device_map strategy (forced by limited GPU memory), though I am not sure.

[images: main | fixes-issue-11060 | diff]

Also, preprocess and postprocess are exactly the same as before:

import torch
import numpy as np
from diffusers.image_processor import VaeImageProcessor
from diffusers.utils import load_image
from diffusers.pipelines.pipeline_utils import numpy_to_pil

input_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky3/t2i.png"
)
midpoint_image = torch.randn(1, 4, 64, 64)
image_processor = VaeImageProcessor(
    vae_scale_factor=2**3,
    vae_latent_channels=4,
    resample="bicubic",
    reducing_gap=1,
)


def preprocess(image, image_processor=None, branch="default"):
    if branch == "main":
        # Manual preprocessing as done on main: scale to [-1, 1], HWC -> CHW, add batch dim
        arr = np.array(image.convert("RGB"))
        arr = arr.astype(np.float32) / 127.5 - 1
        arr = np.transpose(arr, [2, 0, 1])
        return torch.from_numpy(arr).unsqueeze(0)
    # This branch: delegate to VaeImageProcessor
    return image_processor.preprocess(image)


def postprocess(image, image_processor=None, branch="default"):
    if branch == "main":
        # Manual postprocessing as done on main: rescale to [0, 1], CHW -> HWC, to PIL
        image = image * 0.5 + 0.5
        image = image.clamp(0, 1)
        image = image.cpu().permute(0, 2, 3, 1).float().numpy()
        return numpy_to_pil(image)[0]
    # This branch: delegate to VaeImageProcessor
    return image_processor.postprocess(image)[0]


if torch.equal(preprocess(input_image, image_processor), preprocess(input_image, branch="main")):
    print("Preprocessed images are exactly the same.")

if list(postprocess(midpoint_image, image_processor).getdata()) == list(
    postprocess(midpoint_image, branch="main").getdata()
):
    print("Postprocessed images are exactly the same.")

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks!

@yiyixuxu
Collaborator

yiyixuxu commented Apr 8, 2025

I think there are some Kandinsky test failures that are potentially related: https://github.com/huggingface/diffusers/actions/runs/14330695758/job/40203664079?pr=11080#step:6:33932

@ishan-modi
Contributor Author

Thanks for the review, fixed the tests!

resample="bicubic",
reducing_gap=1,
)
kwargs = {}
Collaborator


Ohh, we can just give them a default value like this:

movq_scale_factor = 2 ** (len(self.movq.config.block_out_channels) - 1) if getattr(self, "movq", None) else ..
movq_latent_channels = self.movq.config.latent_channels if getattr(self, "movq", None) else ..

to be consistent with how this is handled in other pipelines, for example https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py#L218
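For illustration, the suggested default-value pattern can be sketched on a minimal mock. The mock classes and the fallback values (8 and 4) below are hypothetical stand-ins, not the actual pipeline code:

```python
# Sketch of the suggested pattern: derive movq-based values only when the
# movq component is present, otherwise fall back to a default. Mock classes
# below are hypothetical stand-ins for the real pipeline components.

class MovQConfig:
    block_out_channels = (128, 256, 256, 512)
    latent_channels = 4

class MovQ:
    config = MovQConfig()

class Pipeline:
    def __init__(self, movq=None):
        self.movq = movq
        # Guard with getattr so the pipeline still constructs when movq is
        # missing; fall back to assumed defaults (8 and 4 here).
        self.movq_scale_factor = (
            2 ** (len(self.movq.config.block_out_channels) - 1)
            if getattr(self, "movq", None)
            else 8
        )
        self.movq_latent_channels = (
            self.movq.config.latent_channels if getattr(self, "movq", None) else 4
        )

with_movq = Pipeline(movq=MovQ())
without_movq = Pipeline()
print(with_movq.movq_scale_factor)  # 2 ** (4 - 1) = 8
print(without_movq.movq_latent_channels)  # falls back to the default, 4
```

This keeps component-derived attributes optional, so partially loaded pipelines (e.g. for offloading or component reuse) do not crash in `__init__`.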

Contributor

@hlky hlky left a comment


Same comment for all pipelines.

else:
image = latents
image = self.image_processor.postprocess(image, output_type)
Contributor


This breaks output_type == "latent". image_processor.postprocess should not be applied to latent output.

Contributor Author


Hmm, the postprocess function doesn't do anything when output_type is "latent"; see here.

Let me know if I am missing anything
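For illustration, the guard works roughly like this (a simplified sketch of the early-return behavior, not the actual VaeImageProcessor implementation):

```python
# Simplified sketch: latent output is returned untouched, so calling
# postprocess unconditionally is safe when output_type == "latent".
# This is an illustration, not the real VaeImageProcessor code.

def postprocess(image, output_type="pil"):
    if output_type == "latent":
        return image  # early return: no-op for latents
    # denormalize from [-1, 1] to [0, 1] for image outputs
    return [min(max(x * 0.5 + 0.5, 0.0), 1.0) for x in image]

latents = [-0.3, 0.7, 1.5]
# latent output passes through unchanged
assert postprocess(latents, output_type="latent") is latents
```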

Contributor


Even so, the code style does not match the other pipelines; please make the requested changes to keep the styling consistent.

Contributor Author


Alright, made the change. Let me know if it looks good.

Comment on lines 370 to 373
if output_type not in ["pt", "np", "pil", "latent"]:
raise ValueError(
f"Only the output types `pt`, `pil`, `np` and `latent` are supported not output_type={output_type}"
)
Contributor


This can be removed. Refer to other pipelines for an example.

if output_type == "latent":
image = latents
else:
latents = self._unpack_latents(latents, height, width, self.vae_scale_factor)
latents = (latents / self.vae.config.scaling_factor) + self.vae.config.shift_factor
image = self.vae.decode(latents, return_dict=False)[0]
image = self.image_processor.postprocess(image, output_type=output_type)

@ishan-modi ishan-modi requested a review from hlky April 10, 2025 07:19
@ishan-modi
Contributor Author

@hlky gentle ping

@yiyixuxu yiyixuxu merged commit 79ea8eb into huggingface:main Apr 21, 2025
11 of 12 checks passed
@yiyixuxu
Collaborator

thanks a lot @ishan-modi

@ishan-modi ishan-modi deleted the fixes-issue-11060 branch April 22, 2025 03:55
Successfully merging this pull request may close these issues.

prepare_image in Kandinsky pipelines doesn't support torch.Tensor