
Kolors additional pipelines, community contrib #11372


Merged
merged 5 commits into huggingface:main on Apr 23, 2025

Conversation

Teriks (Contributor) commented Apr 21, 2025

These are primarily ControlNet pipelines for Kolors.

Adapted from: https://github.com/Kwai-Kolors/Kolors

They are for direct use with the existing diffusers ControlNetModel; an additional pipeline for plain inpainting is also included.

Compatibility with the diffusers ControlNetModel is achieved by temporarily patching the ControlNetModel instance (no global side effects such as modifying the class itself); a sketch of the idea follows.
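
For reference, a minimal sketch of the instance-patching idea (the helper name and wrapper body here are illustrative, not the actual code in this PR):

import contextlib

@contextlib.contextmanager
def temporarily_patched_forward(controlnet):
    # Save the bound method, shadow it on this one instance, and restore it
    # on exit, so the ControlNetModel class itself is never modified.
    original_forward = controlnet.forward

    def forward(*args, **kwargs):
        # Kolors-specific adjustments would happen here before delegating.
        return original_forward(*args, **kwargs)

    controlnet.forward = forward
    try:
        yield controlnet
    finally:
        controlnet.forward = original_forward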

This supports MultiControlNetModel as well.
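
A hedged usage sketch of the multi-ControlNet path, assuming the pipeline wraps a plain list into MultiControlNetModel the same way the mainline StableDiffusionXLControlNetPipeline does, and that depth_image and canny_image have been prepared as in the test scripts further below:

import torch
from diffusers import ControlNetModel
from examples.community.pipeline_controlnet_xl_kolors import KolorsControlNetPipeline

controlnets = [
    ControlNetModel.from_pretrained("Kwai-Kolors/Kolors-ControlNet-Depth", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("Kwai-Kolors/Kolors-ControlNet-Canny", torch_dtype=torch.float16),
]
pipe = KolorsControlNetPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers", controlnet=controlnets, variant="fp16", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

# One conditioning image and one scale per ControlNet.
images = pipe(
    "A robot, 4k photo",
    image=[depth_image, canny_image],
    controlnet_conditioning_scale=[0.5, 0.5],
).images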

Pipelines here:

KolorsControlNetPipeline
KolorsControlNetImg2ImgPipeline
KolorsControlNetInpaintPipeline
KolorsInpaintPipeline

Documentation, including example strings, is complete, but could use a look-over.

I wish to contribute these as community pipelines; they could serve as a reference (or be adapted) for implementing these pipelines in mainline diffusers if desired.

I wrote these for integration into a personal project (a CLI/GUI tool), and they have been tested fairly thoroughly.


Who can review?

@sayakpaul @yiyixuxu @asomoza

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

yiyixuxu (Collaborator) left a comment:

thanks!

HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

yiyixuxu (Collaborator) commented:

@bot /style

yiyixuxu (Collaborator) commented:

thanks for the PR!
can you run make style so that our CI passes? (I think there are some issues that our style bot cannot fix)

Teriks (Contributor, Author) commented Apr 21, 2025

thanks for the PR! can you run make style so that our CI passes? (I think there are some issues that our style bot cannot fix)

Other than the formatting issues, make style revealed an issue with callback / callback_on_step_end and a duplicate method definition that I should fix.

So I will fix the formatting and then try to clean that up this week.

Teriks and others added 4 commits April 21, 2025 17:48
…arity

Example string doc fixes, make sure variant=fp16

Fix device mismatch for encoder_hidden_states in ControlNetModel patch when sequential offload is enabled.

Fix _get_add_time_ids implementations and add proper callback_on_step_end implementations

In KolorsControlNetImg2ImgPipeline & KolorsControlNetInpaintPipeline

Properly implement __call__ arguments:

negative_original_size
negative_crops_coords_top_left
negative_target_size
aesthetic_score
negative_aesthetic_score

In KolorsControlNetPipeline

Properly implement __call__ arguments:

negative_original_size
negative_crops_coords_top_left
negative_target_size

This covers all typical SDXL conditioning arguments

Rename KolorsControlNetPipeline.__call__ argument "control_image" to "image" to match StableDiffusionXLControlNetPipeline
Teriks (Contributor, Author) commented Apr 23, 2025

@yiyixuxu

Should be good to go

I have resolved the issues revealed by make style regarding callback_on_step_end.

I also resolved additional device-mismatch issues caused by sequential offload, which had not existed for me in diffusers 0.33.1 but do exist with these pipelines on the current main branch.
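
A minimal sketch of what that fix amounts to (names illustrative; under sequential offload the prompt embeddings can be left on a different device than the latent sample):

def make_device_safe_forward(controlnet):
    original_forward = controlnet.forward

    def forward(sample, timestep, encoder_hidden_states, *args, **kwargs):
        # Move the prompt embeddings to the sample's device before
        # delegating to the wrapped ControlNetModel.
        if encoder_hidden_states.device != sample.device:
            encoder_hidden_states = encoder_hidden_states.to(sample.device)
        return original_forward(sample, timestep, encoder_hidden_states, *args, **kwargs)

    controlnet.forward = forward
    return controlnet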

All pipelines in this PR now have a proper callback_on_step_end implementation, identical to their SDXL pipeline counterparts.

The ControlNet pipelines for img2img and inpainting now accept the typical SDXL conditioning arguments, as sketched below.
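
A hedged usage sketch of those arguments with illustrative values, assuming pipe, image, and depth_image as set up in the KolorsControlNetImg2ImgPipeline script below (the 6.0 / 2.5 aesthetic scores mirror the SDXL img2img defaults):

images = pipe(
    prompt,
    image=image,
    control_image=depth_image,
    strength=0.8,
    negative_original_size=(512, 512),
    negative_crops_coords_top_left=(0, 0),
    negative_target_size=(1024, 1024),
    aesthetic_score=6.0,
    negative_aesthetic_score=2.5,
).images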

KolorsControlNetPipeline now accepts image instead of control_image, the same as StableDiffusionXLControlNetPipeline.


Here are example scripts I used for hand testing:

KolorsControlNetImg2ImgPipeline

import torch
import numpy as np
from PIL import Image

from transformers import DPTImageProcessor, DPTForDepthEstimation
from examples.community.pipeline_controlnet_xl_kolors_img2img import KolorsControlNetImg2ImgPipeline
from diffusers.utils import load_image
from diffusers import ControlNetModel

depth_estimator = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to("cuda")
feature_extractor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
controlnet = ControlNetModel.from_pretrained(
    "Kwai-Kolors/Kolors-ControlNet-Depth",
    use_safetensors=True,
    torch_dtype=torch.float16,
)
pipe = KolorsControlNetImg2ImgPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers",
    controlnet=controlnet,
    variant="fp16",
    use_safetensors=True,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

def get_depth_map(image):
    image = feature_extractor(images=image, return_tensors="pt").pixel_values.to("cuda")

    with torch.no_grad(), torch.autocast("cuda"):
        depth_map = depth_estimator(image).predicted_depth

    depth_map = torch.nn.functional.interpolate(
        depth_map.unsqueeze(1),
        size=(1024, 1024),
        mode="bicubic",
        align_corners=False,
    )
    depth_min = torch.amin(depth_map, dim=[1, 2, 3], keepdim=True)
    depth_max = torch.amax(depth_map, dim=[1, 2, 3], keepdim=True)
    depth_map = (depth_map - depth_min) / (depth_max - depth_min)
    image = torch.cat([depth_map] * 3, dim=1)
    image = image.permute(0, 2, 3, 1).cpu().numpy()[0]
    image = Image.fromarray((image * 255.0).clip(0, 255).astype(np.uint8))
    return image

prompt = "A robot, 4k photo"
image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
    "/kandinsky/cat.png"
).resize((1024, 1024))
controlnet_conditioning_scale = 0.5  # recommended for good generalization
depth_image = get_depth_map(image)

def callback(pipe, step_index, timestep, callback_kwargs):
    print(step_index, timestep)
    return callback_kwargs

images = pipe(
    prompt,
    image=image,
    control_image=depth_image,
    strength=0.80,
    num_inference_steps=50,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    callback_on_step_end=callback
).images

images[0].save("kolors_controlnet_img2img_output.png")

KolorsControlNetInpaintPipeline

import torch
import numpy as np
from PIL import Image
import cv2
from examples.community.pipeline_controlnet_xl_kolors_inpaint import KolorsControlNetInpaintPipeline
from diffusers import ControlNetModel

from diffusers.utils import load_image

init_image = load_image(
     "https://huggingface.co/datasets/diffusers/test-arrays/resolve/main/stable_diffusion_inpaint/boy.png"
)
init_image = init_image.resize((1024, 1024))

generator = torch.Generator(device="cpu").manual_seed(1)

mask_image = load_image(
     "https://huggingface.co/datasets/diffusers/test-arrays/resolve/main/stable_diffusion_inpaint/boy_mask.png"
)
mask_image = mask_image.resize((1024, 1024))


def make_canny_condition(image):
    image = np.array(image)
    image = cv2.Canny(image, 100, 200)
    image = image[:, :, None]
    image = np.concatenate([image, image, image], axis=2)
    image = Image.fromarray(image)
    return image


control_image = make_canny_condition(init_image)

controlnet = ControlNetModel.from_pretrained(
     "Kwai-Kolors/Kolors-ControlNet-Canny",
    use_safetensors=True,
    torch_dtype=torch.float16
)
pipe = KolorsControlNetInpaintPipeline.from_pretrained(
     "Kwai-Kolors/Kolors-diffusers",
    controlnet=controlnet,
    variant="fp16",
    use_safetensors=True,
    torch_dtype=torch.float16
)

pipe.enable_model_cpu_offload()


def callback(pipe, step_index, timestep, callback_kwargs):
    print(step_index, timestep)
    return callback_kwargs


image = pipe(
     "a handsome man with ray-ban sunglasses",
    num_inference_steps=20,
    generator=generator,
    eta=1.0,
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
    callback_on_step_end=callback
).images[0]

image.save("kolors_controlnet_inpaint_output.png")

KolorsControlNetPipeline

import torch
from diffusers import ControlNetModel
from examples.community.pipeline_controlnet_xl_kolors import KolorsControlNetPipeline
from diffusers.utils import load_image
import numpy as np
import cv2
from PIL import Image

prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"

# download an image
image = load_image(
    "https://hf.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
)

# initialize the models and pipeline
controlnet_conditioning_scale = 0.5  # recommended for good generalization
controlnet = ControlNetModel.from_pretrained(
    "Kwai-Kolors/Kolors-ControlNet-Canny", torch_dtype=torch.float16
)

pipe = KolorsControlNetPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers", controlnet=controlnet, torch_dtype=torch.float16, variant='fp16'
)
pipe.enable_model_cpu_offload()

# get canny image
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

def callback(pipe, step_index, timestep, callback_kwargs):
    print(step_index, timestep)
    return callback_kwargs

# generate image
image = pipe(
    prompt, 
    controlnet_conditioning_scale=controlnet_conditioning_scale, 
    image=canny_image,
    callback_on_step_end=callback
).images[0]

image.save("kolors_controlnet_output.png") 

KolorsInpaintPipeline

import torch
from examples.community.pipeline_kolors_inpainting import KolorsInpaintPipeline
from diffusers.utils import load_image

# Initialize the pipeline
pipe = KolorsInpaintPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe.enable_model_cpu_offload()

# Set up generation parameters
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).convert("RGB")
mask_image = load_image(mask_url).convert("RGB")

prompt = "A majestic tiger sitting on a bench"

def callback(pipe, step_index, timestep, callback_kwargs):
    print(step_index, timestep)
    return callback_kwargs

# Run inference
image = pipe(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    num_inference_steps=50,
    strength=0.80,
    callback_on_step_end=callback
).images[0]

# Save output
image.save("kolors_inpaint_output.png") 

Teriks added a commit to Teriks/dgenerate that referenced this pull request Apr 23, 2025
@yiyixuxu yiyixuxu merged commit b4be422 into huggingface:main Apr 23, 2025
12 checks passed
yiyixuxu (Collaborator) commented:

thanks a lot @Teriks
