Kolors additional pipelines, community contrib #11372
Conversation
Adapted from: https://github.com/Kwai-Kolors/Kolors

Mostly for direct use with diffusers' existing ControlNetModel code, plus an additional pipeline for inpainting. Pipelines here:

- KolorsControlNetPipeline
- KolorsControlNetImg2ImgPipeline
- KolorsControlNetInpaintPipeline
- KolorsInpaintPipeline

Complete docs, but could use a look over.
thanks!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@bot /style
thanks for the PR!
Other than the formatting issues, I see an issue with callback / callback_on_step_end and a duplicate method definition surfaced by make style that I should fix. I will fix the formatting and then try to clean that up this week.
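For reference, this is the standard diffusers callback_on_step_end contract these pipelines should end up following once fixed; a minimal sketch (which tensors appear in callback_kwargs depends on callback_on_step_end_tensor_inputs, typically just the latents):

```python
def log_latents(pipe, step_index, timestep, callback_kwargs):
    # Called once per denoising step; callback_kwargs holds the tensors requested
    # via callback_on_step_end_tensor_inputs and must be returned to the pipeline.
    latents = callback_kwargs.get("latents")
    if latents is not None:
        print(step_index, timestep, tuple(latents.shape), latents.dtype)
    return callback_kwargs

# Passed to a pipeline call as: pipe(..., callback_on_step_end=log_latents)
```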
…arity

- Example string doc fixes; make sure variant=fp16
- Fix device mismatch for encoder_hidden_states in the ControlNetModel patch when sequential offload is enabled
- Fix _get_add_time_ids implementations and add proper callback_on_step_end implementations
- In KolorsControlNetImg2ImgPipeline & KolorsControlNetInpaintPipeline, properly implement the __call__ arguments: negative_original_size, negative_crops_coords_top_left, negative_target_size, aesthetic_score, negative_aesthetic_score
- In KolorsControlNetPipeline, properly implement the __call__ arguments: negative_original_size, negative_crops_coords_top_left, negative_target_size
- This covers all typical SDXL conditioning arguments
- Rename the KolorsControlNetPipeline.__call__ argument "control_image" to "image" to match StableDiffusionXLControlNetPipeline
…sers into kolors_additional_community
Should be good to go. I have resolved the issues revealed by make style, and also resolved additional issues related to device mismatch caused by sequential offload, which had not existed for me in diffusers 0.33.1 but exist in the current main branch with these pipelines. All pipelines in this pull now have proper callback_on_step_end implementations, and the ControlNet pipelines for Img2img and Inpaint now have the typical SDXL conditioning arguments.
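For reference, a minimal sketch of how the sequential-offload path and the SDXL conditioning arguments could be exercised together (the negative_* values below are illustrative, not tuned):

```python
import torch
import numpy as np
import cv2
from PIL import Image
from diffusers import ControlNetModel
from diffusers.utils import load_image
from examples.community.pipeline_controlnet_xl_kolors import KolorsControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "Kwai-Kolors/Kolors-ControlNet-Canny", torch_dtype=torch.float16
)
pipe = KolorsControlNetPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers",
    controlnet=controlnet,
    variant="fp16",
    torch_dtype=torch.float16,
)
# Sequential offload moves submodules to the GPU one at a time; this is the code
# path where the encoder_hidden_states device mismatch showed up before the fix.
pipe.enable_sequential_cpu_offload()

# Build a canny conditioning image.
image = load_image(
    "https://hf.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
)
canny = cv2.Canny(np.array(image), 100, 200)[:, :, None]
canny_image = Image.fromarray(np.concatenate([canny, canny, canny], axis=2))

image = pipe(
    "aerial view, a futuristic research complex in a bright foggy jungle",
    image=canny_image,
    controlnet_conditioning_scale=0.5,
    num_inference_steps=30,
    negative_original_size=(512, 512),
    negative_crops_coords_top_left=(0, 0),
    negative_target_size=(1024, 1024),
).images[0]
image.save("kolors_controlnet_offload_output.png")
```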
Here are example scripts I used for hand testing:

**KolorsControlNetImg2ImgPipeline**

```python
import torch
import numpy as np
from PIL import Image
from transformers import DPTImageProcessor, DPTForDepthEstimation
from examples.community.pipeline_controlnet_xl_kolors_img2img import KolorsControlNetImg2ImgPipeline
from diffusers.utils import load_image
from diffusers import ControlNetModel
depth_estimator = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to("cuda")
feature_extractor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
controlnet = ControlNetModel.from_pretrained(
"Kwai-Kolors/Kolors-ControlNet-Depth",
use_safetensors=True,
torch_dtype=torch.float16,
)
pipe = KolorsControlNetImg2ImgPipeline.from_pretrained(
"Kwai-Kolors/Kolors-diffusers",
controlnet=controlnet,
variant="fp16",
use_safetensors=True,
torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()
def get_depth_map(image):
image = feature_extractor(images=image, return_tensors="pt").pixel_values.to("cuda")
with torch.no_grad(), torch.autocast("cuda"):
depth_map = depth_estimator(image).predicted_depth
depth_map = torch.nn.functional.interpolate(
depth_map.unsqueeze(1),
size=(1024, 1024),
mode="bicubic",
align_corners=False,
)
depth_min = torch.amin(depth_map, dim=[1, 2, 3], keepdim=True)
depth_max = torch.amax(depth_map, dim=[1, 2, 3], keepdim=True)
depth_map = (depth_map - depth_min) / (depth_max - depth_min)
image = torch.cat([depth_map] * 3, dim=1)
image = image.permute(0, 2, 3, 1).cpu().numpy()[0]
image = Image.fromarray((image * 255.0).clip(0, 255).astype(np.uint8))
return image
prompt = "A robot, 4k photo"
image = load_image(
"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
"/kandinsky/cat.png"
).resize((1024, 1024))
controlnet_conditioning_scale = 0.5 # recommended for good generalization
depth_image = get_depth_map(image)
def callback(pipe, step_index, timestep, callback_kwargs):
print(step_index, timestep)
return callback_kwargs
images = pipe(
prompt,
image=image,
control_image=depth_image,
strength=0.80,
num_inference_steps=50,
controlnet_conditioning_scale=controlnet_conditioning_scale,
callback_on_step_end=callback
).images
images[0].save("kolors_controlnet_img2img_output.png")
```

**KolorsControlNetInpaintPipeline**

```python
import torch
import numpy as np
from PIL import Image
import cv2
from examples.community.pipeline_controlnet_xl_kolors_inpaint import KolorsControlNetInpaintPipeline
from diffusers import ControlNetModel
from diffusers.utils import load_image
init_image = load_image(
"https://huggingface.co/datasets/diffusers/test-arrays/resolve/main/stable_diffusion_inpaint/boy.png"
)
init_image = init_image.resize((1024, 1024))
generator = torch.Generator(device="cpu").manual_seed(1)
mask_image = load_image(
"https://huggingface.co/datasets/diffusers/test-arrays/resolve/main/stable_diffusion_inpaint/boy_mask.png"
)
mask_image = mask_image.resize((1024, 1024))
def make_canny_condition(image):
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
image = Image.fromarray(image)
return image
control_image = make_canny_condition(init_image)
controlnet = ControlNetModel.from_pretrained(
"Kwai-Kolors/Kolors-ControlNet-Canny",
use_safetensors=True,
torch_dtype=torch.float16
)
pipe = KolorsControlNetInpaintPipeline.from_pretrained(
"Kwai-Kolors/Kolors-diffusers",
controlnet=controlnet,
variant="fp16",
use_safetensors=True,
torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()
def callback(pipe, step_index, timestep, callback_kwargs):
print(step_index, timestep)
return callback_kwargs
image = pipe(
"a handsome man with ray-ban sunglasses",
num_inference_steps=20,
generator=generator,
eta=1.0,
image=init_image,
mask_image=mask_image,
control_image=control_image,
callback_on_step_end=callback
).images[0]
image.save("kolors_controlnet_inpaint_output.png") KolorsControlNetPipelineimport torch
from diffusers import ControlNetModel
from examples.community.pipeline_controlnet_xl_kolors import KolorsControlNetPipeline
from diffusers.utils import load_image
import numpy as np
import cv2
from PIL import Image
prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"
# download an image
image = load_image(
"https://hf.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
)
# initialize the models and pipeline
controlnet_conditioning_scale = 0.5 # recommended for good generalization
controlnet = ControlNetModel.from_pretrained(
"Kwai-Kolors/Kolors-ControlNet-Canny", torch_dtype=torch.float16
)
pipe = KolorsControlNetPipeline.from_pretrained(
"Kwai-Kolors/Kolors-diffusers", controlnet=controlnet, torch_dtype=torch.float16, variant='fp16'
)
pipe.enable_model_cpu_offload()
# get canny image
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
def callback(pipe, step_index, timestep, callback_kwargs):
print(step_index, timestep)
return callback_kwargs
# generate image
image = pipe(
prompt,
controlnet_conditioning_scale=controlnet_conditioning_scale,
image=canny_image,
callback_on_step_end=callback
).images[0]
image.save("kolors_controlnet_output.png") KolorsInpaintPipelineimport torch
from examples.community.pipeline_kolors_inpainting import KolorsInpaintPipeline
from diffusers.utils import load_image
# Initialize the pipeline
pipe = KolorsInpaintPipeline.from_pretrained(
"Kwai-Kolors/Kolors-diffusers",
torch_dtype=torch.float16,
variant="fp16"
)
pipe.enable_model_cpu_offload()
# Set up generation parameters
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = load_image(img_url).convert("RGB")
mask_image = load_image(mask_url).convert("RGB")
prompt = "A majestic tiger sitting on a bench"
def callback(pipe, step_index, timestep, callback_kwargs):
print(step_index, timestep)
return callback_kwargs
# Run inference
image = pipe(
prompt=prompt,
image=init_image,
mask_image=mask_image,
num_inference_steps=50,
strength=0.80,
callback_on_step_end=callback
).images[0]
# Save output
image.save("kolors_inpaint_output.png") |
thanks a lot @Teriks
These are primarily ControlNet pipelines for Kolors.

Adapted from: https://github.com/Kwai-Kolors/Kolors

For direct use with diffusers' existing ControlNetModel, plus an additional pipeline for inpainting. Compatibility with the diffusers ControlNetModel is accomplished via temporary patching of the ControlNetModel instance (no global side effects such as modifying the class itself); a rough sketch of the idea is shown below. This supports MultiControlNetModel as well.

Pipelines here:

- KolorsControlNetPipeline
- KolorsControlNetImg2ImgPipeline
- KolorsControlNetInpaintPipeline
- KolorsInpaintPipeline

Complete docs with example strings, but could use a look over.

I would like to contribute these as community pipelines; they could also serve as a reference (or be adapted) for implementing these pipelines in mainline diffusers if desired.

I wrote these for integration into a personal project (a CLI/GUI tool), and they have been tested somewhat thoroughly.
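A rough sketch of the instance-level patching idea, with a hypothetical helper name; the actual wrapping logic in the pipelines is more involved, but the point is only that a single instance's bound method is swapped and then restored, so nothing global is modified:

```python
from contextlib import contextmanager

@contextmanager
def patch_instance_forward(module, make_wrapper):
    """Temporarily replace `forward` on one module instance only (hypothetical helper).

    `make_wrapper` receives the original bound method and returns the replacement.
    The original is restored in `finally`, so other instances and the class itself
    are never touched.
    """
    original_forward = module.forward
    module.forward = make_wrapper(original_forward)
    try:
        yield module
    finally:
        module.forward = original_forward
```

Restoring in `finally` keeps the ControlNetModel instance reusable outside these pipelines even if a call raises.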
Who can review?
@sayakpaul @yiyixuxu @asomoza
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.