Commit 87b9db6

[Core] Add Kolors (huggingface#8812)

* initial draft

1 parent: b8cf84a

File tree

13 files changed: +3614 -0 lines changed


docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -322,6 +322,8 @@
       title: Kandinsky 2.2
     - local: api/pipelines/kandinsky3
       title: Kandinsky 3
+    - local: api/pipelines/kolors
+      title: Kolors
     - local: api/pipelines/latent_consistency_models
       title: Latent Consistency Models
     - local: api/pipelines/latent_diffusion
docs/source/en/api/pipelines/kolors.md

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis

![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/kolors/kolors_header_collage.png)

Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by the Kuaishou Kolors team. Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and closed-source models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content. For more details, please refer to this [technical report](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf).

The abstract from the technical report is:

*We present Kolors, a latent diffusion model for text-to-image synthesis, characterized by its profound understanding of both English and Chinese, as well as an impressive degree of photorealism. There are three key insights contributing to the development of Kolors. Firstly, unlike large language model T5 used in Imagen and Stable Diffusion 3, Kolors is built upon the General Language Model (GLM), which enhances its comprehension capabilities in both English and Chinese. Moreover, we employ a multimodal large language model to recaption the extensive training dataset for fine-grained text understanding. These strategies significantly improve Kolors’ ability to comprehend intricate semantics, particularly those involving multiple entities, and enable its advanced text rendering capabilities. Secondly, we divide the training of Kolors into two phases: the concept learning phase with broad knowledge and the quality improvement phase with specifically curated high-aesthetic data. Furthermore, we investigate the critical role of the noise schedule and introduce a novel schedule to optimize high-resolution image generation. These strategies collectively enhance the visual appeal of the generated high-resolution images. Lastly, we propose a category-balanced benchmark KolorsPrompts, which serves as a guide for the training and evaluation of Kolors. Consequently, even when employing the commonly used U-Net backbone, Kolors has demonstrated remarkable performance in human evaluations, surpassing the existing open-source models and achieving Midjourney-v6 level performance, especially in terms of visual appeal. We will release the code and weights of Kolors at <https://github.com/Kwai-Kolors/Kolors>, and hope that it will benefit future research and applications in the visual generation community.*

## Usage Example

```python
import torch

from diffusers import DPMSolverMultistepScheduler, KolorsPipeline

pipe = KolorsPipeline.from_pretrained("Kwai-Kolors/Kolors-diffusers", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)

image = pipe(
    # "A photo of a ladybug, macro, zoom, high quality, film, holding a sign that reads '可图' (Kolors)"
    prompt='一张瓢虫的照片,微距,变焦,高质量,电影,拿着一个牌子,写着"可图"',
    negative_prompt="",
    guidance_scale=6.5,
    num_inference_steps=25,
).images[0]

image.save("kolors_sample.png")
```

## KolorsPipeline

[[autodoc]] KolorsPipeline

- all
- __call__
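The commit also registers a `KolorsImg2ImgPipeline` (see the `__init__.py` diffs below). The snippet that follows is a minimal sketch of image-to-image use, assuming the pipeline shares the text-to-image checkpoint and follows the same `strength`-based calling convention as diffusers' other img2img pipelines; the input path and parameter values are illustrative, and running it requires a CUDA GPU plus a weights download.

```python
import torch

from diffusers import KolorsImg2ImgPipeline
from diffusers.utils import load_image

# Assumption: the img2img pipeline loads from the same checkpoint as KolorsPipeline.
pipe = KolorsImg2ImgPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers", torch_dtype=torch.float16, variant="fp16"
)
pipe.to("cuda")

init_image = load_image("path/to/input.png")  # hypothetical local input image

image = pipe(
    prompt="a photo of a ladybug holding a sign",
    image=init_image,
    strength=0.6,  # illustrative: how far to deviate from the input image
    guidance_scale=6.5,
    num_inference_steps=25,
).images[0]

image.save("kolors_img2img_sample.png")
```

As in other diffusers img2img pipelines, `strength` trades fidelity to the input image against adherence to the prompt.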

src/diffusers/__init__.py

Lines changed: 8 additions & 0 deletions
@@ -237,6 +237,8 @@
         "AudioLDMPipeline",
         "BlipDiffusionControlNetPipeline",
         "BlipDiffusionPipeline",
+        "ChatGLMModel",
+        "ChatGLMTokenizer",
         "CLIPImageProjection",
         "CycleDiffusionPipeline",
         "HunyuanDiTControlNetPipeline",
@@ -268,6 +270,8 @@
         "KandinskyV22Pipeline",
         "KandinskyV22PriorEmb2EmbPipeline",
         "KandinskyV22PriorPipeline",
+        "KolorsImg2ImgPipeline",
+        "KolorsPipeline",
         "LatentConsistencyModelImg2ImgPipeline",
         "LatentConsistencyModelPipeline",
         "LattePipeline",
@@ -642,6 +646,8 @@
         AudioLDM2ProjectionModel,
         AudioLDM2UNet2DConditionModel,
         AudioLDMPipeline,
+        ChatGLMModel,
+        ChatGLMTokenizer,
         CLIPImageProjection,
         CycleDiffusionPipeline,
         HunyuanDiTControlNetPipeline,
@@ -673,6 +679,8 @@
         KandinskyV22Pipeline,
         KandinskyV22PriorEmb2EmbPipeline,
         KandinskyV22PriorPipeline,
+        KolorsImg2ImgPipeline,
+        KolorsPipeline,
         LatentConsistencyModelImg2ImgPipeline,
         LatentConsistencyModelPipeline,
         LattePipeline,

src/diffusers/pipelines/__init__.py

Lines changed: 12 additions & 0 deletions
@@ -198,6 +198,12 @@
         "Kandinsky3Img2ImgPipeline",
         "Kandinsky3Pipeline",
     ]
+    _import_structure["kolors"] = [
+        "KolorsPipeline",
+        "KolorsImg2ImgPipeline",
+        "ChatGLMModel",
+        "ChatGLMTokenizer",
+    ]
     _import_structure["latent_consistency_models"] = [
         "LatentConsistencyModelImg2ImgPipeline",
         "LatentConsistencyModelPipeline",
@@ -481,6 +487,12 @@
         Kandinsky3Img2ImgPipeline,
         Kandinsky3Pipeline,
     )
+    from .kolors import (
+        ChatGLMModel,
+        ChatGLMTokenizer,
+        KolorsImg2ImgPipeline,
+        KolorsPipeline,
+    )
     from .latent_consistency_models import (
         LatentConsistencyModelImg2ImgPipeline,
         LatentConsistencyModelPipeline,
src/diffusers/pipelines/kolors/__init__.py

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
from typing import TYPE_CHECKING

from ...utils import (
    DIFFUSERS_SLOW_IMPORT,
    OptionalDependencyNotAvailable,
    _LazyModule,
    get_objects_from_module,
    is_torch_available,
    is_transformers_available,
)


_dummy_objects = {}
_import_structure = {}

try:
    if not (is_transformers_available() and is_torch_available()):
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    from ...utils import dummy_torch_and_transformers_objects  # noqa F403

    _dummy_objects.update(get_objects_from_module(dummy_torch_and_transformers_objects))
else:
    _import_structure["pipeline_kolors"] = ["KolorsPipeline"]
    _import_structure["pipeline_kolors_img2img"] = ["KolorsImg2ImgPipeline"]
    _import_structure["text_encoder"] = ["ChatGLMModel"]
    _import_structure["tokenizer"] = ["ChatGLMTokenizer"]

if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
    try:
        if not (is_transformers_available() and is_torch_available()):
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        from ...utils.dummy_torch_and_transformers_objects import *

    else:
        from .pipeline_kolors import KolorsPipeline
        from .pipeline_kolors_img2img import KolorsImg2ImgPipeline
        from .text_encoder import ChatGLMModel
        from .tokenizer import ChatGLMTokenizer

else:
    import sys

    sys.modules[__name__] = _LazyModule(
        __name__,
        globals()["__file__"],
        _import_structure,
        module_spec=__spec__,
    )

    for name, value in _dummy_objects.items():
        setattr(sys.modules[__name__], name, value)
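The `else` branch at the bottom of this file replaces the package module with diffusers' `_LazyModule`, so `KolorsPipeline` and the other heavy symbols are imported only on first attribute access. The sketch below mimics that mechanism with a simplified stand-in class (not diffusers' actual implementation), deferring `json.JSONDecoder` behind a hypothetical module name:

```python
import importlib
import sys
import types


class LazyModule(types.ModuleType):
    """Simplified stand-in for diffusers' _LazyModule: submodules are
    imported only when one of their exported names is first accessed."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Map each exported name back to the submodule that defines it.
        self._name_to_module = {
            attr: mod for mod, attrs in import_structure.items() for attr in attrs
        }
        self.__all__ = list(self._name_to_module)

    def __getattr__(self, name):
        # Only called when `name` is not already in the module's __dict__.
        if name not in self._name_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")
        value = getattr(importlib.import_module(self._name_to_module[name]), name)
        setattr(self, name, value)  # cache: later lookups bypass __getattr__
        return value


# Register the lazy package; `json` stands in for a heavy submodule.
lazy = LazyModule("lazy_demo", {"json": ["JSONDecoder"]})
sys.modules["lazy_demo"] = lazy

# The deferred submodule import runs here, on first attribute access.
decoder = lazy.JSONDecoder()
print(decoder.decode('{"a": 1}'))  # -> {'a': 1}
```

The real `_LazyModule` additionally threads through `module_spec` and the dummy objects installed for missing optional dependencies, but the deferral itself is the same `__getattr__` hook shown here.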
