Commit b202127

[Docs] add an example use for StableUnCLIPPipeline in the pipeline docs (huggingface#2897)
* improve stable unclip doc.
* add: entry of StableUnCLIPPipeline to the docs
* Apply suggestions from code review

Co-authored-by: apolinario <[email protected]>

---------

Co-authored-by: apolinario <[email protected]>
1 parent e47459c commit b202127

1 file changed: +40 -2 lines changed

docs/source/en/api/pipelines/stable_unclip.mdx

Lines changed: 40 additions & 2 deletions
@@ -32,12 +32,50 @@ we do not add any additional noise to the image embeddings i.e. `noise_level = 0
   * [stabilityai/stable-diffusion-2-1-unclip](https://hf.co/stabilityai/stable-diffusion-2-1-unclip)
   * [stabilityai/stable-diffusion-2-1-unclip-small](https://hf.co/stabilityai/stable-diffusion-2-1-unclip-small)
 * Text-to-image
-  * Coming soon!
+  * [stabilityai/stable-diffusion-2-1-unclip-small](https://hf.co/stabilityai/stable-diffusion-2-1-unclip-small)

 ### Text-to-Image Generation
+Stable unCLIP can be leveraged for text-to-image generation by pipelining it with the prior model of KakaoBrain's open source DALL-E 2 replication [Karlo](https://huggingface.co/kakaobrain/karlo-v1-alpha)
+
+```python
+import torch
+from diffusers import UnCLIPScheduler, DDPMScheduler, StableUnCLIPPipeline
+from diffusers.models import PriorTransformer
+from transformers import CLIPTokenizer, CLIPTextModelWithProjection
+
+prior_model_id = "kakaobrain/karlo-v1-alpha"
+data_type = torch.float16
+prior = PriorTransformer.from_pretrained(prior_model_id, subfolder="prior", torch_dtype=data_type)
+
+prior_text_model_id = "openai/clip-vit-large-patch14"
+prior_tokenizer = CLIPTokenizer.from_pretrained(prior_text_model_id)
+prior_text_model = CLIPTextModelWithProjection.from_pretrained(prior_text_model_id, torch_dtype=data_type)
+prior_scheduler = UnCLIPScheduler.from_pretrained(prior_model_id, subfolder="prior_scheduler")
+prior_scheduler = DDPMScheduler.from_config(prior_scheduler.config)
+
+stable_unclip_model_id = "stabilityai/stable-diffusion-2-1-unclip-small"
+
+pipe = StableUnCLIPPipeline.from_pretrained(
+    stable_unclip_model_id,
+    torch_dtype=data_type,
+    variant="fp16",
+    prior_tokenizer=prior_tokenizer,
+    prior_text_encoder=prior_text_model,
+    prior=prior,
+    prior_scheduler=prior_scheduler,
+)
+
+pipe = pipe.to("cuda")
+wave_prompt = "dramatic wave, the Oceans roar, Strong wave spiral across the oceans as the waves unfurl into roaring crests; perfect wave form; perfect wave shape; dramatic wave shape; wave shape unbelievable; wave; wave shape spectacular"
+
+images = pipe(prompt=wave_prompt).images
+images[0].save("waves.png")
+```
+<Tip warning={true}>

-Coming soon!
+For text-to-image we use `stabilityai/stable-diffusion-2-1-unclip-small` as it was trained on CLIP ViT-L/14 embedding, the same as the Karlo model prior. [stabilityai/stable-diffusion-2-1-unclip](https://hf.co/stabilityai/stable-diffusion-2-1-unclip) was trained on OpenCLIP ViT-H, so we don't recommend its use.

+</Tip>

 ### Text guided Image-to-Image Variation

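The constraint stated in the `<Tip>` (the prior and the Stable unCLIP checkpoint must be trained on the same CLIP embedding space) can be sketched as a small pre-flight check. This is only an illustration and is not part of the commit or of the diffusers API: the helper `check_embedding_compat` and both lookup tables are hypothetical. The checkpoint-to-encoder pairs come from the tip, and the embedding widths (768 for CLIP ViT-L/14, 1024 for OpenCLIP ViT-H) are an assumption based on the commonly published projection sizes of those encoders.

```python
# Hypothetical helper, not a diffusers API. It illustrates why the Karlo prior
# is paired with the "-small" Stable unCLIP checkpoint in the example.

# Assumed image-embedding widths for the two encoder families named in the tip.
EMBEDDING_DIMS = {
    "CLIP ViT-L/14": 768,
    "OpenCLIP ViT-H": 1024,
}

# Encoder family each checkpoint was trained against, as stated in the tip.
CHECKPOINT_ENCODER = {
    "kakaobrain/karlo-v1-alpha": "CLIP ViT-L/14",
    "stabilityai/stable-diffusion-2-1-unclip-small": "CLIP ViT-L/14",
    "stabilityai/stable-diffusion-2-1-unclip": "OpenCLIP ViT-H",
}


def check_embedding_compat(prior_id: str, decoder_id: str) -> bool:
    """Return True when the prior and the decoder share an embedding space."""
    return CHECKPOINT_ENCODER[prior_id] == CHECKPOINT_ENCODER[decoder_id]


if __name__ == "__main__":
    prior_id = "kakaobrain/karlo-v1-alpha"
    for decoder_id in (
        "stabilityai/stable-diffusion-2-1-unclip-small",
        "stabilityai/stable-diffusion-2-1-unclip",
    ):
        ok = check_embedding_compat(prior_id, decoder_id)
        width = EMBEDDING_DIMS[CHECKPOINT_ENCODER[decoder_id]]
        print(f"{decoder_id}: compatible={ok}, expects width {width}")
```

A check like this could run before `StableUnCLIPPipeline.from_pretrained` so that a mismatched prior is rejected early instead of surfacing later as a shape error or as degraded images.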