
Commit 14d224d

asomoza and sayakpaul authored
[Docs] SD3 T5 Token limit doc (huggingface#8654)
* doc for max_sequence_length
* better position and changed note to tip
* apply suggestions

---------

Co-authored-by: Sayak Paul <[email protected]>
1 parent 540399f commit 14d224d


1 file changed: 41 additions, 1 deletion


docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md

Lines changed: 41 additions & 1 deletion
@@ -35,7 +35,6 @@ The SD3 pipeline uses three text encoders to generate an image. Model offloading

</Tip>

```python
import torch
from diffusers import StableDiffusion3Pipeline
```
@@ -197,6 +196,47 @@ image.save("sd3_hello_world.png")

Check out the full script [here](https://gist.github.com/sayakpaul/508d89d7aad4f454900813da5d42ca97).

## Using Long Prompts with the T5 Text Encoder

By default, the T5 Text Encoder uses a maximum prompt length of `256` tokens. You can change this with the `max_sequence_length` parameter to accept fewer or more tokens (up to `512`). Keep in mind that longer sequences require additional resources and lengthen generation time, which adds up in workloads such as batch inference.
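If you are unsure which value to pass, one option is to size `max_sequence_length` to the prompt. The sketch below is a hypothetical helper (`choose_max_sequence_length` is not part of Diffusers) that keeps the default of `256` for short prompts, grows just enough to fit longer ones, and caps the value at `512`, which is assumed here to be the largest value the pipeline accepts. It takes a token count rather than a tokenizer so it runs without downloading anything.

```python
from typing import Tuple

# Assumed limits: 256 is the documented default; 512 is assumed to be the
# largest max_sequence_length the SD3 pipeline accepts.
DEFAULT_T5_LENGTH = 256
MAX_T5_LENGTH = 512


def choose_max_sequence_length(
    num_tokens: int,
    default: int = DEFAULT_T5_LENGTH,
    cap: int = MAX_T5_LENGTH,
) -> Tuple[int, bool]:
    """Hypothetical helper: pick a max_sequence_length for a T5 prompt.

    Returns (length, truncated). `truncated` is True when the prompt has more
    tokens than `cap`, so part of it will be cut off regardless.
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if num_tokens <= default:
        return default, False  # short prompt: the default already fits
    if num_tokens <= cap:
        return num_tokens, False  # grow only as much as the prompt needs
    return cap, True  # longer than the cap: use the cap and flag truncation


if __name__ == "__main__":
    for n in (120, 300, 700):
        print(n, choose_max_sequence_length(n))
```

With a loaded pipeline, the token count could come from its T5 tokenizer, for example `len(pipe.tokenizer_3(prompt).input_ids)`, which includes the end-of-sequence token. Sizing exactly to the prompt keeps the sequence as short as possible; rounding up instead is an equally reasonable choice.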
```python
prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature’s body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree. As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"

image = pipe(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,  # let T5 read up to 512 tokens instead of the default 256
).images[0]
```
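To see how much of a long prompt the T5 Text Encoder would actually read at a given `max_sequence_length`, you can tokenize the prompt and compare lengths. This is a hypothetical sketch: `t5_truncation_report` is not a Diffusers function, and it assumes a Hugging Face-style tokenizer that returns a mapping with an `input_ids` list, such as the pipeline's `pipe.tokenizer_3`. The whitespace `toy_tokenizer` only stands in for T5 so the example runs offline.

```python
from typing import Callable, Dict, List, Mapping, Sequence


def t5_truncation_report(
    prompt: str,
    tokenizer: Callable[[str], Mapping[str, Sequence[int]]],
    max_sequence_length: int = 256,
) -> Dict[str, int]:
    """Hypothetical helper: count how many prompt tokens fall past the limit."""
    input_ids = tokenizer(prompt)["input_ids"]  # assumes HF-style tokenizer output
    num_tokens = len(input_ids)
    return {
        "num_tokens": num_tokens,
        "kept": min(num_tokens, max_sequence_length),
        "dropped": max(0, num_tokens - max_sequence_length),
    }


def toy_tokenizer(text: str) -> Dict[str, List[int]]:
    """Stand-in tokenizer: one id per whitespace word plus one end-of-sequence id."""
    words = text.split()
    return {"input_ids": list(range(len(words) + 1))}


if __name__ == "__main__":
    long_prompt = "word " * 300
    print(t5_truncation_report(long_prompt, toy_tokenizer, max_sequence_length=256))
    print(t5_truncation_report(long_prompt, toy_tokenizer, max_sequence_length=512))
```

With the real pipeline you would pass `pipe.tokenizer_3` and the prompt above; the counts then reflect T5 subword tokens, which usually outnumber words.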
### Sending a different prompt to the T5 Text Encoder

You can send a different prompt to the T5 Text Encoder than to the CLIP Text Encoders. Keeping the CLIP prompt short prevents it from being truncated by the CLIP Text Encoders, while the full, detailed prompt goes to the T5 Text Encoder through `prompt_3`, which can improve generation.

<Tip>

Any prompt sent to the CLIP Text Encoders is still truncated if it exceeds their 77-token limit.

</Tip>
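Because the CLIP Text Encoders still cut prompts off at 77 tokens, it can help to check that the short `prompt` fits before generating. The helper below, `fits_clip_window`, is hypothetical and not part of Diffusers; it assumes a Hugging Face-style tokenizer (for example the pipeline's first CLIP tokenizer, `pipe.tokenizer`) whose output includes the start and end tokens, since those count toward the 77. A toy tokenizer stands in so the snippet runs offline.

```python
from typing import Callable, Mapping, Sequence

CLIP_MAX_TOKENS = 77  # CLIP context length, including start and end tokens


def fits_clip_window(
    prompt: str,
    tokenizer: Callable[[str], Mapping[str, Sequence[int]]],
    limit: int = CLIP_MAX_TOKENS,
) -> bool:
    """Hypothetical helper: True if the tokenized prompt fits the CLIP window."""
    return len(tokenizer(prompt)["input_ids"]) <= limit


def toy_clip_tokenizer(text: str) -> Mapping[str, Sequence[int]]:
    """Stand-in tokenizer: start token, one id per word, end token."""
    words = text.split()
    return {"input_ids": [0] + [1] * len(words) + [2]}


if __name__ == "__main__":
    short_prompt = "A waffle hippopotamus basking in a river of melted butter"
    long_prompt = " ".join(["butter"] * 100)
    print(fits_clip_window(short_prompt, toy_clip_tokenizer))
    print(fits_clip_window(long_prompt, toy_clip_tokenizer))
```

Swapping in the pipeline's real tokenizer gives subword counts rather than word counts, so a prompt that looks short may still need trimming.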
```python
# Shorter prompt for the CLIP Text Encoders
prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. A river of warm, melted butter, pancake-like foliage in the background, a towering pepper mill standing in for a tree."

# Full, detailed prompt for the T5 Text Encoder
prompt_3 = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature’s body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree. As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"

image = pipe(
    prompt=prompt,
    prompt_3=prompt_3,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
```
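Which encoder receives which prompt follows a simple fallback. As we understand `StableDiffusion3Pipeline`, `prompt` goes to the first CLIP Text Encoder (`text_encoder`), `prompt_2` to the second (`text_encoder_2`), and `prompt_3` to the T5 Text Encoder (`text_encoder_3`), with `prompt_2` and `prompt_3` falling back to `prompt` when they are not set. The function below, `route_sd3_prompts`, is a hypothetical pure-Python illustration of that routing, not the pipeline's own code.

```python
from typing import Dict, Optional


def route_sd3_prompts(
    prompt: str,
    prompt_2: Optional[str] = None,
    prompt_3: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical illustration of how SD3 prompts map to its three text encoders.

    Unset (or empty) prompt_2 / prompt_3 fall back to `prompt`.
    """
    return {
        "text_encoder": prompt,               # first CLIP Text Encoder
        "text_encoder_2": prompt_2 or prompt, # second CLIP Text Encoder
        "text_encoder_3": prompt_3 or prompt, # T5 Text Encoder
    }


if __name__ == "__main__":
    short = "A waffle hippopotamus basking in a river of melted butter."
    detailed = short + " Its skin has the grid pattern of a waffle, glistening with syrup."
    for encoder, text in route_sd3_prompts(short, prompt_3=detailed).items():
        print(f"{encoder}: {text}")
```

Under this routing, both CLIP Text Encoders in the example above see the shorter `prompt`, and only the T5 Text Encoder sees `prompt_3`; passing `prompt_2` as well would give the second CLIP Text Encoder its own text.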
## Tiny AutoEncoder for Stable Diffusion 3

Tiny AutoEncoder for Stable Diffusion (TAESD3) is a tiny distilled version of Stable Diffusion 3's VAE by [Ollin Boer Bohan](https://github.com/madebyollin/taesd) that can decode [`StableDiffusion3Pipeline`] latents almost instantly.
