@wjf5203
Does Liquid support generating both image tokens and text tokens within a single generated sequence, or can it only produce either an image sequence or a text sequence?
For example, given the input:
"Please show an astronaut riding a horse, and briefly describe which planet is in the background."
I would like the output to be:
[generated image tokens of the astronaut on a horse] + "Behind him is Mars, with a reddish-brown surface and a thin atmosphere."