[docs] Model cards #11112


Open · wants to merge 12 commits into main

Conversation

stevhliu (Member) commented Mar 18, 2025

🚧 WIP 🚧

Based on our discussions about making it easier to run video models by including some minimal code optimized for memory and inference speed, this PR refactors the model card (starting with CogVideoX, but eventually expanding to other models as well) to reflect that. This provides users with easy copy/paste code they can run.

In parallel with this effort, the generic video generation guide will also be improved.
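For context, a minimal sketch of the kind of copy/paste, memory-optimized example the refactored CogVideoX card is aiming for. The prompt and optimization choices below are illustrative placeholders, not code from this PR:

```python
# Illustrative sketch only: CogVideoX inference with basic memory optimizations.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipeline = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
)
# offload submodels to the CPU when they are idle to reduce peak VRAM
pipeline.enable_model_cpu_offload()
# decode the latent video tile-by-tile instead of all at once
pipeline.vae.enable_tiling()

video = pipeline(
    prompt="A panda playing a guitar in a bamboo forest",  # placeholder prompt
    num_frames=49,
    guidance_scale=6.0,
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```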

HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

hlky (Contributor) commented Apr 3, 2025

Looks good on first impression; I'll review it in depth later today. I wanted to raise #10301 with you since it will help simplify the examples (in combination with #11130 for the quantization cases). It would also be cool to have the examples be configurable and update with selected options. To demonstrate, here's an artist's (4o) impression of what it could look like:

[image: ChatGPT (4o) mockup of a configurable code example, Apr 3, 2025]

sayakpaul (Member) left a comment

This is a very good start. Left some comments, let me know if they make sense.

| Checkpoint | Recommended dtype |
|:---:|:---:|
| [`THUDM/CogVideoX-5b-I2V`](https://huggingface.co/THUDM/CogVideoX-5b-I2V) | torch.bfloat16 |
| [`THUDM/CogVideoX-1.5-5b-I2V`](https://huggingface.co/THUDM/CogVideoX-1.5-5b-I2V) | torch.bfloat16 |

[CogVideoX](https://huggingface.co/papers/2408.06072) is a large diffusion transformer model, available in 2B and 5B parameter variants, designed to generate longer and more consistent videos from text. The model uses a 3D causal variational autoencoder to process video data more efficiently by reducing sequence length (and associated training compute) and preventing flickering in generated videos. An "expert" transformer with adaptive LayerNorm improves alignment between text and video, and 3D full attention helps accurately capture motion and time in generated videos.
Member:

This is okay, but I would perhaps tackle the removal of the abstract section in a separate PR. Also, this adds the additional overhead of coming up with a description for the paper. I would like to avoid that for now.

Member Author:

I think it'd be good to also tackle this now, since for the new pipeline cards we want a nice and complete example of what they should look like, no?

Good point that adding a description of the paper is additional overhead, but I think it's necessary, since we want to give users a version of the abstract that is more accessible (using common, everyday language) rather than academic (inspired by @asomoza's comment here).

Member:

I am a bit spread thin on this one. So, I will go with what the team prefers.

stevhliu marked this pull request as ready for review on April 23, 2025 18:21
stevhliu (Member Author) commented:

Thanks @hlky, those PRs look super nice for the user experience, and I'll update the code examples once they're merged! The configurable example is also really neat; maybe we can make a Space out of it and embed it in the docs? I'll probably have to follow up on that in a separate PR though 😅

sayakpaul (Member) commented Apr 28, 2025

@stevhliu sorry for the delay on my end. The changes look nice and I responded to some of the questions/comments you had. Perhaps after #11130, we could simplify the quantization examples a bit.

@a-r-r-o-w do we want to touch any other video models in this PR?

stevhliu (Member Author) commented:

@sayakpaul, I simplified the examples with the new PipelineQuantizationConfig! Let me know if there are any other changes you'd like to see, otherwise I think we can merge!
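For readers following along, the simplification being discussed looks roughly like this. It is a sketch based on the documented `PipelineQuantizationConfig` API; the backend, kwargs, and component choices below are illustrative and may differ from what the PR actually uses:

```python
# Sketch of pipeline-level quantization with PipelineQuantizationConfig.
# The backend and kwargs below are illustrative choices, not from the PR.
import torch
from diffusers import CogVideoXPipeline
from diffusers.quantizers import PipelineQuantizationConfig

quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    # only quantize the large components
    components_to_quantize=["transformer", "text_encoder"],
)
pipeline = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```

A single config object applied at `from_pretrained` time replaces the earlier pattern of quantizing and passing each component individually.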

sayakpaul (Member) left a comment

Looking much better; left another round of feedback.


### Memory optimization
- CogVideoX supports LoRAs with [`~loaders.CogVideoXLoraLoaderMixin.load_lora_weights`].
Member:

Do we need this separate note besides having the LoRA marker button at the top of the page?

Member Author:

I think it'd be nice to have an easy copy/paste example for users who want to use this specific model; I'll fold it under a collapsible section as suggested. I also added a link to the LoRA marker button at the top :)
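The copy/paste LoRA example under discussion would look something like the following sketch, which uses the `load_lora_weights` entry point from `CogVideoXLoraLoaderMixin`. The LoRA repo id and adapter name are hypothetical placeholders:

```python
# Sketch of loading a LoRA into CogVideoX; the repo id is a placeholder.
import torch
from diffusers import CogVideoXPipeline

pipeline = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
)
# "your-username/cogvideox-lora" is a hypothetical checkpoint for illustration
pipeline.load_lora_weights("your-username/cogvideox-lora", adapter_name="custom")
# activate the adapter at full strength
pipeline.set_adapters(["custom"], [1.0])
```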


Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`HunyuanVideoPipeline`] for inference with bitsandbytes.
Compilation is slow the first time but subsequent calls to the pipeline are faster.
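As a sketch of what the quoted section describes (bitsandbytes 4-bit quantization of the transformer plus compilation), under the assumption of a community HunyuanVideo checkpoint; the repo id and settings are assumptions, not taken from the PR:

```python
# Sketch: quantized HunyuanVideo transformer + torch.compile.
import torch
from diffusers import (
    BitsAndBytesConfig,
    HunyuanVideoPipeline,
    HunyuanVideoTransformer3DModel,
)

quant_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
)
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",  # assumed checkpoint for illustration
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipeline = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    transformer=transformer,
    torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()
# first call compiles (slow); subsequent calls reuse the compiled graph
pipeline.transformer = torch.compile(pipeline.transformer)
```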
Member:

For compilation, should we also refer the readers to our compilation guide?

Member Author:

Added a link to the compile section in fp16.md; I'll combine torch2.0.md with it in a separate PR as discussed!

4 participants