Open
Description
Video generation models keep growing in parameter count, and the videos they are trained on keep getting longer. When training video models, it is difficult to fit a complete video sequence (~200k tokens) on a single GPU. Sequence parallel training can solve this problem, and some frameworks, such as the FastVideo training framework, already support it, but that framework is still immature, which makes it difficult to use. Can the diffusers framework support sequence parallel training?
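For illustration, here is a minimal sketch of the core idea behind sequence parallelism: each GPU (rank) holds only a contiguous slice of the token sequence instead of the whole thing. The function name `shard_bounds` and the choice of 8 GPUs are assumptions made for this example; this is not an existing diffusers or FastVideo API.

```python
def shard_bounds(seq_len: int, world_size: int, rank: int) -> tuple[int, int]:
    """Return the [start, end) token range held by `rank` (hypothetical helper).

    Tokens are split as evenly as possible; the first `seq_len % world_size`
    ranks each take one extra token.
    """
    base, extra = divmod(seq_len, world_size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


seq_len = 200_000   # roughly the sequence length mentioned above
world_size = 8      # assumed number of GPUs in the sequence-parallel group

shards = [shard_bounds(seq_len, world_size, r) for r in range(world_size)]
for rank, (start, end) in enumerate(shards):
    print(f"rank {rank}: tokens [{start}, {end}) -> {end - start} tokens")
```

With these assumed numbers, each GPU holds 25,000 tokens rather than 200,000. A real implementation would also need cross-GPU communication inside attention (for example, an all-to-all or ring-style exchange), which this sketch leaves out.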