DanceGRPO is the first unified RL-based framework for visual generation.
This is the official implementation of the paper "DanceGRPO: Unleashing GRPO on Visual Generation". We develop DanceGRPO based on FastVideo, a scalable and efficient framework for video and image generation.
DanceGRPO has the following features:
- Support Stable Diffusion
- Support FLUX
- Support HunyuanVideo
Create the folders listed below with "mkdir" before downloading the checkpoints.
For image generation,
- Download the Stable Diffusion v1.4 checkpoints from here to "./data/stable-diffusion-v1-4".
- Download the FLUX checkpoints from here to "./data/flux".
- Download the HPS-v2.1 checkpoint (HPS_v2.1_compressed.pt) from here to "./hps_ckpt".
- Download the CLIP H-14 checkpoint (open_clip_pytorch_model.bin) from here to "./hps_ckpt".
For video generation,
- Download the HunyuanVideo checkpoints from here to "./data/HunyuanVideo".
- Download the Qwen2-VL-2B-Instruct checkpoints from here to "./Qwen2-VL-2B-Instruct".
- Download the VideoAlign checkpoints from here to "./videoalign_ckpt".
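The folders named in the two lists above can be created in one step. This is a minimal sketch, assuming the command is run from the repository root; the "-p" flag creates parent directories as needed and does not fail if a folder already exists.

```shell
# Create every checkpoint folder referenced in the download lists.
# Assumption: the current directory is the repository root.
mkdir -p \
  ./data/stable-diffusion-v1-4 \
  ./data/flux \
  ./hps_ckpt \
  ./data/HunyuanVideo \
  ./Qwen2-VL-2B-Instruct \
  ./videoalign_ckpt
```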
./env_setup.sh fastvideo

# for Stable Diffusion, with 8 H800s
bash scripts/finetune/finetune_sd_grpo.sh
# for FLUX, preprocessing with 8 H800s
bash scripts/preprocess/preprocess_flux_rl_embeddings.sh
# for FLUX, training with 16 H800s
bash scripts/finetune/finetune_flux_grpo.sh

For the open-source image generation version, we train on prompts from the HPD dataset, as shown in "./prompts.txt".

# for HunyuanVideo, preprocessing with 8 H800s
bash scripts/preprocess/preprocess_hunyuan_rl_embeddings.sh
# for HunyuanVideo, training with 16/32 H800s
bash scripts/finetune/finetune_hunyuan_grpo.sh

For the open-source video generation version, we train on prompts filtered from the VidProM dataset, as shown in "./video_prompts.txt".
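Because the training scripts expect the checkpoints to be in place, it can help to verify the folders before launching a multi-GPU job. The helper below is a hypothetical sketch, not part of the repository: "check_dirs" is an illustrative name, and which folders a given script actually reads is an assumption based on the download lists.

```shell
# Hypothetical pre-flight helper (not part of the DanceGRPO repository).
# Returns 0 only if every given directory exists and contains at least one entry.
check_dirs() {
  status=0
  for dir in "$@"; do
    # "ls -A" prints nothing for an empty or missing directory.
    if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
      echo "missing or empty: $dir" >&2
      status=1
    fi
  done
  return "$status"
}

# Example (assumed folder set for FLUX training):
# check_dirs ./data/flux ./hps_ckpt && bash scripts/finetune/finetune_flux_grpo.sh
```

Reporting every missing folder before returning, rather than stopping at the first one, makes it easier to fix all download problems in a single pass.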
We give the (moving-average) reward curves of Stable Diffusion (left or upper) and FLUX (right or lower); the raw values are in reward.txt or hps_reward.txt. FLUX training (200 iterations) completes within 12 hours on 16 H800s.
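To smooth a reward log in the same spirit, a trailing moving average can be computed with awk. This is a sketch under stated assumptions: the log is assumed to hold one numeric reward per line (the actual layout of reward.txt or hps_reward.txt is not specified here), the window of 10 is an arbitrary default rather than the value used for the plots, and "moving_average" is a hypothetical helper name.

```shell
# Hypothetical helper: trailing moving average of a one-value-per-line log.
# Usage: moving_average FILE [WINDOW]   (WINDOW defaults to 10, an assumption)
moving_average() {
  awk -v w="${2:-10}" '
    NF == 0 { next }                      # skip blank lines
    {
      n++
      slot = n % w                        # ring-buffer position
      if (n > w) sum -= buf[slot]         # drop the value leaving the window
      buf[slot] = $1
      sum += $1
      count = (n < w) ? n : w             # shorter window at the start
      printf "%.4f\n", sum / count
    }' "$1"
}

# Example: moving_average hps_reward.txt 10
```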
- We provide more visualization examples (base, 80 iters RLHF, 160 iters RLHF) in "./assets/flux_visualization". We always use larger resolutions and more sampling steps for visualization than for RLHF training, because training uses lower resolutions and fewer sampling steps for speed.
- The visualization script for FLUX is "./scripts/visualization/vis_flux.py". First, run "rm -rf ./data/flux/transformer/*" to clear the directory, then copy the files from a trained checkpoint (e.g., checkpoint-160-0) into "./data/flux/transformer". After that, you can run the visualization. For a model trained for 160 iterations, the results are already provided in the repo.
- We don't recommend using 8 H800s for the FLUX training script, because we find that a global prompt batch size of 8 is not enough.
- More discussion on FLUX can be found in "./fastvideo/README.md".
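The clear-then-copy step described above for loading a trained checkpoint can be wrapped in a small function. This is a hedged sketch: "load_checkpoint" is a hypothetical helper, and the location of the trained checkpoint is an assumption, since the output directory depends on how the training script is configured.

```shell
# Hypothetical helper (not part of the repository).
# Usage: load_checkpoint CKPT_DIR TARGET_DIR
load_checkpoint() {
  ckpt_dir=$1
  target_dir=$2
  # Refuse to clear the target unless the checkpoint actually exists.
  if [ ! -d "$ckpt_dir" ]; then
    echo "checkpoint directory not found: $ckpt_dir" >&2
    return 1
  fi
  mkdir -p "$target_dir"
  rm -rf "${target_dir:?}"/*             # clear the old weights
  cp -r "$ckpt_dir"/. "$target_dir"/     # copy the checkpoint contents
}

# Example (the checkpoint path is a placeholder):
# load_checkpoint /path/to/checkpoint-160-0 ./data/flux/transformer
```

Checking the source directory first avoids wiping the existing transformer weights when the checkpoint path is mistyped.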
We give the (moving-average) reward curves of HunyuanVideo trained with 16 and 32 H800s; the raw values are in vq_reward.txt.
With 16 H800s,
With 32 H800s,
- For the open-source version, our goal is to reduce the training cost, so we use fewer frames, sampling steps, and GPUs than the settings in the paper. The reward curves therefore differ, but the VQ improvements are similar (50%~60%).
- For visualization, run "rm -rf ./data/HunyuanVideo/transformer/*" to clear the directory, then copy the files from a trained checkpoint (e.g., checkpoint-100-0) into "./data/HunyuanVideo/transformer". After that, you can run the visualization script "./scripts/visualization/vis_hunyuanvideo.sh".
- Although training with 16 H800s yields rewards similar to training with 32 H800s, I still find that 32 H800s leads to better visualization results.
- We plot the rewards after de-normalizing them with the formula VQ = VQ * 2.2476 + 3.6757, following here.
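The de-normalization formula above can be applied to a list of raw VQ values with awk. This is a minimal sketch, assuming one normalized reward per line (the actual layout of vq_reward.txt is not specified here); "denormalize_vq" is a hypothetical helper name.

```shell
# Hypothetical helper: apply VQ = VQ * 2.2476 + 3.6757 to each input line.
# Reads the files given as arguments, or standard input if none are given.
denormalize_vq() {
  awk 'NF > 0 { printf "%.4f\n", $1 * 2.2476 + 3.6757 }' "$@"
}

# Example: denormalize_vq vq_reward.txt
```

With this formula, a normalized reward of 0 maps to 3.6757 and a normalized reward of 1 maps to 5.9233.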
The multi-reward training code and reward curves can be found here.
We learned and reused code from the following projects:
If you use DanceGRPO for your research, please cite our paper:
@article{xue2025dancegrpo,
title={DanceGRPO: Unleashing GRPO on Visual Generation},
author={Xue, Zeyue and Wu, Jie and Gao, Yu and Kong, Fangyuan and Zhu, Lingting and Chen, Mengzhao and Liu, Zhiheng and Liu, Wei and Guo, Qiushan and Huang, Weilin and others},
journal={arXiv preprint arXiv:2505.07818},
year={2025}
}


