A collection of papers and projects that train flow matching models and policies via reinforcement learning (RL). We focus on applications in CV, robotics, and NLP. The list is updated regularly.
Please give it a star ⭐ if you like this project!
Contributors: Tonghe Zhang, Kang Chen, Zeyue Xue, Yixiao Huang, Feng Chen
| Method | Paper | Code | Website | W&B | Domain | Online/Offline | On-/Off-policy | Pre-train/Fine-tune |
|---|---|---|---|---|---|---|---|---|
| FQL | arXiv | GitHub | Link | N/A | Robotics | Off2On | Off-policy | Pre-train + Fine-tune |
| ReinFlow | arXiv | GitHub | Link | Link | Robotics | Online | On-policy | Fine-tune |
| FPO | arXiv | GitHub | Link | N/A | Robotics | Online | On-policy | Pre-train |
| DSRL | arXiv | GitHub | Link | N/A | Robotics | Online | Off-policy | Fine-tune |
| FlowRL | arXiv | GitHub | N/A | N/A | Robotics | Online | Off-policy | Fine-tune |
| SAC Flow | arXiv | GitHub | Link | N/A | Robotics | Online | Off-policy | Fine-tune |
| Flow-GRPO | arXiv | GitHub | Link | N/A | CV | Online | On-policy | Fine-tune |
| DanceGRPO | arXiv | GitHub | Link | N/A | CV | Online | On-policy | Fine-tune |
| Mix-GRPO | arXiv | GitHub | Link | N/A | CV | Online | On-policy | Fine-tune |
| TempFlow-GRPO | arXiv | GitHub | Link | N/A | CV | Online | On-policy | Fine-tune |
| Pref-GRPO | arXiv | GitHub | Link | N/A | CV | Online | On-policy | Fine-tune |
| Flow-CPS | arXiv | GitHub | Link | N/A | CV | Online | On-policy | Fine-tune |
| BranchGRPO | arXiv | GitHub | Link | N/A | CV | Online | On-policy | Fine-tune |
| DiffusionNet | arXiv | GitHub | Link | N/A | CV | Online | On-policy | Fine-tune |
| DSRL-pi0 | arXiv | GitHub | N/A | N/A | Robotics | Both | Off-policy | Fine-tune |
| FPMD | arXiv | N/A | N/A | N/A | Robotics | Online | Off-policy | Pre-train |
| RLFM | arXiv | GitHub | N/A | N/A | Robotics | Online | On-policy | Fine-tune |
| GPPO | arXiv | GitHub | N/A | N/A | NLP | Online | Off-policy | Fine-tune |