GRPO uses a single float reward (i.e., an ORM-style outcome reward) for the entire completion. However, you can run a PRM internally over the completion and aggregate its step-level scores into that single reward.
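As a rough illustration of the idea above, here is a minimal sketch of collapsing per-step PRM scores into one float per completion. Everything in it is an assumption made for illustration: `split_steps`, `aggregate`, `prm_as_single_reward`, and the `score_step` callable are hypothetical names, not part of any library's API, and the blank-line step delimiter is just one possible convention. In practice `score_step` would wrap your actual PRM call.

```python
from typing import Callable, List


def split_steps(completion: str) -> List[str]:
    # Assumption: reasoning steps are separated by blank lines.
    return [s.strip() for s in completion.split("\n\n") if s.strip()]


def aggregate(step_scores: List[float], how: str = "min") -> float:
    # Collapse step-level scores into the single float that GRPO expects.
    if not step_scores:
        return 0.0  # assumption: an empty completion earns no reward
    if how == "min":
        return min(step_scores)  # the weakest step decides the reward
    if how == "mean":
        return sum(step_scores) / len(step_scores)
    if how == "last":
        return step_scores[-1]
    raise ValueError(f"unknown aggregation: {how}")


def prm_as_single_reward(
    completions: List[str],
    score_step: Callable[[str], float],  # hypothetical stand-in for a PRM
    how: str = "min",
) -> List[float]:
    # One float per completion, as an outcome-style reward.
    return [
        aggregate([score_step(step) for step in split_steps(c)], how)
        for c in completions
    ]


def toy_scorer(step: str) -> float:
    # Toy scorer for demonstration only; a real PRM would score each step.
    return 1.0 if "=" in step else 0.5


rewards = prm_as_single_reward(["a = 1\n\nb = 2", "hmm\n\nx = 3"], toy_scorer)
print(rewards)  # [1.0, 0.5]
```

Which aggregation to use (`min`, `mean`, or `last`) is a design choice: `min` penalizes any single bad step, while `mean` is more forgiving of one weak step in a long chain.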
From what I can see, GRPO currently lets you specify an ORM Python file to import a custom reward. Does the same work for PRM?