
Training a reward model with qwen2.5-vl: how do I set the parameters required by RewardConfig/RewardTrainer on the command line? #3916


Closed
Rexwangchao opened this issue Apr 17, 2025 · 3 comments

Comments

@Rexwangchao

Describe the bug
RewardTrainer supports setting the coeffi parameter:
Image
But how can this parameter be passed through the swift rlhf --rlhf_type rm ... command?
If it is not set, the args.json file shows:
{ ..., "training_args": "RewardConfig(..., center_rewards_coefficient=None, ...)" }
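
For anyone wanting to confirm which value a run actually recorded, here is a minimal sketch that pulls `center_rewards_coefficient` out of an args.json. It assumes `training_args` is stored as the string form of `RewardConfig(...)`, as in the snippet above; the real file layout may differ between versions. The function name and the sample data are hypothetical.

```python
import json
import re


def center_rewards_coefficient_from_args(args_json_text):
    """Return the recorded center_rewards_coefficient as raw text.

    Assumption: "training_args" holds the string repr of RewardConfig(...),
    as shown in the issue. Returns None (the Python object) if the field
    does not appear in that string at all.
    """
    args = json.loads(args_json_text)
    # str() keeps this working even if the value is not already a string.
    training_args = str(args.get("training_args", ""))
    match = re.search(r"center_rewards_coefficient=([^,)\s]+)", training_args)
    return match.group(1) if match else None


# Hypothetical sample mimicking the args.json excerpt from the issue.
sample = json.dumps(
    {"training_args": "RewardConfig(center_rewards_coefficient=None)"}
)
print(repr(center_rewards_coefficient_from_args(sample)))
```

A result of the text `None` means the option was never set for that run, which matches what the issue reports when the flag is not passed.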

@hjh0119
Collaborator

hjh0119 commented Apr 17, 2025

#3917 You can now pass it using --center_rewards_coefficient.
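
As background on what the flag controls, here is a rough pure-Python sketch of a pairwise reward loss with a centering term. It assumes the form TRL documents for this option: a squared-sum penalty on the chosen and rejected rewards, scaled by the coefficient, added to the usual log-sigmoid preference loss. It is an illustration only, not the trainer's actual code, and the function names are hypothetical.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_loss(chosen, rejected, center_rewards_coefficient=None):
    """Mean pairwise preference loss over (chosen, rejected) reward pairs.

    Assumption: when a coefficient is given, the mean of
    (chosen + rejected) ** 2 is added, which nudges rewards toward zero mean.
    """
    n = len(chosen)
    loss = -sum(log_sigmoid(c - r) for c, r in zip(chosen, rejected)) / n
    if center_rewards_coefficient is not None:
        penalty = sum((c + r) ** 2 for c, r in zip(chosen, rejected)) / n
        loss += center_rewards_coefficient * penalty
    return loss


# Same reward margin in both cases; only the offset from zero differs.
print(reward_loss([1.0], [-1.0], center_rewards_coefficient=0.01))
print(reward_loss([3.0], [1.0], center_rewards_coefficient=0.01))
```

With the coefficient left at None the two calls would give the same loss, since only the margin matters; with it set, the pair that sits further from zero is penalized more.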

@hjh0119 hjh0119 closed this as completed Apr 17, 2025
@Rexwangchao
Author

It still doesn't work.

Image
P.S. The version I'm using is:

Image

@Rexwangchao
Author

Sorry, I didn't notice that you had already fixed it. I'll update my version and try again. @hjh0119
