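For a model that has already been through SFT, if I want to add new knowledge without a drop in metrics, should I use ms-swift's reinforced fine-tuning (RFT) or GRPO? #4107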

Open
1212wuhu opened this issue May 7, 2025 · 0 comments

1212wuhu commented May 7, 2025

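I have finished some basic SFT tasks, and now I want my model to perform better on a new task without losing the knowledge it already has.
I noticed that ms-swift provides an RFT script along with documentation for it, but the docs seem to describe GRPO as a part of reinforced fine-tuning as well?
So should I use reinforced fine-tuning (RFT), or use GRPO directly?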
