I am confused about `max_step` in this work: in the GRPO training documentation at https://github.com/modelscope/ms-swift/blob/3a33b7df5b8bb26982ee4f6c65b5fd4fb6b1813c/docs/source/BestPractices/GRPO%E5%A4%9A%E6%A8%A1%E6%80%81%E8%AE%AD%E7%BB%83.md, the author mentions training on 8k data for 1200+ steps. However, I observed that only one epoch was trained, and the effective batch size was 6 * 8 * 8. Shouldn't `max_step` be calculated as 8000 / (6 * 8 * 8)?
Note: `per_device_batch_size` is completion-level, so when calculating the prompt-level batch size, it needs to be divided by `num_generations`.
So based on the parameters of the script you mentioned:
`total_prompt_data_size` = 8000 * 8 (`num_generations`) / 8 (`per_device_batch_size`) / 6 (`dp_size`) * 0.99 (`train_data_ratio`) = 1320

`max_step` = `total_prompt_data_size` / 2 (`ga_steps`) * 2 (`num_iterations`) = 1320
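For clarity, here is a minimal Python sketch of this arithmetic, following the formula above. The variable names are illustrative (not actual ms-swift internals), and the values are the ones from the script discussed in this thread:

```python
import math

# Parameter values from the GRPO training script discussed above
dataset_size = 8000        # number of prompts in the dataset
num_generations = 8        # completions sampled per prompt (GRPO group size)
per_device_batch_size = 8  # completion-level batch size per device
dp_size = 6                # data-parallel world size
train_data_ratio = 0.99    # fraction of the data used for training
ga_steps = 2               # gradient accumulation steps
num_iterations = 2         # times each rollout batch is reused for updates

# per_device_batch_size counts completions, so one batch across all
# devices covers this many prompts:
prompts_per_batch = per_device_batch_size * dp_size / num_generations  # = 6

# Number of prompt-level batches in one epoch
total_prompt_data_size = math.floor(
    dataset_size * train_data_ratio / prompts_per_batch
)  # = 1320

# Each optimizer step consumes ga_steps batches, while each batch is
# reused num_iterations times; here the two factors of 2 cancel.
max_step = total_prompt_data_size // ga_steps * num_iterations  # = 1320
print(max_step)  # 1320
```

Note that `ga_steps` and `num_iterations` cancel exactly here only because both are 2; with different values, `max_step` would scale accordingly.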
Hope your question has been resolved. If you have any further issues, feel free to reopen the issue.
Hello, thank you for your reply. The issue has been resolved.