
GRPO performance drops sharply after 100 training steps; what could be the cause? #3876


Open
xxzhang0927 opened this issue Apr 15, 2025 · 1 comment


@xxzhang0927

(two images attached)

The training script is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3,4 \
NPROC_PER_NODE=4 \
swift rlhf \
    --rlhf_type grpo \
    --model /models/qwen/qwen_v2_5_7b_chat \
    --model_type qwen2_5 \
    --external_plugins /rlhf/plugin/plugin.py \
    --reward_funcs external_countdown format \
    --use_vllm true \
    --vllm_device auto \
    --vllm_gpu_memory_utilization 0.6 \
    --vllm_max_model_len 1024 \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset '/rlhf/input/data.jsonl' \
    --max_length 2048 \
    --max_completion_length 1024 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --learning_rate 5e-7 \
    --gradient_accumulation_steps 4 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 5 \
    --logging_steps 1 \
    --output_dir output \
    --warmup_ratio 0.01 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --num_generations 4 \
    --temperature 1.0 \
    --system 'You are a helpful assistant. You first thinks about the reasoning process in the mind and then provides the user with the answer.' \
    --log_completions true \
    --deepspeed zero3 \
    --num_iterations 1 \
    --beta 0.001
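
The custom reward plugin referenced by --external_plugins (/rlhf/plugin/plugin.py) is not shown in the issue. For context, below is a minimal sketch of what such a countdown-style reward could look like, assuming the plugin follows the pattern of ms-swift's GRPO plugin examples (a reward class whose __call__ scores the batch of completions, registered in the orms dict under the name passed to --reward_funcs). The class name, the dataset columns (target, nums), and the <answer> tag convention are illustrative assumptions, not taken from this issue.

# Minimal sketch of a countdown-style reward function, for context only.
# Assumes the plugin follows ms-swift's GRPO plugin examples: a reward class
# registered in the `orms` dict under the name passed to --reward_funcs.
# The dataset column names (`target`, `nums`) and the <answer> tag format are
# assumptions about the dataset, not something confirmed by this issue.
import re
from typing import List

from swift.plugin import ORM, orms


class CountdownORM(ORM):
    def __call__(self, completions: List[str], target: List[int],
                 nums: List[List[int]], **kwargs) -> List[float]:
        rewards = []
        for completion, goal, numbers in zip(completions, target, nums):
            # Pull the final expression out of <answer>...</answer>.
            match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
            if match is None:
                rewards.append(0.0)
                continue
            expr = match.group(1).strip()
            # Each given number must be used exactly once.
            used = sorted(int(n) for n in re.findall(r"\d+", expr))
            if used != sorted(numbers):
                rewards.append(0.0)
                continue
            try:
                # Reward 1.0 only when the expression evaluates to the target.
                rewards.append(1.0 if abs(eval(expr) - goal) < 1e-6 else 0.0)
            except Exception:
                rewards.append(0.0)
        return rewards


# Register under the name used in --reward_funcs external_countdown.
orms['external_countdown'] = CountdownORM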

@hjh0119
Collaborator

hjh0119 commented Apr 23, 2025

It might be due to excessively long generated sequences. You can monitor the completion_length in the logs and use the --overlong_filter parameter.
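
For reference, applying this to the launch script above would mean watching completion_length in the per-step logs and adding one more flag alongside the other GRPO options, roughly:

    --overlong_filter true

(a sketch: the flag name comes from the reply above, and passing true as a boolean value is an assumption to verify against the ms-swift GRPO documentation for the installed version). If completion_length sits at or near --max_completion_length (1024 here), completions are being truncated, which is the situation the overlong filter is meant to handle.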
