Performance drops sharply after 100 steps of GRPO training on Qwen2.5 7B #3875

Closed
xxzhang0927 opened this issue Apr 15, 2025 · 1 comment
Labels
duplicate This issue or pull request already exists

Comments

@xxzhang0927

Image
Image

The training script is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3,4 \
NPROC_PER_NODE=4 \
swift rlhf \
    --rlhf_type grpo \
    --model /models/qwen/qwen_v2_5_7b_chat \
    --model_type qwen2_5 \
    --external_plugins /rlhf/plugin/plugin.py \
    --reward_funcs external_countdown format \
    --use_vllm true \
    --vllm_device auto \
    --vllm_gpu_memory_utilization 0.6 \
    --vllm_max_model_len 1024 \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset '/rlhf/input/data.jsonl' \
    --max_length 2048 \
    --max_completion_length 1024 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --learning_rate 5e-7 \
    --gradient_accumulation_steps 4 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 5 \
    --logging_steps 1 \
    --output_dir output \
    --warmup_ratio 0.01 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --num_generations 4 \
    --temperature 1.0 \
    --system 'You are a helpful assistant. You first thinks about the reasoning process in the mind and then provides the user with the answer.' \
    --log_completions true \
    --deepspeed zero3 \
    --num_iterations 1 \
    --beta 0.001
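
For reference, a rough sketch of the batch arithmetic these flags imply. It assumes TRL-style accounting, where `per_device_train_batch_size` counts completions rather than prompts; that is an assumption about the trainer, not something confirmed in this issue. The helper names are hypothetical and only illustrate the calculation.

```python
# Values copied from the training command above.
nproc_per_node = 4
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_generations = 4


def completions_per_optimizer_step(nproc: int, per_device: int, grad_accum: int) -> int:
    """Total completions consumed by one optimizer step.

    ASSUMPTION: the per-device batch size counts completions (TRL-style).
    """
    return nproc * per_device * grad_accum


def unique_prompts_per_optimizer_step(total_completions: int, generations: int) -> int:
    """Distinct prompts per optimizer step when each prompt is sampled `generations` times."""
    if total_completions % generations != 0:
        raise ValueError("total completions must be divisible by num_generations")
    return total_completions // generations


total = completions_per_optimizer_step(
    nproc_per_node, per_device_train_batch_size, gradient_accumulation_steps
)
prompts = unique_prompts_per_optimizer_step(total, num_generations)
print(f"completions per step: {total}, unique prompts per step: {prompts}")
```

Under that assumption, each optimizer step sees 64 completions drawn from 16 distinct prompts, which may be worth keeping in mind when reading the reward curves.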

The dataset is as follows:

hjh0119 added the duplicate label Apr 23, 2025
@hjh0119
Collaborator

hjh0119 commented Apr 23, 2025

Closed as a duplicate of #3876.

hjh0119 closed this as completed Apr 23, 2025