
fix bug: grpo train error for deepseek model #4833


Merged
merged 6 commits into from
Jul 7, 2025

Conversation


@aacedar aacedar commented Jul 4, 2025

PR type

Bug fix.

Running the GRPO algorithm with the deepseek-ai/deepseek-coder-6.7b-base model raises a template_meta prefix error (issue #4808).
The error occurs in _swift_encode() in swift/llm/template/base.py; the relevant code is:

    if self.template_meta.is_post_system or not system:
        prefix = template_meta.prefix
    else:
        prefix = template_meta.system_prefix
    self._concat_context_list(prefix, res_context_list, res_context_types, system=system)

PR information

Debugging shows that the prefix value is [[32013]]; before encoding, this token corresponds to the string '<|begin▁of▁sentence|>'.
Building the prompt by joining this prefix value (a list of token ids) with the user input (a str) raises the error reported in issue #4808.
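The mismatch can be reproduced in isolation with a minimal sketch (standalone illustration only; the real code path is _swift_encode() in swift/llm/template/base.py):

```python
# The prefix arrives already encoded as token ids, while the user input is
# still a plain string; Python cannot concatenate a list with a str.
prefix = [[32013]]  # token ids for '<|begin▁of▁sentence|>'
user_input = "def quick_sort(arr):"

try:
    prompt = prefix[0] + user_input  # list + str
except TypeError as exc:
    print(f"cannot join prefix with user input: {exc}")
```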

Solution:
Modify swift/trainers/rlhf_trainer/grpo_trainer.py as in this PR.

Test script (replace your_own_path in --model and --output_dir with your own paths):

    CUDA_VISIBLE_DEVICES=7 \
    swift rollout \
      --model your_own_path/deepseek-ai/deepseek-coder-6.7b-base \
      > logs/log-grpo-rollout.log 2>&1 &

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 \
    NPROC_PER_NODE=6 \
    swift rlhf \
      --rlhf_type grpo \
      --model your_own_path/deepseek-ai/deepseek-coder-6.7b-base \
      --reward_funcs accuracy \
      --use_vllm true \
      --vllm_mode server \
      --vllm_server_host 127.0.0.1 \
      --vllm_server_port 8000 \
      --train_type full \
      --torch_dtype bfloat16 \
      --dataset AI-MO/NuminaMath-TIR#1000 \
      --split_dataset_ratio 0 \
      --max_completion_length 512 \
      --num_train_epochs 1 \
      --per_device_train_batch_size 2 \
      --learning_rate 1e-6 \
      --gradient_accumulation_steps 2 \
      --save_total_limit 2 \
      --logging_steps 1 \
      --deepspeed zero2 \
      --max_length 4096 \
      --warmup_ratio 0.05 \
      --dataloader_num_workers 2 \
      --dataset_num_proc 4 \
      --num_generations 6 \
      --temperature 0.9 \
      --top_p 0.9 \
      --top_k 50 \
      --log_completions true \
      --num_iterations 1 \
      --beta 0.01 \
      --output_dir ./ds-base-grpo \
      --report_to tensorboard \
      > logs/log-grpo-test.log 2>&1 &

@aacedar aacedar changed the title Aacedar patch 4 fix bug: grpo train error for deepseek model Jul 4, 2025

hjh0119 commented Jul 5, 2025

Please do not use Chinese comments.


aacedar commented Jul 6, 2025

Please do not use Chinese comments.

Updated.


hjh0119 commented Jul 6, 2025

Please resolve the conflicts.


aacedar commented Jul 7, 2025

Please resolve the conflicts.

Done.

@hjh0119 hjh0119 merged commit b059291 into modelscope:main Jul 7, 2025
2 checks passed