
fix bug: grpo train error for deepseek model #4833


Merged
merged 6 commits into from
Jul 7, 2025

Conversation


@aacedar aacedar commented Jul 4, 2025

PR type

Bug fix.

Running the GRPO algorithm with the deepseek-ai/deepseek-coder-6.7b-base model raises a template_meta prefix error (issue #4808).
The error occurs in _swift_encode() in swift/llm/template/base.py; the relevant code is:

    if self.template_meta.is_post_system or not system:
        prefix = template_meta.prefix
    else:
        prefix = template_meta.system_prefix
    self._concat_context_list(prefix, res_context_list, res_context_types, system=system)

PR information

Debugging shows that the prefix value is [[32013]]; before encoding, this token corresponds to the string '<|begin▁of▁sentence|>'.
Building the prompt by joining this prefix value (a list of token ids) with the user input (a str) raises the error reported in issue #4808.
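The mismatch can be reproduced in isolation with a minimal sketch (standalone illustration only; the real code path is _swift_encode() in swift/llm/template/base.py):

```python
# The prefix arrives already encoded as token ids, while the user input is
# still a plain string; Python cannot concatenate a list with a str.
prefix = [[32013]]  # token ids for '<|begin▁of▁sentence|>'
user_input = "def quick_sort(arr):"

try:
    prompt = prefix[0] + user_input  # list + str
except TypeError as exc:
    print(f"cannot join prefix with user input: {exc}")
```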

Solution:
Modify swift/trainers/rlhf_trainer/grpo_trainer.py as in this PR.

Test script (replace your_own_path in --model and --output_dir with your own paths):

    CUDA_VISIBLE_DEVICES=7 \
    swift rollout \
      --model your_own_path/deepseek-ai/deepseek-coder-6.7b-base \
      > logs/log-grpo-rollout.log 2>&1 &

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 \
    NPROC_PER_NODE=6 \
    swift rlhf \
      --rlhf_type grpo \
      --model your_own_path/deepseek-ai/deepseek-coder-6.7b-base \
      --reward_funcs accuracy \
      --use_vllm true \
      --vllm_mode server \
      --vllm_server_host 127.0.0.1 \
      --vllm_server_port 8000 \
      --train_type full \
      --torch_dtype bfloat16 \
      --dataset AI-MO/NuminaMath-TIR#1000 \
      --split_dataset_ratio 0 \
      --max_completion_length 512 \
      --num_train_epochs 1 \
      --per_device_train_batch_size 2 \
      --learning_rate 1e-6 \
      --gradient_accumulation_steps 2 \
      --save_total_limit 2 \
      --logging_steps 1 \
      --deepspeed zero2 \
      --max_length 4096 \
      --warmup_ratio 0.05 \
      --dataloader_num_workers 2 \
      --dataset_num_proc 4 \
      --num_generations 6 \
      --temperature 0.9 \
      --top_p 0.9 \
      --top_k 50 \
      --log_completions true \
      --num_iterations 1 \
      --beta 0.01 \
      --output_dir ./ds-base-grpo \
      --report_to tensorboard \
      > logs/log-grpo-test.log 2>&1 &

@aacedar aacedar changed the title Aacedar patch 4 fix bug: grpo train error for deepseek model Jul 4, 2025

hjh0119 commented Jul 5, 2025

Please do not use Chinese comments.


aacedar commented Jul 6, 2025

Please do not use Chinese comments.

Updated.


hjh0119 commented Jul 6, 2025

Please resolve the conflicts.


aacedar commented Jul 7, 2025

Please resolve the conflicts.

Done.

@hjh0119 hjh0119 merged commit b059291 into modelscope:main Jul 7, 2025
2 checks passed