Description
During GRPO training, I noticed something odd in the wandb logs: train/completions/max_length sometimes exceeds the max_completion_length I set in the script (which is 512). Also, completions/clipped_ratio stays at 0, even when completions clearly run past the limit.
Running the exact same code with Hugging Face TRL, however, train/completions/max_length never goes beyond max_completion_length, and completions/clipped_ratio is non-zero, which makes sense because that metric should reflect the fraction of truncated completions.
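To make the expectation concrete, here is a minimal sketch (not the actual TRL or ms-swift implementation) of what I understand these two metrics to mean, assuming completions are hard-capped at max_completion_length tokens:

def completion_metrics(completion_lengths, max_completion_length=512):
    """Compute the two wandb metrics from per-sample completion lengths (in tokens)."""
    # completions/max_length: longest completion in the batch;
    # it should never exceed max_completion_length if truncation works.
    max_len = max(completion_lengths)
    # completions/clipped_ratio: fraction of completions that hit the cap
    # (i.e. were cut off before emitting EOS), so it should be > 0 whenever
    # any completion runs into the limit.
    clipped = sum(l >= max_completion_length for l in completion_lengths)
    return {
        "completions/max_length": max_len,
        "completions/clipped_ratio": clipped / len(completion_lengths),
    }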
MS-SWIFT GRPO training with max_completion_length = 512:
HF TRL GRPO training with max_completion_length = 512:
Also, the performance of the model trained with ms-swift is worse than the one trained with TRL, even with the same hyperparameters (I'm not sure about the reason).
ms-swift == 3.6.0.dev (installed from the git repo as of July 7)
trl == 0.18.1
Script I used:
CUDA_VISIBLE_DEVICES=3 \
NPROC_PER_NODE=1 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B \
    --external_plugins PLUG_IN \
    --reward_funcs REWARD \
    --train_type full \
    --loss_type bnpo \
    --torch_dtype bfloat16 \
    --dataset CUSTOM_DATASET \
    --max_length 512 \
    --max_completion_length 512 \
    --num_train_epochs 3 \
    --seed 42 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-6 \
    --lr_scheduler_type="constant_with_warmup" \
    --temperature 0.9 \
    --warmup_ratio 0.05 \
    --max_grad_norm 0.2 \
    --save_strategy="steps" \
    --save_steps 250 \
    --save_total_limit 20 \
    --logging_steps 1 \
    --output_dir output/BLEUBERI_1000/1gpu \
    --dataloader_num_workers 4 \
    --num_generations 8 \
    --system 'You are a helpful assistant.' \
    --deepspeed zero3_offload \
    --log_completions true \
    --report_to wandb \
    --num_iterations 1 \
    --use_hf 1 \
    --split_dataset_ratio 0 \
    --weight_decay 0.0 \
    --adam_beta2 0.999 \
    --top_p 1.0
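For the TRL side of the comparison I used the standard GRPOConfig / GRPOTrainer API with the same hyperparameters. This is only a rough sketch, not the exact script I ran; the dataset and reward function below are placeholders standing in for my custom ones:

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder reward function (the real one comes from my plugin).
def my_reward(completions, **kwargs):
    return [float(len(c)) for c in completions]  # dummy reward for illustration

# Placeholder dataset standing in for CUSTOM_DATASET.
dataset = load_dataset("trl-lib/tldr", split="train")

config = GRPOConfig(
    output_dir="output/BLEUBERI_1000/trl",
    loss_type="bnpo",
    bf16=True,
    max_completion_length=512,
    num_generations=8,
    num_iterations=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=1e-6,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    max_grad_norm=0.2,
    temperature=0.9,
    top_p=1.0,
    weight_decay=0.0,
    adam_beta2=0.999,
    num_train_epochs=3,
    seed=42,
    logging_steps=1,
    log_completions=True,
    report_to="wandb",
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B",
    reward_funcs=my_reward,
    args=config,
    train_dataset=dataset,
)
trainer.train()

With this TRL setup, completions/max_length stays at or below 512 in wandb and completions/clipped_ratio is non-zero, as described above.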