max_length of completions exceeds max_completion_length #4877

Open
@mungg

During GRPO training, I noticed something odd in the wandb logs:
train/completions/max_length sometimes exceeds the max_completion_length I set in the script (512). Also, completions/clipped_ratio stays at 0, even when completions clearly run past the limit.

When I run the exact same setup with Hugging Face TRL, however:

train/completions/max_length never goes beyond max_completion_length, and
completions/clipped_ratio is non-zero, which makes sense because that metric should reflect the fraction of truncated completions.
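
To make the expectation concrete, here is a minimal sketch of how I understand these two metrics. It is not the actual TRL or ms-swift code. The function name `completion_metrics`, the extra `over_limit` counter, and the rule "a completion counts as clipped when it contains no EOS token" are assumptions used for illustration only.

```python
from typing import Dict, Sequence


def completion_metrics(
    completions: Sequence[Sequence[int]],
    eos_token_id: int,
    max_completion_length: int,
) -> Dict[str, float]:
    """Summarize a batch of completion token-id lists (illustrative only)."""
    if not completions:
        raise ValueError("need at least one completion")

    lengths = [len(ids) for ids in completions]

    # Assumption: a completion that never emitted EOS was cut off by the
    # generation limit, so it is counted as clipped.
    clipped = [eos_token_id not in ids for ids in completions]

    return {
        # Longest completion in the batch, in tokens.
        "max_length": max(lengths),
        # Fraction of completions that were cut off.
        "clipped_ratio": sum(clipped) / len(completions),
        # Completions longer than the configured limit; expected to be 0.
        "over_limit": sum(n > max_completion_length for n in lengths),
    }


if __name__ == "__main__":
    batch = [[5, 6, 2], [5, 6, 7, 8]]  # second one has no EOS (id 2)
    print(completion_metrics(batch, eos_token_id=2, max_completion_length=4))
```

Under this reading, `max_length` should never exceed `max_completion_length`, and `clipped_ratio` should be non-zero whenever some completion hits the limit without finishing, which is what the TRL run shows and the ms-swift run does not.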

MS-SWIFT GRPO training with max_completion_length=512
Image

HF TRL GRPO training with max_completion_length=512
Image

Also, the model trained with ms-swift performs worse than the one trained with TRL, even with the same hyperparameters (I am not sure why).


ms-swift == 3.6.0.dev (installed from the git repo as of July 7)
trl == 0.18.1


Script I used:

CUDA_VISIBLE_DEVICES=3 \
NPROC_PER_NODE=1 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B \
    --external_plugins PLUG_IN \
    --reward_funcs REWARD \
    --train_type full \
    --loss_type bnpo \
    --torch_dtype bfloat16 \
    --dataset CUSTOM_DATASET \
    --max_length 512 \
    --max_completion_length 512 \
    --num_train_epochs 3 \
    --seed 42 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-6 \
    --lr_scheduler_type="constant_with_warmup" \
    --temperature 0.9 \
    --warmup_ratio 0.05 \
    --max_grad_norm 0.2 \
    --save_strategy="steps" \
    --save_steps 250 \
    --save_total_limit 20 \
    --logging_steps 1 \
    --output_dir output/BLEUBERI_1000/1gpu \
    --dataloader_num_workers 4 \
    --num_generations 8 \
    --system 'You are a helpful assistant.' \
    --deepspeed zero3_offload \
    --log_completions true \
    --report_to wandb \
    --num_iterations 1 \
    --use_hf 1 \
    --split_dataset_ratio 0 \
    --weight_decay 0.0 \
    --adam_beta2 0.999 \
    --top_p 1.0
