I finished my GRPO training, but on the final epoch it gave me this message:
Train: 100%|█████████████████████████████████████| 811/812 [53:11:32<03:56, 236.12s/it]
[WARNING:swift] No training was carried out, which may be due to the dataset being too small or incorrect usage of resume_from_checkpoint.
[INFO:swift] End time of running main: 2025-04-13 06:05:03.922258
So I did not get any trained checkpoints after such a long training time. I have been able to get checkpoints before; this run only changed some hyperparameters (the data is the same 16k samples) and I do not use resume_from_checkpoint, so I am confused about why this happened and how to fix it.
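As a rough sanity check of the step count (this is my own back-of-the-envelope arithmetic; I am assuming GRPO expands each prompt into num_generations completions and that the effective batch is per_device_train_batch_size × NPROC_PER_NODE × gradient_accumulation_steps, which may not match swift's internals), the numbers line up with the 812 steps in the progress bar, and save_steps is larger than the total number of steps:

# Hypothetical step-count estimate; the 16k dataset size is my data,
# the batching assumption is mine and may not match swift exactly.
DATASET_SIZE=16000
NUM_GENERATIONS=4
PER_DEVICE_BATCH=4
NPROC=5
GRAD_ACCUM=4
SAVE_STEPS=2000
STEPS=$(( DATASET_SIZE * NUM_GENERATIONS / (PER_DEVICE_BATCH * NPROC * GRAD_ACCUM) ))
echo "estimated optimizer steps: $STEPS"                      # ~800, close to the 812 shown above
echo "step-based checkpoints:    $(( STEPS / SAVE_STEPS ))"   # 0, since save_steps > total steps

So even with a normal run, no intermediate checkpoint would be written before the end; I still expected a final save, which is why the warning surprises me.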
My bash script is the following, which I run with bash train_GRPO.sh:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6 \
NPROC_PER_NODE=5 \
swift rlhf \
--rlhf_type grpo \
--model /vault/ultraz/open-r1/data/llama3-3b-lora-checkpoint_1023 \
--model_type llama3_2 \
--train_type lora \
--dataset /vault/ultraz/unsloth_grpo/skythought_no_tokens.csv \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 4 \
--eval_steps 2000 \
--save_steps 2000 \
--learning_rate 1e-6 \
--save_total_limit 2 \
--logging_steps 1 \
--output_dir output \
--warmup_ratio 0.05 \
--dataloader_num_workers 8 \
--reward_funcs accuracy format cosine repetition \
--num_generations 4 \
--use_vllm true \
--vllm_gpu_memory_utilization 0.9 \
--vllm_max_model_len 5000 \
--max_completion_length 5000 \
--max_length 7000 \
--num_infer_workers 2 \
--deepspeed zero3 \
--temperature 1.0 \
--system examples/train/grpo/prompt.txt \
--deepspeed zero2 \
    --log_completions true
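For completeness, this is how I check whether anything at all was written under --output_dir (the checkpoint-<step> directory naming is the usual Transformers/swift convention; the exact nesting under output/ may differ, so the search is deliberately broad):

# Look for any checkpoint directory under the configured --output_dir.
matches=$(find output -type d -name 'checkpoint-*')
if [ -z "$matches" ]; then
  echo "no checkpoints found under output/"
else
  echo "$matches"
fi

In my case nothing is found, which matches the warning above.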
My library versions are:
vllm==0.8.3
ms_swift==3.3.0.dev0