[WARNING:swift] No training was carried out, which may be due to the dataset being too small or incorrect usage of resume_from_checkpoint. #3863

Open
Henchen99 opened this issue Apr 13, 2025 · 6 comments

Comments


Henchen99 commented Apr 13, 2025

I finished my GRPO training; however, at the very end of the run it printed this message:

Train: 100%|█████████████████████████████████████| 811/812 [53:11:32<03:56, 236.12s/it]
[WARNING:swift] No training was carried out, which may be due to the dataset being too small or incorrect usage of resume_from_checkpoint.
[INFO:swift] End time of running main: 2025-04-13 06:05:03.922258

So I did not get any trained checkpoints after such a long training run. I was able to get checkpoints before; I only changed some hyperparameters (the dataset is unchanged, 16k samples) and I do not use resume_from_checkpoint, so I am confused about why this happened and how to fix it.

My bash script is below; I run it with `bash train_GRPO.sh`:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6 \
NPROC_PER_NODE=5 \
swift rlhf \
    --rlhf_type grpo \
    --model /vault/ultraz/open-r1/data/llama3-3b-lora-checkpoint_1023 \
    --model_type llama3_2 \
    --train_type lora \
    --dataset /vault/ultraz/unsloth_grpo/skythought_no_tokens.csv \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --eval_steps 2000 \
    --save_steps 2000 \
    --learning_rate 1e-6 \
    --save_total_limit 2 \
    --logging_steps 1 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 8 \
    --reward_funcs accuracy format cosine repetition \
    --num_generations 4 \
    --use_vllm true \
    --vllm_gpu_memory_utilization 0.9 \
    --vllm_max_model_len 5000 \
    --max_completion_length 5000 \
    --max_length 7000 \
    --num_infer_workers 2 \
    --deepspeed zero3 \
    --temperature 1.0 \
    --system examples/train/grpo/prompt.txt \
    --log_completions true
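For reference, the effective batch arithmetic implied by these flags, as a rough sketch. Whether GRPO counts prompts or completions against per_device_train_batch_size depends on the ms-swift version, so the completions-vs-prompts split below is an assumption, not a statement about the library's internals:

```python
# All names mirror the flags in the script above.
num_processes = 5        # NPROC_PER_NODE
per_device_bs = 4        # per_device_train_batch_size
grad_accum = 4           # gradient_accumulation_steps
num_generations = 4      # completions sampled per prompt in GRPO

# Samples consumed per optimizer step across all processes.
completions_per_optimizer_step = per_device_bs * num_processes * grad_accum
# If those samples are completions, this many distinct prompts feed one step.
prompts_per_optimizer_step = completions_per_optimizer_step // num_generations

print(completions_per_optimizer_step)  # -> 80
print(prompts_per_optimizer_step)      # -> 20
```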

My library versions are:

vllm==0.8.3
ms_swift version: 3.3.0.dev0
@zhshj0110

Same problem.

@hjh0119 hjh0119 mentioned this issue Apr 14, 2025
@TerenceXue-tech

Same problem here, with Qwen2.5-32B.

@QingKong-THR

Same problem, help please. @Jintao-Huang

@dengdeng-cat

Same problem here, with Qwen2.5-VL-7B.

@Jintao-Huang
Collaborator

Is this GRPO training, or something else?

@Jintao-Huang
Collaborator

--dataloader_drop_last true

Alternatively, try the latest code.
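The suggestion above matches the standard PyTorch DataLoader `drop_last` behavior: without it, an uneven dataset yields a smaller final batch, and (this is a hypothesis about the issue, not confirmed ms-swift internals) that ragged final batch may be the one step the run never completes, shown as "811/812" in the report above. A minimal pure-Python sketch of the batch-count arithmetic (`num_batches` is illustrative, not an ms-swift function):

```python
import math

def num_batches(n_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader-style loader yields per epoch."""
    if drop_last:
        return n_samples // batch_size        # ragged final batch is discarded
    return math.ceil(n_samples / batch_size)  # final batch may be smaller

# An uneven split: 811 full batches of 20 plus 7 leftover samples.
print(num_batches(811 * 20 + 7, 20))                  # -> 812
print(num_batches(811 * 20 + 7, 20, drop_last=True))  # -> 811
```

With `--dataloader_drop_last true`, every batch that reaches the trainer has a uniform size, which avoids the ragged final batch entirely.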
