Hangs while updating parameters after reading the first batch #3809

**Describe the bug**
What the bug is and how to reproduce it, ideally with screenshots

<img width="1072" alt="Image" src="https://github.com/user-attachments/assets/74654da3-5add-48b5-9751-4379801bffab" />
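After the first batch has been loaded and inference on it has finished, the run gets stuck in the state shown above. I then checked GPU usage: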

<img width="718" alt="Image" src="https://github.com/user-attachments/assets/cba4d507-5dc9-44b9-bd76-83a3f94b8f71" />
GPU utilization is 0% on every card.

**Your hardware and system info**
Write your system info here, such as CUDA version, OS, GPU model, and torch version

Launch script (bash):
```bash
MAX_PIXELS=262144 \
MASTER_PORT=29600 \
NPROC_PER_NODE=6 \
swift rlhf \
    --rlhf_type grpo \
    --model /root/Qwen2.5vl-3B \
    --external_plugins /root/lml/wcq/ms-swift/examples/train/grpo/plugin/plugin.py \
    --reward_funcs external_r1v_acc format \
    --use_vllm true \
    --vllm_device auto \
    --vllm_gpu_memory_utilization 0.9 \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset 'lmms-lab/multimodal-open-r1-8k-verified' \
    --max_length 8192 \
    --max_completion_length 1024 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --learning_rate 1e-6 \
    --gradient_accumulation_steps 2 \
    --save_strategy 'steps' \
    --eval_strategy 'steps' \
    --eval_steps 400 \
    --save_steps 400 \
    --save_total_limit 10 \
    --logging_steps 1 \
    --output_dir output/GRPO_GEOQA \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --num_generations 2 \
    --temperature 1.0 \
    --repetition_penalty 1.1 \
    --system '/root/lml/wcq/ms-swift/examples/train/grpo/prompt.txt' \
    --deepspeed zero3 \
    --log_completions true \
    --num_iterations 2 \
    --num_infer_workers 2 \
    --async_generate false \
    --beta 0.001 \
    --max_grad_norm 0.5
```

**Additional context**
Add any other context about the problem here
The machine has 8 RTX 3090 GPUs.
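As a sanity check on when the first parameter update should happen, the sketch below computes the batch sizes implied by the launch command above. It is a minimal sketch under stated assumptions: the usual Hugging Face Trainer convention that the optimizer steps once every `gradient_accumulation_steps` micro-batches; TRL-style GRPO semantics, where `per_device_train_batch_size` counts completions and each prompt yields `num_generations` completions; and that all six `NPROC_PER_NODE` processes are training ranks. The shell variables are local illustrations only and are not read by `swift`.

```shell
#!/usr/bin/env bash
set -eu

# Values copied from the launch command above (local variables only).
NPROC_PER_NODE=6                # assumed: every launched process is a training rank
PER_DEVICE_TRAIN_BATCH_SIZE=2   # assumed: counts completions per rank per micro-step (TRL-style)
GRADIENT_ACCUMULATION_STEPS=2
NUM_GENERATIONS=2

# Completions processed across all ranks in one forward/backward pass.
MICRO_BATCH=$((NPROC_PER_NODE * PER_DEVICE_TRAIN_BATCH_SIZE))
# Completions consumed before the optimizer actually updates the weights.
STEP_BATCH=$((MICRO_BATCH * GRADIENT_ACCUMULATION_STEPS))
# Distinct prompts per micro-step, since each prompt yields NUM_GENERATIONS completions.
PROMPTS_PER_MICRO=$((MICRO_BATCH / NUM_GENERATIONS))

# Warn if the global micro-batch cannot be split evenly into prompt groups.
if (( MICRO_BATCH % NUM_GENERATIONS != 0 )); then
  echo "warning: ${MICRO_BATCH} completions not divisible by num_generations=${NUM_GENERATIONS}" >&2
fi

echo "completions per micro-step:     ${MICRO_BATCH}"
echo "completions per optimizer step: ${STEP_BATCH}"
echo "prompts per micro-step:         ${PROMPTS_PER_MICRO}"
```

Under these assumptions it reports 12 completions per micro-step, 24 per optimizer step and 6 prompts per micro-step, so the first optimizer update is expected only after the second micro-batch.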