
Training hangs at the parameter update after the first batch is read #3809

Open
@wcq1744352243

Description


Describe the bug
What the bug is, and how to reproduce, better with screenshots

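After the first batch has been read and inference on it has finished, training enters the state shown below and makes no further progress:
[screenshot]
Checking GPU usage shows that utilization is 0 on every GPU:
[screenshot]

Your hardware and system info
Write your system info like CUDA version/system/GPU/torch version here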
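The GPU-usage check described in the report can also be scripted, which makes it easier to confirm that all GPUs stay idle over time rather than dropping to 0 briefly. This is a minimal sketch assuming `nvidia-smi` is on `PATH`; the helper names (`parse_gpu_stats`, `all_idle`, `query_gpus`) are hypothetical and not part of ms-swift.

```python
import subprocess
from typing import List, Tuple

# Standard nvidia-smi query flags: one CSV row per GPU, no header, no units.
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]

# (GPU index, utilization in %, memory used in MiB)
GpuStat = Tuple[int, int, int]


def parse_gpu_stats(csv_text: str) -> List[GpuStat]:
    """Parse nvidia-smi rows like '0, 0, 21034' into (index, util, mem) tuples."""
    stats: List[GpuStat] = []
    for line in csv_text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        stats.append((int(index), int(util), int(mem)))
    return stats


def all_idle(stats: List[GpuStat], max_util: int = 0) -> bool:
    """True if there is at least one GPU and every GPU reports util <= max_util."""
    return bool(stats) and all(util <= max_util for _, util, _ in stats)


def query_gpus() -> List[GpuStat]:
    """Run nvidia-smi once and return the parsed per-GPU stats."""
    result = subprocess.run(NVIDIA_SMI_CMD, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)


# Usage (on the training machine): print(all_idle(query_gpus()))
```

Polling `query_gpus()` every few seconds while the job is stuck distinguishes a real stall from a short pause. Utilization at 0% on every GPU while memory stays allocated suggests the processes are alive but waiting rather than running GPU work.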

The launch script is as follows:
MAX_PIXELS=262144 \
MASTER_PORT=29600 \
NPROC_PER_NODE=6 \
swift rlhf \
    --rlhf_type grpo \
    --model /root/Qwen2.5vl-3B \
    --external_plugins /root/lml/wcq/ms-swift/examples/train/grpo/plugin/plugin.py \
    --reward_funcs external_r1v_acc format \
    --use_vllm true \
    --vllm_device auto \
    --vllm_gpu_memory_utilization 0.9 \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset 'lmms-lab/multimodal-open-r1-8k-verified' \
    --max_length 8192 \
    --max_completion_length 1024 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --learning_rate 1e-6 \
    --gradient_accumulation_steps 2 \
    --save_strategy 'steps' \
    --eval_strategy 'steps' \
    --eval_steps 400 \
    --save_steps 400 \
    --save_total_limit 10 \
    --logging_steps 1 \
    --output_dir output/GRPO_GEOQA \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --num_generations 2 \
    --temperature 1.0 \
    --repetition_penalty 1.1 \
    --system '/root/lml/wcq/ms-swift/examples/train/grpo/prompt.txt' \
    --deepspeed zero3 \
    --log_completions true \
    --num_iterations 2 \
    --num_infer_workers 2 \
    --async_generate false \
    --beta 0.001 \
    --max_grad_norm 0.5
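For reference, the batch sizes implied by the settings above can be worked out as follows. This is a sketch assuming TRL-style GRPO semantics, where `per_device_train_batch_size` counts completions, each prompt is sampled `num_generations` times, and the completions produced per step across all training ranks must be divisible by `num_generations`; ms-swift's exact checks may differ, and `GrpoBatchConfig` / `batch_summary` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GrpoBatchConfig:
    # Defaults copied from the launch command in this report.
    num_train_processes: int = 6          # NPROC_PER_NODE
    per_device_train_batch_size: int = 2  # completions per rank per micro-step (assumed)
    gradient_accumulation_steps: int = 2
    num_generations: int = 2              # completions sampled per prompt


def batch_summary(cfg: GrpoBatchConfig) -> dict:
    """Derive per-step batch sizes under the TRL-style assumption above."""
    completions_per_micro_step = cfg.num_train_processes * cfg.per_device_train_batch_size
    # Every prompt's group of generations must fit in the same global batch.
    if completions_per_micro_step % cfg.num_generations != 0:
        raise ValueError(
            f"{completions_per_micro_step} completions per step is not divisible "
            f"by num_generations={cfg.num_generations}"
        )
    return {
        "completions_per_micro_step": completions_per_micro_step,
        "prompts_per_micro_step": completions_per_micro_step // cfg.num_generations,
        "completions_per_optimizer_step": (
            completions_per_micro_step * cfg.gradient_accumulation_steps
        ),
    }
```

With the values from this report, `batch_summary(GrpoBatchConfig())` gives 12 completions (6 prompts × 2 generations) per micro-step and 24 completions per optimizer step. Under this assumption the divisibility requirement is met, so batch shape alone would not explain the stall.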
Additional context
Add any other context about the problem here
The machine has eight RTX 3090 GPUs.
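A quick sanity check on how the eight GPUs are divided between training and inference. It assumes that with `--vllm_device auto` and `--num_infer_workers 2`, this ms-swift version places the vLLM workers on GPUs not used by the `NPROC_PER_NODE` training ranks (a non-colocated layout); that assumption and the helper names `visible_gpu_count` / `check_gpu_budget` are hypothetical and not taken from ms-swift's code. `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable.

```python
import os


def visible_gpu_count(default: int) -> int:
    """Count the GPUs listed in CUDA_VISIBLE_DEVICES.

    Falls back to `default` when the variable is unset, in which case
    every GPU in the machine is visible.
    """
    devices = os.environ.get("CUDA_VISIBLE_DEVICES")
    if devices is None:
        return default
    return len([d for d in devices.split(",") if d.strip()])


def check_gpu_budget(train_procs: int, infer_workers: int, visible_gpus: int) -> int:
    """Return the number of spare GPUs, raising if the plan oversubscribes.

    ASSUMPTION (hypothetical): training ranks and vLLM inference workers
    run on disjoint GPUs, so their counts simply add up.
    """
    needed = train_procs + infer_workers
    if needed > visible_gpus:
        raise ValueError(f"need {needed} GPUs but only {visible_gpus} are visible")
    return visible_gpus - needed
```

For this report, `check_gpu_budget(6, 2, 8)` returns 0: under the stated assumption, six training ranks plus two vLLM workers claim all eight 3090s with none to spare.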
