Qwen2.5-VL 72B GRPO training (LoRA) hangs for no reason #3592


Closed
sys-reasoner opened this issue Mar 20, 2025 · 7 comments

sys-reasoner commented Mar 20, 2025

Hi guys,

I am using 8 × A100 80 GB GPUs to run Qwen2.5-VL 72B GRPO training with LoRA, but the whole process hangs right at the beginning of startup.

(screenshot attached)

Do you have any ideas on how to solve this problem?

Or any best practices for using Qwen2.5-VL 72B in GRPO training?
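
A generic way to localize a hang like this (a sketch, not specific to ms-swift) is to dump the Python stack of each stuck rank with py-spy and to relaunch with NCCL logging enabled; <pid> below is a placeholder for a hung worker's process id:

pip install py-spy
py-spy dump --pid <pid>            # print the current Python stack of one stuck process
NCCL_DEBUG=INFO <launch command>   # relaunch with NCCL logging to spot a stalled collective

This usually distinguishes a dataloader stall from a rank blocked inside a collective op.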

Here is my sh command:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
MAX_PIXELS=640000 \
swift rlhf \
    --rlhf_type grpo \
    --model /mnt2/models/Qwen__Qwen2.5-VL-72B-Instruct \
    --train_type lora \
    --dataset /ossfs/workspace/data_process/xx.json \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --max_length 2048 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --eval_steps 1000 \
    --save_steps 2000 \
    --learning_rate 1e-6 \
    --save_total_limit 2 \
    --logging_steps 1 \
    --output_dir /mnt2/xx \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --max_completion_length 2048 \
    --external_plugins examples/train/grpo/plugin/plugin.py \
    --reward_funcs external_ui_acc uiformat \
    --num_generations 4 \
    --use_vllm true \
    --vllm_gpu_memory_utilization 0.3 \
    --vllm_max_model_len 2048 \
    --deepspeed zero3_offload \
    --temperature 1.1 \
    --top_p 1.0 \
    --top_k 80 \
    --log_completions true \
    --num_infer_workers 8 \
    --tensor_parallel_size 8 \
    --async_generate false \
    --offload_optimizer true \
    --offload_model true \
    --gc_collect_after_offload true \
    --move_model_batches 16 \
    --sleep_level 1 \
    --report_to swanlab

Here are my related libraries:
vllm 0.7.3
trl 0.16.0.dev0
transformers 4.49.0
torch 2.5.1+cu121
peft 0.14.0
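
For reference, the same listing can be reproduced with a quick pip query (a sketch; any equivalent version check works):

pip list | grep -E '^(vllm|trl|transformers|torch|peft) '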

sys-reasoner (Author) commented:

I am using the ms-swift main branch as of Mar 17th.


JoyTim-777 commented Mar 24, 2025

+1

sys-reasoner (Author) commented:

@Jintao-Huang @hjh0119 Hi guys, have you ever tried Qwen2.5-VL 72B GRPO training with LoRA? Could you please share any possible best practices?

The hanging problem is quite weird. I haven't even modified the source code.


hjh0119 (Collaborator) commented Mar 24, 2025

checking


hjh0119 commented Mar 27, 2025

The training script for VL 72B is on the way.


hjh0119 commented Mar 27, 2025

72B VL GRPO training script: https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/lora_qwenvl72b.sh
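
A minimal way to try the published example (assuming the repository layout implied by the URL above; model and dataset paths inside the script still need to be adapted):

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
bash examples/train/grpo/lora_qwenvl72b.sh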

sys-reasoner (Author) commented:

> 72B VL GRPO training script: https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/lora_qwenvl72b.sh

I will try it ASAP! Thank you for this great job!
