Qwen2.5-VL 72B GRPO training (LoRA) hangs for no reason #3592


Closed
sys-reasoner opened this issue Mar 20, 2025 · 7 comments

sys-reasoner commented Mar 20, 2025

Hi guys,

I am using 8 × A100 80 GB GPUs to run Qwen2.5-VL 72B GRPO training with LoRA, but the whole process hangs right at the beginning of startup.

(screenshot attached)

Do you have any ideas on how to solve this problem?

Or any best practices for using Qwen2.5-VL 72B in GRPO training?
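
A generic way to localize a hang like this (a sketch, not specific to ms-swift) is to dump the Python stack of each stuck rank with py-spy and to relaunch with NCCL logging enabled; <pid> below is a placeholder for a hung worker's process id:

pip install py-spy
py-spy dump --pid <pid>            # print the current Python stack of one stuck process
NCCL_DEBUG=INFO <launch command>   # relaunch with NCCL logging to spot a stalled collective

This usually distinguishes a dataloader stall from a rank blocked inside a collective op.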

Here is my sh command:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
MAX_PIXELS=640000 \
swift rlhf \
    --rlhf_type grpo \
    --model /mnt2/models/Qwen__Qwen2.5-VL-72B-Instruct \
    --train_type lora \
    --dataset /ossfs/workspace/data_process/xx.json \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --max_length 2048 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --eval_steps 1000 \
    --save_steps 2000 \
    --learning_rate 1e-6 \
    --save_total_limit 2 \
    --logging_steps 1 \
    --output_dir /mnt2/xx \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --max_completion_length 2048 \
    --external_plugins examples/train/grpo/plugin/plugin.py \
    --reward_funcs external_ui_acc uiformat \
    --num_generations 4 \
    --use_vllm true \
    --vllm_gpu_memory_utilization 0.3 \
    --vllm_max_model_len 2048 \
    --deepspeed zero3_offload \
    --temperature 1.1 \
    --top_p 1.0 \
    --top_k 80 \
    --log_completions true \
    --num_infer_workers 8 \
    --tensor_parallel_size 8 \
    --async_generate false \
    --offload_optimizer true \
    --offload_model true \
    --gc_collect_after_offload true \
    --move_model_batches 16 \
    --sleep_level 1 \
    --report_to swanlab

Here are my related libraries:
vllm 0.7.3
trl 0.16.0.dev0
transformers 4.49.0
torch 2.5.1+cu121
peft 0.14.0
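
For reference, the same listing can be reproduced with a quick pip query (a sketch; any equivalent version check works):

pip list | grep -E '^(vllm|trl|transformers|torch|peft) '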

sys-reasoner (Author) commented:

I am using the ms-swift main branch as of Mar 17th.


JoyTim-777 commented Mar 24, 2025

+1

sys-reasoner (Author) commented:

@Jintao-Huang @hjh0119 Hi guys, have you ever tried Qwen2.5-VL 72B GRPO training with LoRA? Could you please share any possible best practices?

The hanging problem is quite weird. I haven't even modified the source code.


hjh0119 (Collaborator) commented Mar 24, 2025

checking


hjh0119 commented Mar 27, 2025

The training script for VL 72B is on the way.


hjh0119 commented Mar 27, 2025

72B VL GRPO training script: https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/lora_qwenvl72b.sh
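
A minimal way to try the published example (assuming the repository layout implied by the URL above; model and dataset paths inside the script still need to be adapted):

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
bash examples/train/grpo/lora_qwenvl72b.sh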

sys-reasoner (Author) commented:

> 72B VL GRPO training script: https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/lora_qwenvl72b.sh

I will try it ASAP! Thank you for this great job!
