Why does DPO training of Qwen2.5-7B-Ins require ZeRO-3 to run at all, while SFT trains fine with just ZeRO-1? Also, the reference-model logits appear to be computed only on the main GPU, which reduces efficiency even further.
The run command is as follows:

```shell
deepspeed --hostfile=/etc/mpi/hostfile swift/cli/rlhf.py \
    --rlhf_type dpo \
    --model $PRETRAIN_MODEL \
    --torch_dtype bfloat16 \
    --train_type full \
    --use_chat_template \
    --dataset $data_path \
    --num_train_epochs 2 \
    --per_device_train_batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 5e-7 \
    --deepspeed zero3 \
    --gradient_accumulation_steps 4 \
    --warmup_ratio 0.01 \
    --dataset_num_proc 16 \
    --system "You are Qwen, created by Alibaba Cloud. You are a helpful assistant." \
    --save_total_limit 5 \
    --save_strategy epoch \
    --attn_impl flash_attn \
    --max_length 4096 \
    --save_on_each_node False \
    --truncation_strategy delete \
    --split_dataset_ratio 0 \
    --eval_strategy no \
    --output_dir $output_dir \
    --lazy_tokenize true \
    --use_liger_kernel true \
    --rpo_alpha 0.1 \
    --use_hf
```
Training resources: 32× H800. Main-node GPU memory usage:
Is there any configuration that could improve DPO training efficiency? Many thanks!
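Since the command passes --use_hf and swift's DPO path, as far as I can tell, wraps trl's DPOTrainer, one knob worth checking for the reference-model concern is trl's precompute_ref_log_probs: the reference model is run once over the dataset in a preprocessing pass instead of at every training step. The sketch below shows the idea directly against trl rather than the swift CLI; the model id, the dpo_pairs.jsonl dataset path, and whether swift exposes an equivalent option are assumptions to verify.

```python
# Minimal sketch against trl directly (not the swift CLI); dpo_pairs.jsonl is a
# placeholder file with "prompt"/"chosen"/"rejected" columns.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
ref_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

train_dataset = load_dataset("json", data_files="dpo_pairs.jsonl", split="train")

args = DPOConfig(
    output_dir="dpo-out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=5e-7,
    bf16=True,
    rpo_alpha=0.1,
    # Reference log-probs are computed in a single pass before training and
    # cached, so the reference model is not run at every optimizer step.
    precompute_ref_log_probs=True,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older trl versions take tokenizer= instead
)
trainer.train()
```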
Try `--deepspeed zero2`
That OOMs immediately; I already tried it. Latest status: the ZeRO-3 run also OOMs. Could you please help take a look?
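Since plain ZeRO-2 OOMs on this setup, another hedged option is a custom DeepSpeed config with CPU offload of the optimizer states, which trades some step time for a much lower per-GPU memory footprint than plain ZeRO-3. The snippet below only writes such a config to disk; whether --deepspeed also accepts a path to a custom JSON file (instead of the zero1/zero2/zero3 presets) is an assumption to verify against the swift documentation.

```python
# Hedged sketch: emit a ZeRO-3 + CPU-offload DeepSpeed config as a JSON file.
# All keys below are standard DeepSpeed config options; the "auto" values are
# resolved by the HuggingFace Trainer integration.
import json

ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        # Keep parameters sharded on GPU but push Adam states to host RAM:
        # slower per step, but substantially less GPU memory than plain ZeRO-3.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

with open("zero3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```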