
DPO training efficiency is very low #4146


Open

leileilin opened this issue May 9, 2025 · 2 comments


leileilin commented May 9, 2025

DPO training of Qwen2.5-7B-Instruct only runs with ZeRO-3 enabled, whereas SFT trains fine with just ZeRO-1. In addition, the reference-model logits appear to be computed only on the main GPU, which reduces efficiency even further. Is that expected?
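
For context on why DPO needs noticeably more memory than SFT, here is a minimal sketch of the standard DPO objective (Rafailov et al., 2023). This is not ms-swift's actual implementation; `sequence_logps`, `dpo_step`, and the batch keys are hypothetical names used only for illustration. The point is that every step requires forward passes through both the trainable policy and a frozen reference model, for the chosen and the rejected responses:

```python
# Minimal sketch of one DPO step; illustrates the memory cost, not ms-swift internals.
import torch
import torch.nn.functional as F

def sequence_logps(model, input_ids):
    # Hypothetical helper: summed per-token log-probs of a sequence under `model`.
    logits = model(input_ids).logits[:, :-1]
    labels = input_ids[:, 1:]
    logps = torch.log_softmax(logits, dim=-1)
    return logps.gather(-1, labels.unsqueeze(-1)).squeeze(-1).sum(-1)

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Standard DPO objective: widen the policy/reference log-ratio margin
    # between the chosen and rejected responses.
    margin = (pi_chosen - pi_rejected) - (ref_chosen - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

def dpo_step(policy_model, ref_model, batch):
    # Two policy forwards (chosen + rejected) with gradients...
    pi_chosen = sequence_logps(policy_model, batch["chosen_input_ids"])
    pi_rejected = sequence_logps(policy_model, batch["rejected_input_ids"])
    # ...plus two forwards through the frozen reference model, which also has to
    # be resident in GPU memory; this is the extra cost relative to SFT.
    with torch.no_grad():
        ref_chosen = sequence_logps(ref_model, batch["chosen_input_ids"])
        ref_rejected = sequence_logps(ref_model, batch["rejected_input_ids"])
    return dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected)
```

With `--rpo_alpha` set (as in the command below), an additional weighted NLL term on the chosen responses is typically added on top of this loss.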

The command used is:
deepspeed --hostfile=/etc/mpi/hostfile \
  swift/cli/rlhf.py \
  --rlhf_type dpo \
  --model $PRETRAIN_MODEL \
  --torch_dtype bfloat16 \
  --train_type full \
  --use_chat_template \
  --dataset $data_path \
  --num_train_epochs 2 \
  --per_device_train_batch_size 1 \
  --weight_decay 0.1 \
  --learning_rate 5e-7 \
  --deepspeed zero3 \
  --gradient_accumulation_steps 4 \
  --warmup_ratio 0.01 \
  --dataset_num_proc 16 \
  --system "You are Qwen, created by Alibaba Cloud. You are a helpful assistant." \
  --save_total_limit 5 \
  --save_strategy epoch \
  --attn_impl flash_attn \
  --max_length 4096 \
  --save_on_each_node False \
  --truncation_strategy delete \
  --split_dataset_ratio 0 \
  --eval_strategy no \
  --output_dir $output_dir \
  --lazy_tokenize true \
  --use_liger_kernel true \
  --rpo_alpha 0.1 \
  --use_hf

Training resources: 32× H800 GPUs
GPU memory usage on the master node:

[screenshot of master-node GPU memory usage]

Is there any configuration that could improve DPO training efficiency? Thanks a lot!

Jintao-Huang (Collaborator) commented

Try --deepspeed zero2
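
If plain ZeRO-2 does not fit, one variant worth sketching is ZeRO-2 with optimizer CPU offload, which keeps full parameters on each GPU (avoiding ZeRO-3's parameter gathering) but moves optimizer state to host memory. This is a sketch under assumptions, not a verified ms-swift recipe; the file name is a placeholder, the keys are standard DeepSpeed options, and whether it actually fits this model is untested here. Written as Python that emits the DeepSpeed JSON:

```python
# Sketch: generate a DeepSpeed ZeRO-2 config with optimizer CPU offload.
import json

ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    # "auto" values are filled in by the HF Trainer integration.
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

with open("zero2_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)

# Then point the launcher at the file instead of the built-in preset,
# e.g. --deepspeed zero2_offload.json (assuming a config path is accepted here).
```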

leileilin (Author) commented May 9, 2025

> Try --deepspeed zero2

Already tried that; it OOMs immediately.
Latest status: ZeRO-3 now OOMs as well. Could you help take a look?
