Why does DPO training of Qwen2.5-7B-Ins require ZeRO-3 to run at all, while SFT trains fine with just ZeRO-1? Also, the reference-model logits appear to be computed only on the main GPU, which reduces efficiency even further.
The run command is as follows:

```shell
deepspeed --hostfile=/etc/mpi/hostfile swift/cli/rlhf.py \
    --rlhf_type dpo \
    --model $PRETRAIN_MODEL \
    --torch_dtype bfloat16 \
    --train_type full \
    --use_chat_template \
    --dataset $data_path \
    --num_train_epochs 2 \
    --per_device_train_batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 5e-7 \
    --deepspeed zero3 \
    --gradient_accumulation_steps 4 \
    --warmup_ratio 0.01 \
    --dataset_num_proc 16 \
    --system "You are Qwen, created by Alibaba Cloud. You are a helpful assistant." \
    --save_total_limit 5 \
    --save_strategy epoch \
    --attn_impl flash_attn \
    --max_length 4096 \
    --save_on_each_node False \
    --truncation_strategy delete \
    --split_dataset_ratio 0 \
    --eval_strategy no \
    --output_dir $output_dir \
    --lazy_tokenize true \
    --use_liger_kernel true \
    --rpo_alpha 0.1 \
    --use_hf
```
Training resources: 32× H800. Main-node GPU memory usage:
Is there any configuration that could improve DPO training efficiency? Many thanks!
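Since the command passes --use_hf and swift's DPO path, as far as I can tell, wraps trl's DPOTrainer, one knob worth checking for the reference-model concern is trl's precompute_ref_log_probs: the reference model is run once over the dataset in a preprocessing pass instead of at every training step. The sketch below shows the idea directly against trl rather than the swift CLI; the model id, the dpo_pairs.jsonl dataset path, and whether swift exposes an equivalent option are assumptions to verify.

```python
# Minimal sketch against trl directly (not the swift CLI); dpo_pairs.jsonl is a
# placeholder file with "prompt"/"chosen"/"rejected" columns.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
ref_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

train_dataset = load_dataset("json", data_files="dpo_pairs.jsonl", split="train")

args = DPOConfig(
    output_dir="dpo-out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=5e-7,
    bf16=True,
    rpo_alpha=0.1,
    # Reference log-probs are computed in a single pass before training and
    # cached, so the reference model is not run at every optimizer step.
    precompute_ref_log_probs=True,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older trl versions take tokenizer= instead
)
trainer.train()
```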
Try `--deepspeed zero2`
That OOMs immediately; I already tried it. Latest status: the ZeRO-3 run also OOMs. Could you please help take a look?
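Since plain ZeRO-2 OOMs on this setup, another hedged option is a custom DeepSpeed config with CPU offload of the optimizer states, which trades some step time for a much lower per-GPU memory footprint than plain ZeRO-3. The snippet below only writes such a config to disk; whether --deepspeed also accepts a path to a custom JSON file (instead of the zero1/zero2/zero3 presets) is an assumption to verify against the swift documentation.

```python
# Hedged sketch: emit a ZeRO-3 + CPU-offload DeepSpeed config as a JSON file.
# All keys below are standard DeepSpeed config options; the "auto" values are
# resolved by the HuggingFace Trainer integration.
import json

ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        # Keep parameters sharded on GPU but push Adam states to host RAM:
        # slower per step, but substantially less GPU memory than plain ZeRO-3.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

with open("zero3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```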