About qLoRA training #4007
Looking at the training feature list, reinforcement learning algorithms such as GRPO support parameter-efficient fine-tuning methods like QLoRA. How exactly is this implemented? Is there an example training script you could share?
Here is a QLoRA + GRPO example:

```shell
CUDA_VISIBLE_DEVICES=0 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B \
    --reward_funcs accuracy format \
    --train_type lora \
    --bnb_4bit_compute_dtype bfloat16 \
    --bnb_4bit_quant_type nf4 \
    --bnb_4bit_use_double_quant true \
    --quant_method bnb \
    --quant_bits 4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --torch_dtype bfloat16 \
    --dataset 'AI-MO/NuminaMath-TIR#1000' \
    --max_completion_length 1024 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 1 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --num_generations 4 \
    --temperature 0.9 \
    --system 'examples/train/grpo/prompt.txt' \
    --log_completions true
```
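As for how this is realized: a minimal sketch, not ms-swift's actual internals, of what the flags above amount to. The base model is loaded 4-bit-quantized with bitsandbytes and trainable LoRA adapters are attached on top, so gradients only flow through the adapters. Roughly, in transformers/peft terms:

```python
# Sketch (assumed equivalent of the CLI flags above, not ms-swift source):
# quantize the frozen base model to 4-bit NF4, then wrap it with LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # --quant_method bnb, --quant_bits 4
    bnb_4bit_quant_type="nf4",              # --bnb_4bit_quant_type nf4
    bnb_4bit_use_double_quant=True,         # --bnb_4bit_use_double_quant true
    bnb_4bit_compute_dtype=torch.bfloat16,  # --bnb_4bit_compute_dtype bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,             # --torch_dtype bfloat16
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                                    # --lora_rank 8
    lora_alpha=32,                          # --lora_alpha 32
    target_modules="all-linear",            # --target_modules all-linear
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the LoRA weights are trainable
```

The GRPO loop itself then trains this adapter-wrapped model; the quantization is orthogonal to the RL algorithm.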
OK. My quantized model is an AWQ model, and I see the corresponding example script basically doesn't add any special parameters.
Yes.
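That is consistent with how pre-quantized checkpoints are generally handled: an AWQ model carries its quantization settings inside the checkpoint, so only the LoRA side needs configuring. A minimal sketch, assuming a transformers/peft stack with autoawq installed; the model ID below is illustrative, not taken from this thread:

```python
# Sketch: for a pre-quantized AWQ checkpoint, transformers reads the
# quantization_config stored in the model repo, so no bnb-style flags are
# needed; LoRA adapters are attached the same way as in the QLoRA case.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct-AWQ",  # illustrative AWQ checkpoint
    device_map="auto",
)
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=32, target_modules="all-linear",
               task_type="CAUSAL_LM"),
)
```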
Does this conflict with DeepSpeed ZeRO-3? When I use QLoRA and zero3_offload together, I get an error:
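For reference, and as an assumption rather than anything confirmed in this thread: bitsandbytes 4-bit layers have historically had limited compatibility with ZeRO-3's parameter partitioning, and whether QLoRA works with zero3_offload depends on the deepspeed/transformers versions involved. A common fallback is ZeRO-2 with optimizer offload; a minimal config sketch (the file name is hypothetical):

```python
# Sketch of a ZeRO-2 DeepSpeed config as a possible fallback; an assumption,
# not a fix confirmed in this thread. ZeRO-2 shards optimizer states and
# gradients but keeps full parameters on each GPU, which avoids partitioning
# the quantized weights.
import json

ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

with open("ds_zero2_offload.json", "w") as f:  # hypothetical file name
    json.dump(ds_config, f, indent=2)
```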