About QLoRA training #4007

Open
Mrkkew opened this issue Apr 27, 2025 · 5 comments
Comments

@Mrkkew

Mrkkew commented Apr 27, 2025

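According to the list of supported training methods, reinforcement learning algorithms such as GRPO support parameter-efficient fine-tuning methods like QLoRA. How exactly is this done? Could you provide an example training script?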

@slin000111
Collaborator

See the examples in these two places: https://github.com/modelscope/ms-swift/tree/main/examples/train/grpo and https://github.com/modelscope/ms-swift/blob/main/examples/train/qlora/bnb.sh

Here is a QLoRA + GRPO example:

CUDA_VISIBLE_DEVICES=0 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B \
    --reward_funcs accuracy format \
    --train_type lora \
    --bnb_4bit_compute_dtype bfloat16 \
    --bnb_4bit_quant_type nf4 \
    --bnb_4bit_use_double_quant true \
    --quant_method bnb \
    --quant_bits 4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --torch_dtype bfloat16 \
    --dataset 'AI-MO/NuminaMath-TIR#1000' \
    --max_completion_length 1024 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 1 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --num_generations 4 \
    --temperature 0.9 \
    --system 'examples/train/grpo/prompt.txt' \
    --log_completions true

@Mrkkew
Author

Mrkkew commented Apr 28, 2025

OK. My quantized model is an AWQ model, and from what I can see the corresponding example script adds essentially no special parameters.

@Mrkkew
Author

Mrkkew commented Apr 28, 2025

@slin000111
Collaborator

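"OK. My quantized model is an AWQ model, and from what I can see the corresponding example script adds essentially no special parameters."

Yes, that's right.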
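
For readers following the AWQ exchange above, here is a minimal sketch of what the command might look like for a checkpoint that is already AWQ-quantized. It is a trimmed-down version of the bnb example earlier in this thread with `--quant_method`, `--quant_bits` and the `--bnb_4bit_*` flags removed. The model ID is an assumption chosen for illustration, and this exact command was not posted or verified by the maintainers. The thread only confirms that AWQ needs no special parameters. The script prints the command by default and only launches training when `RUN=1` is set.

```shell
#!/bin/sh
# Sketch: GRPO + LoRA on a checkpoint that is already AWQ-quantized.
# Assumption: MODEL is an illustrative AWQ checkpoint ID; substitute your own.
MODEL="${MODEL:-Qwen/Qwen2.5-7B-Instruct-AWQ}"

# Trimmed-down version of the bnb example from this thread, without
# --quant_method, --quant_bits or any --bnb_4bit_* flag, because the
# weights are quantized ahead of time.
set -- rlhf \
    --rlhf_type grpo \
    --model "$MODEL" \
    --reward_funcs accuracy format \
    --train_type lora \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --dataset 'AI-MO/NuminaMath-TIR#1000' \
    --max_completion_length 1024 \
    --num_generations 4 \
    --per_device_train_batch_size 4 \
    --learning_rate 1e-5 \
    --output_dir output

# Dry run by default: print the command. Set RUN=1 to launch training.
cmd="swift $*"
if [ "${RUN:-0}" = "1" ]; then
    CUDA_VISIBLE_DEVICES=0 swift "$@"
else
    echo "$cmd"
fi
```

Running the script as-is only echoes the assembled `swift rlhf ...` command, which makes it easy to compare against the bnb example before committing GPU time.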

@skepsun

skepsun commented May 7, 2025

Does this conflict with DeepSpeed ZeRO-3? When I use QLoRA together with zero3_offload, I get this error:

output tensor must have the same type as input tensor
