About qLoRA training #4007
Looking at the training feature list, reinforcement learning algorithms such as GRPO support parameter-efficient fine-tuning methods like QLoRA. How exactly is this implemented? Is there an example training script you could share?
Here is a QLoRA + GRPO example:

```shell
CUDA_VISIBLE_DEVICES=0 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B \
    --reward_funcs accuracy format \
    --train_type lora \
    --bnb_4bit_compute_dtype bfloat16 \
    --bnb_4bit_quant_type nf4 \
    --bnb_4bit_use_double_quant true \
    --quant_method bnb \
    --quant_bits 4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --torch_dtype bfloat16 \
    --dataset 'AI-MO/NuminaMath-TIR#1000' \
    --max_completion_length 1024 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 1 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --num_generations 4 \
    --temperature 0.9 \
    --system 'examples/train/grpo/prompt.txt' \
    --log_completions true
```
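As for how this is realized: a minimal sketch, not ms-swift's actual internals, of what the flags above amount to. The base model is loaded 4-bit-quantized with bitsandbytes and trainable LoRA adapters are attached on top, so gradients only flow through the adapters. Roughly, in transformers/peft terms:

```python
# Sketch (assumed equivalent of the CLI flags above, not ms-swift source):
# quantize the frozen base model to 4-bit NF4, then wrap it with LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # --quant_method bnb, --quant_bits 4
    bnb_4bit_quant_type="nf4",              # --bnb_4bit_quant_type nf4
    bnb_4bit_use_double_quant=True,         # --bnb_4bit_use_double_quant true
    bnb_4bit_compute_dtype=torch.bfloat16,  # --bnb_4bit_compute_dtype bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,             # --torch_dtype bfloat16
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                                    # --lora_rank 8
    lora_alpha=32,                          # --lora_alpha 32
    target_modules="all-linear",            # --target_modules all-linear
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the LoRA weights are trainable
```

The GRPO loop itself then trains this adapter-wrapped model; the quantization is orthogonal to the RL algorithm.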
OK. My quantized model is an AWQ model, and I see the corresponding example script basically doesn't add any special parameters.
Yes.
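That is consistent with how pre-quantized checkpoints are generally handled: an AWQ model carries its quantization settings inside the checkpoint, so only the LoRA side needs configuring. A minimal sketch, assuming a transformers/peft stack with autoawq installed; the model ID below is illustrative, not taken from this thread:

```python
# Sketch: for a pre-quantized AWQ checkpoint, transformers reads the
# quantization_config stored in the model repo, so no bnb-style flags are
# needed; LoRA adapters are attached the same way as in the QLoRA case.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct-AWQ",  # illustrative AWQ checkpoint
    device_map="auto",
)
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=32, target_modules="all-linear",
               task_type="CAUSAL_LM"),
)
```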
Does this conflict with DeepSpeed ZeRO-3? When I use QLoRA and zero3_offload together, I get an error:
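For reference, and as an assumption rather than anything confirmed in this thread: bitsandbytes 4-bit layers have historically had limited compatibility with ZeRO-3's parameter partitioning, and whether QLoRA works with zero3_offload depends on the deepspeed/transformers versions involved. A common fallback is ZeRO-2 with optimizer offload; a minimal config sketch (the file name is hypothetical):

```python
# Sketch of a ZeRO-2 DeepSpeed config as a possible fallback; an assumption,
# not a fix confirmed in this thread. ZeRO-2 shards optimizer states and
# gradients but keeps full parameters on each GPU, which avoids partitioning
# the quantized weights.
import json

ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

with open("ds_zero2_offload.json", "w") as f:  # hypothetical file name
    json.dump(ds_config, f, indent=2)
```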