Why does the script for training a 70B model on two 3090s use a 7B model? #3929


Open
asdasas1212 opened this issue Apr 18, 2025 · 0 comments

@asdasas1212

# 14GiB * 2
nproc_per_node=2

CUDA_VISIBLE_DEVICES=0,1 \
accelerate launch --config_file "./examples/train/multi-gpu/fsdp_qlora/fsdp_offload.json" \
    swift/cli/sft.py \
    --model Qwen/Qwen2.5-7B-Instruct \
    --train_type lora \
    --dataset 'swift/self-cognition#1000' \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --quant_bits 4 \
    --bnb_4bit_compute_dtype bfloat16 \
    --bnb_4bit_quant_storage bfloat16 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --gradient_checkpointing true \
    --weight_decay 0.1 \
    --target_modules all-linear \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --system 'You are a helpful assistant.' \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --model_author swift \
    --model_name swift-robot
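For comparison only: the title contrasts a 70B model with the 7B checkpoint the script actually loads. The sketch below assumes the intended usage would simply point --model at a 72B-class checkpoint such as Qwen/Qwen2.5-72B-Instruct (an assumed substitute, not something stated in this issue or confirmed by the maintainers); the flags shown are a subset copied unchanged from the script above.

# Hypothetical variant (assumption, not from this issue): same FSDP + QLoRA
# launch, with only the --model value swapped to a 72B-class checkpoint.
# Whether two 24GiB 3090s can actually fit such a run with fsdp_offload.json
# and 4-bit quantization is exactly what this issue is asking.
nproc_per_node=2

CUDA_VISIBLE_DEVICES=0,1 \
accelerate launch --config_file "./examples/train/multi-gpu/fsdp_qlora/fsdp_offload.json" \
    swift/cli/sft.py \
    --model Qwen/Qwen2.5-72B-Instruct \
    --train_type lora \
    --dataset 'swift/self-cognition#1000' \
    --torch_dtype bfloat16 \
    --quant_bits 4 \
    --bnb_4bit_compute_dtype bfloat16 \
    --bnb_4bit_quant_storage bfloat16 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --gradient_checkpointing true \
    --target_modules all-linear \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_length 2048 \
    --output_dir output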

Jintao-Huang added the bug (Something isn't working) label on Apr 19, 2025