Why does the script for training a 70B model on two 3090s use a 7B model? #3929


Open
asdasas1212 opened this issue Apr 18, 2025 · 0 comments

@asdasas1212

# 14GiB * 2
nproc_per_node=2

CUDA_VISIBLE_DEVICES=0,1 \
accelerate launch --config_file "./examples/train/multi-gpu/fsdp_qlora/fsdp_offload.json" \
    swift/cli/sft.py \
    --model Qwen/Qwen2.5-7B-Instruct \
    --train_type lora \
    --dataset 'swift/self-cognition#1000' \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --quant_bits 4 \
    --bnb_4bit_compute_dtype bfloat16 \
    --bnb_4bit_quant_storage bfloat16 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --gradient_checkpointing true \
    --weight_decay 0.1 \
    --target_modules all-linear \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --system 'You are a helpful assistant.' \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --model_author swift \
    --model_name swift-robot
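For comparison only: the title contrasts a 70B model with the 7B checkpoint the script actually loads. The sketch below assumes the intended usage would simply point --model at a 72B-class checkpoint such as Qwen/Qwen2.5-72B-Instruct (an assumed substitute, not something stated in this issue or confirmed by the maintainers); the flags shown are a subset copied unchanged from the script above.

# Hypothetical variant (assumption, not from this issue): same FSDP + QLoRA
# launch, with only the --model value swapped to a 72B-class checkpoint.
# Whether two 24GiB 3090s can actually fit such a run with fsdp_offload.json
# and 4-bit quantization is exactly what this issue is asking.
nproc_per_node=2

CUDA_VISIBLE_DEVICES=0,1 \
accelerate launch --config_file "./examples/train/multi-gpu/fsdp_qlora/fsdp_offload.json" \
    swift/cli/sft.py \
    --model Qwen/Qwen2.5-72B-Instruct \
    --train_type lora \
    --dataset 'swift/self-cognition#1000' \
    --torch_dtype bfloat16 \
    --quant_bits 4 \
    --bnb_4bit_compute_dtype bfloat16 \
    --bnb_4bit_quant_storage bfloat16 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --gradient_checkpointing true \
    --target_modules all-linear \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_length 2048 \
    --output_dir output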

Jintao-Huang added the bug (Something isn't working) label on Apr 19, 2025