
Too slow sft process #3971

Closed · hienhayho opened this issue Apr 24, 2025 · 2 comments

@hienhayho

Describe the bug
I'm fine-tuning Qwen2.5-3B-Instruct, but the fine-tuning process is very slow.

[Screenshot of the training run attached in the original issue]

Steps to reproduce

  1. Installation
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift/
pip install -v -e .
pip install polars polars-lts-cpu deepspeed wandb datasets
  2. Prepare the data file

Run python gen_data.py

import polars as pl
from datasets import load_dataset
from tqdm import tqdm

# Download the instruction dataset from the Hugging Face Hub.
data = load_dataset(
    "BlossomsAI/reduced_vietnamese_instruction_dataset",
    split="train",
    cache_dir="cache_data",
)

# Keep only the instruction / input / output fields of each sample.
results = []
for d in tqdm(data, total=len(data)):
    results.append(
        {
            "instruction": d["instruction"],
            "input": d["input"],
            "output": d["output"],
        }
    )

# Write the samples as one JSON object per line (NDJSON / JSON Lines).
df = pl.DataFrame(results)
df.write_ndjson("data.jsonl")
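
To sanity-check the generated file before the training run, you can read it back with polars (a minimal sketch; the column names follow the script above):

import polars as pl

# Read the NDJSON back and confirm the expected columns are present.
df = pl.read_ndjson("data.jsonl")
print(df.columns)  # ['instruction', 'input', 'output']
print(df.height)   # number of samples
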
  3. Run the training script

Run bash sft_qwen2_5_3b.sh

#!/bin/bash

NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 \
    swift sft \
    --model Qwen/Qwen2.5-3B-Instruct \
    --train_type lora \
    --dataset 'data.jsonl' \
    --torch_dtype bfloat16 \
    --report_to wandb \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --deepspeed zero3 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --gradient_accumulation_steps 16 \
    --eval_steps 500 \
    --save_steps 500 \
    --save_total_limit 2 \
    --logging_steps 50 \
    --max_length 4096 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataset_num_proc 1 \
    --dataloader_num_workers 4 \
    --use_hf true

Your hardware and system info

  • OS: Ubuntu 22.04.3 LTS
  • CUDA toolkit: 12.1
  • GPUs: 4× NVIDIA GeForce RTX 2080 Ti
  • torch: 2.7.0

Additional context

  • I can't use FlashAttention 2 since my GPUs (Turing) are not supported; a possible fallback is sketched below.
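
As a point of reference, a common fallback on pre-Ampere GPUs is PyTorch's SDPA attention, which transformers exposes through the attn_implementation argument. The snippet below is only a transformers-level sketch of that fallback, not how swift sft configures attention internally:

from transformers import AutoModelForCausalLM

# Sketch: load the base model with the SDPA attention backend instead of
# FlashAttention 2, which requires Ampere-or-newer GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    attn_implementation="sdpa",
)
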
@Jintao-Huang
Collaborator

total_batch_size = 4 (GPUs) × 2 (per-device batch) × 16 (gradient accumulation steps) = 128

ZeRO-3 is relatively slow, so I think this is normal.
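
For context, ZeRO-3 also shards the model parameters across GPUs, so every forward and backward pass needs extra all-gather communication compared with ZeRO-2. In rough DeepSpeed-config terms (an illustration only; the exact contents of ms-swift's built-in zero2/zero3 presets may differ):

# ZeRO-2: shards optimizer states and gradients across GPUs.
zero2_cfg = {"zero_optimization": {"stage": 2}}

# ZeRO-3: additionally shards the parameters themselves, adding
# all-gather traffic on every forward/backward pass.
zero3_cfg = {"zero_optimization": {"stage": 3}}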

@hienhayho
Author

Hi @Jintao-Huang, thanks a lot. I switched to --deepspeed zero2 and it's much faster.
