Description
Describe the bug
What the bug is and how to reproduce it, ideally with screenshots.
#!/bin/bash
# Continued pre-training (swift pt) of Qwen2-VL-2B with full-parameter training on 4 GPUs and DeepSpeed ZeRO-3.
# NPROC_PER_NODE and CUDA_VISIBLE_DEVICES apply only to this command, so every line except the last ends with a backslash.
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift pt \
--model $XLF/downloads/models/Qwen/Qwen2-VL-2B \
--dataset $XLF/downloads/datasets/VisualStar/visualwebinstruct/pt-event-0608-01-230M.jsonl \
--output_dir $XLF/scripts/logs/Qwen2-VL-2B-CPT/C230 \
--add_version false \
--train_type full \
--torch_dtype bfloat16 \
--learning_rate 1e-5 \
--warmup_ratio 0.05 \
--freeze_llm false \
--freeze_vit false \
--freeze_aligner false \
--num_train_epochs 1 \
--gradient_accumulation_steps 2 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--dataset_num_proc 8 \
--dataloader_num_workers 4 \
--split_dataset_ratio 0 \
--max_length 4096 \
--truncation_strategy delete \
--attn_impl flash_attn \
--packing true \
--save_strategy epoch \
--save_steps 1 \
--save_only_model true \
--eval_strategy epoch \
--eval_steps 1 \
--logging_steps 1 \
--deepspeed zero3
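
The batch-related flags in the script above determine how much data each optimizer step sees. The minimal Python sketch below spells out that arithmetic under two assumptions: each of the `NPROC_PER_NODE` processes draws `per_device_train_batch_size` samples per micro-step, and, because `--packing true` is set, each sample is one packed sequence of at most `--max_length` tokens, so the token figure is an upper bound rather than a measured value. `BatchConfig` and `effective_batch` are hypothetical helpers, not part of ms-swift.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchConfig:
    """Batch-related values copied from the launch script."""
    nproc_per_node: int = 4               # NPROC_PER_NODE
    per_device_train_batch_size: int = 4  # --per_device_train_batch_size
    gradient_accumulation_steps: int = 2  # --gradient_accumulation_steps
    max_length: int = 4096                # --max_length


def effective_batch(cfg: BatchConfig) -> tuple[int, int]:
    """Return (sequences per optimizer step, upper bound on tokens per step).

    Assumes every sample is one packed sequence of at most max_length tokens
    (the --packing true case), so the token count is a ceiling, not a measurement.
    """
    sequences = (cfg.nproc_per_node
                 * cfg.per_device_train_batch_size
                 * cfg.gradient_accumulation_steps)
    return sequences, sequences * cfg.max_length


if __name__ == "__main__":
    seqs, max_tokens = effective_batch(BatchConfig())
    print(f"sequences per optimizer step: {seqs}")          # 4 * 4 * 2 = 32
    print(f"max tokens per optimizer step: {max_tokens}")   # 32 * 4096 = 131072
```

With the values from the script this gives 32 packed sequences and at most 131,072 tokens per optimizer step.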


Your hardware and system info
Write your system info here, such as CUDA version, OS, GPU model, and torch version.
(/gpfs/work/int/xinlongfu24/xinlong_fu/conda/env/swift) [xinlongfu24@xpxecdtn1 xinlong_fu]$ python -c "import sys; import torch; print('Python Version:', sys.version); print('CUDA Version:', torch.version.cuda); print('PyTorch Version:', torch.__version__); print('CXX11 ABI Enabled:', torch._C._GLIBCXX_USE_CXX11_ABI); print('CUDA Available:', torch.cuda.is_available()); print('GPU Count:', torch.cuda.device_count());"
Python Version: 3.11.11 (main, Dec 11 2024, 16:28:39) [GCC 11.2.0]
CUDA Version: 12.4
PyTorch Version: 2.6.0+cu124
CXX11 ABI Enabled: False
CUDA Available: False
GPU Count: 0
(/gpfs/work/int/xinlongfu24/xinlong_fu/conda/env/swift) [xinlongfu24@xpxecdtn1 xinlong_fu]$ pip show ms-swift
Name: ms_swift
Version: 3.5.3
Summary: Swift: Scalable lightWeight Infrastructure for Fine-Tuning
Home-page: https://github.com/modelscope/swift
Author: DAMO ModelScope teams
Author-email: [email protected]
License: Apache License 2.0
Location: /gpfs/work/int/xinlongfu24/xinlong_fu/programs/swift
Editable project location: /gpfs/work/int/xinlongfu24/xinlong_fu/programs/swift
Requires: accelerate, addict, aiohttp, attrdict, binpacking, charset_normalizer, cpm_kernels, dacite, datasets, einops, fastapi, gradio, importlib_metadata, jieba, matplotlib, modelscope, nltk, numpy, openai, oss2, pandas, peft, pillow, requests, rouge, safetensors, scipy, sentencepiece, simplejson, sortedcontainers, tensorboard, tiktoken, tqdm, transformers, transformers_stream_generator, trl, uvicorn, zstandard
Required-by:
(/gpfs/work/int/xinlongfu24/xinlong_fu/conda/env/swift) [xinlongfu24@xpxecdtn1 xinlong_fu]$
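
The report above shows `CUDA Available: False` and `GPU Count: 0`, while the launch script requests four processes on GPUs 0,1,2,3. The Python sketch below is a hedged preflight check, not part of ms-swift: it reprints the same fields as the one-liner (using `torch.__version__`), lists the versions of packages the script relies on, and compares the visible GPU count with `NPROC_PER_NODE`. The helper names `package_versions`, `gpu_shortfall`, and `main` are hypothetical, and the package list (`ms-swift`, `torch`, `transformers`, `deepspeed`, `flash-attn`) is an assumption derived from the `--deepspeed zero3` and `--attn_impl flash_attn` flags.

```python
import os
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional

# Assumed distribution names, chosen because the launch script uses
# --deepspeed zero3 and --attn_impl flash_attn.
PACKAGES = ("ms-swift", "torch", "transformers", "deepspeed", "flash-attn")


def package_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def gpu_shortfall(visible_gpus: int, nproc_per_node: int) -> int:
    """Return how many GPUs are missing for the requested process count (0 if enough)."""
    return max(0, nproc_per_node - visible_gpus)


def main() -> bool:
    """Print an environment report and return True if enough GPUs are visible."""
    # Defaults to 4 to match NPROC_PER_NODE in the launch script.
    nproc = int(os.environ.get("NPROC_PER_NODE", "4"))
    print("Python Version:", sys.version)
    try:
        import torch
        visible = torch.cuda.device_count()
        print("PyTorch Version:", torch.__version__)
        print("CUDA Version:", torch.version.cuda)
        print("CUDA Available:", torch.cuda.is_available())
    except ImportError:
        visible = 0
        print("torch is not installed in this environment")
    print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
    for name, version in package_versions().items():
        print(f"{name}: {version or 'not installed'}")
    missing = gpu_shortfall(visible, nproc)
    if missing:
        print(f"{visible} GPU(s) visible but NPROC_PER_NODE={nproc}: {missing} missing",
              file=sys.stderr)
        return False
    print(f"{visible} GPU(s) visible for NPROC_PER_NODE={nproc}: OK")
    return True


if __name__ == "__main__":
    main()
```

Running it inside the same job allocation and conda environment that launches `swift pt` checks the GPU count where training actually runs. `main` returns a boolean instead of exiting, so a job script could wrap it as `sys.exit(0 if main() else 1)` if a nonzero exit status is wanted.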
Additional context
Add any other context about the problem here.