Qwen2-VL-2B pretraining shows gradient explosion late in training; other VLMs do not

**Describe the bug**
Describe the bug and how to reproduce it, preferably with screenshots.
```bash
#!/bin/bash

NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift pt \
    --model $XLF/downloads/models/Qwen/Qwen2-VL-2B \
    --dataset $XLF/downloads/datasets/VisualStar/visualwebinstruct/pt-event-0608-01-230M.jsonl \
    --output_dir $XLF/scripts/logs/Qwen2-VL-2B-CPT/C230 \
    --add_version false \
    \
    --train_type full \
    --torch_dtype bfloat16 \
    --learning_rate 1e-5 \
    --warmup_ratio 0.05 \
    \
    --freeze_llm false \
    --freeze_vit false \
    --freeze_aligner false \
    \
    --num_train_epochs 1 \
    --gradient_accumulation_steps 2 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    \
    --dataset_num_proc 8 \
    --dataloader_num_workers 4 \
    --split_dataset_ratio 0 \
    \
    --max_length 4096 \
    --truncation_strategy delete \
    --attn_impl flash_attn \
    --packing true \
    \
    --save_strategy epoch \
    --save_steps 1 \
    --save_only_model true \
    --eval_strategy epoch \
    --eval_steps 1 \
    --logging_steps 1 \
    \
    --deepspeed zero3
```
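
For context when comparing this run against other VLMs, the effective batch size implied by the flags above can be worked out as follows. This is a rough sketch, not something ms-swift reports: the helper name `tokens_per_optimizer_step` is hypothetical, and it assumes that with `--packing true` each sample is packed up to `--max_length` tokens, so the token figure is an upper bound.

```python
def tokens_per_optimizer_step(num_gpus, per_device_batch, grad_accum, max_length):
    """Return (sequences, max_tokens) consumed per optimizer step.

    Hypothetical helper for illustration. Assumes every rank processes
    per_device_batch packed sequences per micro-step and that each packed
    sequence holds at most max_length tokens.
    """
    sequences = num_gpus * per_device_batch * grad_accum
    return sequences, sequences * max_length


# Values taken from the command above: NPROC_PER_NODE=4,
# --per_device_train_batch_size 4, --gradient_accumulation_steps 2,
# --max_length 4096.
sequences, max_tokens = tokens_per_optimizer_step(4, 4, 2, 4096)
print(sequences, max_tokens)  # 32 131072
```

Under these assumptions, each optimizer step sees 32 packed sequences, or at most 131072 tokens, at a peak learning rate of 1e-5.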

<img width="923" alt="Image" src="https://github.com/user-attachments/assets/99344721-8cc4-47df-8e10-763ec15813c9" />

<img width="515" alt="Image" src="https://github.com/user-attachments/assets/6ce00be6-5be2-49ab-9525-e2ce53e98198" />
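
To pin down the step at which the blow-up starts, the logged `grad_norm` values can be scanned for the first large jump. The sketch below is only a diagnostic aid under stated assumptions: it expects the gradient norms to have already been extracted from the training log into a plain list (one value per logged step), and the function name, `window`, and `factor` defaults are illustrative choices rather than anything defined by ms-swift.

```python
from statistics import median


def first_spike_step(grad_norms, window=50, factor=10.0):
    """Return the 0-based index of the first value that exceeds `factor`
    times the median of the preceding `window` values, or None if no
    such value exists.
    """
    for i in range(window, len(grad_norms)):
        baseline = median(grad_norms[i - window:i])
        if baseline > 0 and grad_norms[i] > factor * baseline:
            return i
    return None


# Synthetic example: a flat gradient norm followed by a sudden jump.
norms = [1.0] * 100 + [50.0, 400.0]
print(first_spike_step(norms))  # 100
```

With `--logging_steps 1`, index `i` corresponds to training step `i + 1`. Reporting that step alongside the screenshots makes it easier to check whether the spike lines up with a particular batch or with the learning-rate schedule.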

**Your hardware and system info**
Write your system info here, such as CUDA version, operating system, GPU model, and torch version.
```
(/gpfs/work/int/xinlongfu24/xinlong_fu/conda/env/swift) [xinlongfu24@xpxecdtn1 xinlong_fu]$ python -c "import sys; import torch; print('Python Version:', sys.version); print('CUDA Version:', torch.version.cuda); print('PyTorch Version:', torch.__version__); print('CXX11 ABI Enabled:', torch._C._GLIBCXX_USE_CXX11_ABI); print('CUDA Available:', torch.cuda.is_available()); print('GPU Count:', torch.cuda.device_count());"
Python Version: 3.11.11 (main, Dec 11 2024, 16:28:39) [GCC 11.2.0]
CUDA Version: 12.4
PyTorch Version: 2.6.0+cu124
CXX11 ABI Enabled: False
CUDA Available: False
GPU Count: 0
(/gpfs/work/int/xinlongfu24/xinlong_fu/conda/env/swift) [xinlongfu24@xpxecdtn1 xinlong_fu]$ pip show ms-swift
Name: ms_swift
Version: 3.5.3
Summary: Swift: Scalable lightWeight Infrastructure for Fine-Tuning
Home-page: https://github.com/modelscope/swift
Author: DAMO ModelScope teams
Author-email: contact@modelscope.cn
License: Apache License 2.0
Location: /gpfs/work/int/xinlongfu24/xinlong_fu/programs/swift
Editable project location: /gpfs/work/int/xinlongfu24/xinlong_fu/programs/swift
Requires: accelerate, addict, aiohttp, attrdict, binpacking, charset_normalizer, cpm_kernels, dacite, datasets, einops, fastapi, gradio, importlib_metadata, jieba, matplotlib, modelscope, nltk, numpy, openai, oss2, pandas, peft, pillow, requests, rouge, safetensors, scipy, sentencepiece, simplejson, sortedcontainers, tensorboard, tiktoken, tqdm, transformers, transformers_stream_generator, trl, uvicorn, zstandard
Required-by: 
```

**Additional context**
Add any other context about the problem here.


Qwen2-VL-2B pretraining shows gradient explosion late in training; other VLMs do not #4819
