QwenVL2.5 training hangs on mixed image-text and text-only data

**Describe the bug**
What the bug is, and how to reproduce, better with screenshots

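Training QwenVL2.5 on a mix of image-text data and text-only data hangs. I saw a similar problem discussed in https://github.com/modelscope/ms-swift/issues/2198, but the 3.x versions still seem to have this error. After checking, it appears to be a computation graph problem.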

Here is the launch script:
```
NPROC_PER_NODE=8 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
MAX_PIXELS=50176 \
MIN_PIXELS=784 \
swift sft \
    --model /mnt/models/Qwen2.5VL_3B \
    --train_type full \
    --dataset '/mnt/annotations/MSSwift/tulu-3-sft-mixture.jsonl' \
    --torch_dtype bfloat16 \
    --attn_impl flash_attn \
    --freeze_vit false \
    --freeze_llm false \
    --freeze_aligner false \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-5 \
    --lr_scheduler_type constant \
    --gradient_accumulation_steps 16 \
    --save_steps 500 \
    --save_total_limit 1 \
    --logging_steps 5 \
    --max_length 8192 \
    --output_dir output/verify/tulu-3-sft-mixture \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 0 \
    --report_to wandb \
    --model_author swift \
    --model_name swift-robot \
    --eval_steps -1
```
The result of the run is:
<img width="1115" height="851" alt="Image" src="https://github.com/user-attachments/assets/49a01a33-cc87-4a75-ba23-6521a5d41565" />

Training only runs normally once the ViT and aligner parameters are frozen:
```
    --freeze_vit true \
    --freeze_llm false \
    --freeze_aligner true \
```

But wasn't this computation graph problem already fixed?

How should I configure things now to train on mixed image-text and text-only data?
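
For anyone trying to reproduce this, it may help to check how the training file splits between image-text and text-only samples, since the hang seems tied to batches that never touch the ViT. The following is a minimal sketch, not part of ms-swift: `count_modalities` is a hypothetical helper, and it assumes each JSONL line is a JSON object in which image samples carry a non-empty `images` list (adjust `image_key` if your data uses a different field).

```python
import json
from collections import Counter
from typing import Iterable


def count_modalities(lines: Iterable[str], image_key: str = "images") -> Counter:
    """Count image-text vs. text-only samples in JSONL-formatted lines.

    Assumption: every non-blank line is a JSON object, and a sample is
    multimodal only when `image_key` is present and non-empty.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        sample = json.loads(line)
        # A missing key, None, or an empty list all count as text-only.
        kind = "image_text" if sample.get(image_key) else "text_only"
        counts[kind] += 1
    return counts
```

Passing an open file handle for the training JSONL (for example `count_modalities(open(path, encoding="utf-8"))`) returns a `Counter` with `image_text` and `text_only` totals. If every sample is text-only, no batch ever runs the vision tower, which would be consistent with the hang disappearing when `--freeze_vit true` and `--freeze_aligner true` are set.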

QwenVL2.5 training hangs on mixed image-text and text-only data #4918
