sequence classification inference #4093

Lettie-LIU · 2025-05-06T07:22:08Z

I fine-tuned a multi-label classification model using the following script:
swift sft \
    --model Qwen/Qwen2.5-0.5B \
    --train_type lora \
    --dataset 'training.json' \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --gradient_accumulation_steps 16 \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --num_labels 8 \
    --task_type seq_cls \
    --use_chat_template false

Once the model was fine-tuned, I ran inference on the validation set using the following script (checkpoint-200 is the fine-tuned checkpoint):
swift infer \
    --adapters checkpoint-200 \
    --val_dataset validation.json \
    --infer_backend pt \
    --temperature 0 \
    --max_new_tokens 4096 \
    --logprobs true \
    --load_data_args true \
    --stream true

Training and Validation Dataset Format:
Both the training and validation datasets are structured as follows:
[
  {"messages": [{"role": "user", "content": "<video>question?"}], "label": [1, 6], "videos": ["video1.mp4"]},
  {"messages": [{"role": "user", "content": "<video>question?"}], "label": [1, 5], "videos": ["video2.mp4"]},
  {"messages": [{"role": "user", "content": "<video>question"}], "label": [1, 3, 4], "videos": ["videos3.mp4"]},
  ........
]
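
For readers unfamiliar with this label format: a `label` list such as `[1, 6]` together with `--num_labels 8` is commonly interpreted as a multi-hot target vector in multi-label classification. The following is a minimal sketch of that conversion in plain Python. It assumes the entries are zero-based class indices, which the post does not state, and `to_multi_hot` is a hypothetical helper for illustration, not a swift function.

```python
from typing import List


def to_multi_hot(label_indices: List[int], num_labels: int) -> List[int]:
    """Turn a list of class indices into a 0/1 vector of length num_labels.

    Assumption: indices are zero-based, so valid values are 0..num_labels-1.
    """
    vector = [0] * num_labels
    for idx in label_indices:
        if not 0 <= idx < num_labels:
            raise ValueError(f"label index {idx} is outside 0..{num_labels - 1}")
        vector[idx] = 1
    return vector


if __name__ == "__main__":
    # Labels copied from the three example rows in the post.
    for labels in ([1, 6], [1, 5], [1, 3, 4]):
        print(labels, "->", to_multi_hot(labels, num_labels=8))
```

Running a check like this over the whole dataset would also surface any index that falls outside the range implied by `--num_labels`.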

Problem:
After running the inference, the model does not produce the expected classification labels. Instead, it outputs the same log_probs for all videos and repeats the label input from validation.json without providing proper label predictions.
{"response": [], "labels": [], "logprobs": {"content": [{"index": [], "logprobs": [], "top_logprobs": [{"index": 6, "logprob": -1.1640625}, {"index": 7, "logprob": -1.171875}, {"index": 4, "logprob": -1.1796875}, {"index": 2, "logprob": -1.203125}, {"index": 5, "logprob": -1.296875}, {"index": 3, "logprob": -1.3515625}, {"index": 0, "logprob": -1.3515625}, {"index": 1, "logprob": -1.375}]}]}, "messages": [{"role": "user", "content": "<video>question?"}, {"role": "assistant", "content": []}], "videos": ["val01.mp4"]}
{"response": [], "labels": [], "logprobs": {"content": [{"index": [], "logprobs": [], "top_logprobs": [{"index": 6, "logprob": -1.1640625}, {"index": 7, "logprob": -1.171875}, {"index": 4, "logprob": -1.1796875}, {"index": 2, "logprob": -1.203125}, {"index": 5, "logprob": -1.296875}, {"index": 3, "logprob": -1.3515625}, {"index": 0, "logprob": -1.3515625}, {"index": 1, "logprob": -1.375}]}]}, "messages": [{"role": "user", "content": "<video>question?"}, {"role": "assistant", "content": []}], "videos": ["val02.mp4"]}
.....
I am wondering if there might be any issues with my inference script or the data format. Thank you.
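
One way to inspect output like the records above is to compare the per-label scores across records and see which labels would clear a decision threshold. The sketch below does this on abbreviated in-memory copies of the two records. Treating `exp(logprob)` as an independent per-label probability and using a 0.5 threshold are assumptions for illustration, not confirmed swift behavior; the helper names are hypothetical. Note that the eight exponentiated values sum to more than 1, so they are not a single softmax distribution over the labels.

```python
import math
from typing import Dict, List

# Scores copied from the output in the post (identical for both records).
TOP_LOGPROBS = [
    {"index": 6, "logprob": -1.1640625},
    {"index": 7, "logprob": -1.171875},
    {"index": 4, "logprob": -1.1796875},
    {"index": 2, "logprob": -1.203125},
    {"index": 5, "logprob": -1.296875},
    {"index": 3, "logprob": -1.3515625},
    {"index": 0, "logprob": -1.3515625},
    {"index": 1, "logprob": -1.375},
]
records = [
    {"logprobs": {"content": [{"top_logprobs": TOP_LOGPROBS}]}, "videos": ["val01.mp4"]},
    {"logprobs": {"content": [{"top_logprobs": TOP_LOGPROBS}]}, "videos": ["val02.mp4"]},
]


def label_scores(record: Dict) -> Dict[int, float]:
    """Map each label index to exp(logprob) for one output record."""
    top = record["logprobs"]["content"][0]["top_logprobs"]
    return {item["index"]: math.exp(item["logprob"]) for item in top}


def scores_identical(recs: List[Dict]) -> bool:
    """Return True if every record carries exactly the same per-label scores."""
    first = label_scores(recs[0])
    return all(label_scores(r) == first for r in recs[1:])


def labels_above(record: Dict, threshold: float = 0.5) -> List[int]:
    """Labels whose score reaches the threshold (0.5 is an assumed cut-off)."""
    return sorted(i for i, p in label_scores(record).items() if p >= threshold)


if __name__ == "__main__":
    print("identical scores across records:", scores_identical(records))
    for rec in records:
        print(rec["videos"][0], "labels above threshold:", labels_above(rec))
```

Under these assumptions, identical scores for different videos suggest the video content is not influencing the prediction, and scores that all fall below the threshold would be consistent with the empty `response` lists.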

Jintao-Huang · 2025-05-06T07:47:06Z

Qwen/Qwen2.5-0.5B does not support multimodal input.
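
As a quick way to catch this mismatch before training, a dataset can be scanned for video inputs when the chosen model is text-only. The sketch below is plain Python and only looks for the two markers visible in the post: a non-empty `videos` field and a `<video>` tag in a message. The function name and the text-only third row are hypothetical, added for illustration.

```python
from typing import Dict, List


def rows_with_video(rows: List[Dict]) -> List[int]:
    """Return the positions of rows that reference video input."""
    flagged = []
    for i, row in enumerate(rows):
        has_field = bool(row.get("videos"))
        has_tag = any(
            isinstance(msg.get("content"), str) and "<video>" in msg["content"]
            for msg in row.get("messages", [])
        )
        if has_field or has_tag:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    rows = [
        # First two rows follow the format shown in the post.
        {"messages": [{"role": "user", "content": "<video>question?"}],
         "label": [1, 6], "videos": ["video1.mp4"]},
        {"messages": [{"role": "user", "content": "<video>question?"}],
         "label": [1, 5], "videos": ["video2.mp4"]},
        # Hypothetical text-only row, included only for contrast.
        {"messages": [{"role": "user", "content": "question?"}], "label": [2]},
    ]
    flagged = rows_with_video(rows)
    if flagged:
        print(f"{len(flagged)} row(s) reference video input: {flagged}")
```

If any rows are flagged, the dataset needs a model that accepts video input, or the video references need to be removed for a text-only model.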
