
Question about training data for multi-label classification #3984


Closed
xuyao178 opened this issue Apr 24, 2025 · 2 comments

Comments

@xuyao178

I prepared multi-label training data following the documentation, but training fails with an error.

There are 2 labels. Sample data:
{"messages": [{"role": "user", "content": ""}], "label": [0]}
{"messages": [{"role": "user", "content": ""}], "label": [1]}
{"messages": [{"role": "user", "content": ""}], "label": [0, 1]}
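In the index-list format above, each label holds the indices of the labels that apply, so records have different lengths ([0] vs. [0, 1]). A quick sanity check on such a file can still catch out-of-range or duplicate indices. The sketch below is plain Python and assumes one JSON record per line and num_labels=2 as in the training command; the helper name `check_index_labels` is hypothetical and not part of ms-swift.

```python
import json


def check_index_labels(lines, num_labels):
    """Check index-list labels such as [0], [1], [0, 1].

    Returns (line_number, message) pairs for records that look wrong;
    an empty list means every record passed.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        label = json.loads(line).get("label")
        if not isinstance(label, list):
            problems.append((lineno, "label must be a list"))
            continue
        if len(set(label)) != len(label):
            problems.append((lineno, "duplicate label index"))
        for idx in label:
            # Each entry must be an integer index in [0, num_labels).
            if not isinstance(idx, int) or not 0 <= idx < num_labels:
                problems.append((lineno, f"invalid label index {idx!r}"))
    return problems


samples = [
    '{"messages": [{"role": "user", "content": ""}], "label": [0]}',
    '{"messages": [{"role": "user", "content": ""}], "label": [1]}',
    '{"messages": [{"role": "user", "content": ""}], "label": [0, 1]}',
]
print(check_index_labels(samples, num_labels=2))  # → []
```

This only validates the data; it does not tell you whether a given ms-swift version accepts variable-length index lists.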

Training command:
swift sft \
--model ./models/Qwen2.5-0.5B \
--train_type lora \
--dataset ./train_multi.jsonl \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 16 \
--learning_rate 1e-4 \
--lora_rank 8 \
--lora_alpha 32 \
--target_modules all-linear \
--gradient_accumulation_steps 1 \
--eval_steps 100 \
--save_steps 100 \
--save_total_limit 2 \
--logging_steps 5 \
--max_length 32768 \
--output_dir output \
--warmup_ratio 0.05 \
--dataloader_num_workers 4 \
--dataset_num_proc 4 \
--num_labels 2 \
--task_type seq_cls \
--use_chat_template false \
--problem_type multi_label_classification
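For context on --problem_type multi_label_classification: in Hugging Face transformers, sequence-classification models with this problem type compute the loss with torch.nn.BCEWithLogitsLoss, which expects a float target of shape (batch_size, num_labels), i.e. one 0/1 entry per label. Whether ms-swift's seq_cls task goes through exactly this path is an assumption here. The plain-Python sketch below (no PyTorch; the function name `bce_with_logits` is hypothetical) reproduces the mean-reduced loss to show why, at the point the loss is computed, every target row must have exactly num_labels entries.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over a batch, computed from raw logits.

    logits and targets are lists of rows; each row has num_labels entries.
    Mirrors the default (mean) reduction of torch.nn.BCEWithLogitsLoss.
    """
    if len(logits) != len(targets):
        raise ValueError("batch size mismatch")
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, targets):
        if len(logit_row) != len(target_row):
            # An index-list target such as [0] cannot be scored against 2 logits.
            raise ValueError(
                f"expected {len(logit_row)} targets per row, got {len(target_row)}"
            )
        for x, y in zip(logit_row, target_row):
            # Numerically stable form: max(x, 0) - x*y + log(1 + exp(-|x|))
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count


logits = [[2.0, -1.0], [-0.5, 1.5]]
targets = [[1.0, 0.0], [0.0, 1.0]]  # multi-hot rows, one entry per label
print(round(bce_with_logits(logits, targets), 4))
```

Each label is scored as an independent yes/no decision, which is why the target is a fixed-length 0/1 vector rather than a list of indices. A framework that accepts index lists has to expand them into this form before computing the loss.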

Error message:

[Screenshot of the error output]

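Judging by the error, the label lists apparently need to be uniform across samples. Should the training data instead be provided in the following form (1 if a label is present, 0 if it is absent)? Any guidance would be appreciated.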
{"messages": [{"role": "user", "content": ""}], "label": [0, 1]}
{"messages": [{"role": "user", "content": ""}], "label": [1, 0]}
{"messages": [{"role": "user", "content": ""}], "label": [0, 1]}
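Converting from the index-list format to the 0/1 (multi-hot) format proposed above is mechanical: with 2 labels, [0] becomes [1, 0], [1] becomes [0, 1], and [0, 1] becomes [1, 1]. A minimal sketch, assuming JSONL input with index-list labels; the function names and file paths are hypothetical. Per the replies in this thread, the original index-list format works after updating to the main branch, and this thread does not confirm that ms-swift accepts the 0/1 form, so treat this as an illustration of the mapping rather than a required step.

```python
import json


def to_multi_hot(indices, num_labels):
    """Turn an index list like [0, 1] into a 0/1 vector of length num_labels."""
    vec = [0] * num_labels
    for idx in indices:
        if not 0 <= idx < num_labels:
            raise ValueError(f"label index {idx} out of range for {num_labels} labels")
        vec[idx] = 1
    return vec


def convert_jsonl(src_path, dst_path, num_labels):
    """Rewrite every record's "label" field from index-list to multi-hot form."""
    with open(src_path, encoding="utf-8") as src:
        with open(dst_path, "w", encoding="utf-8") as dst:
            for line in src:
                if not line.strip():
                    continue  # drop blank lines
                record = json.loads(line)
                record["label"] = to_multi_hot(record["label"], num_labels)
                dst.write(json.dumps(record, ensure_ascii=False) + "\n")


# Hypothetical paths; the input mirrors the issue's train_multi.jsonl.
# convert_jsonl("train_multi.jsonl", "train_multi_hot.jsonl", num_labels=2)
print(to_multi_hot([0], 2), to_multi_hot([1], 2), to_multi_hot([0, 1], 2))
# → [1, 0] [0, 1] [1, 1]
```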

@Jintao-Huang
Collaborator

Fixed on the main branch.

@xuyao178
Author

After updating to the main branch, the problem is solved.
