[Bug]: Error when computing AccuracyAndF1 #6309

Closed
1 task done
xiehuanyi opened this issue Jul 5, 2023 · 1 comment
Labels
bug Something isn't working triage

Comments

@xiehuanyi
Software environment

paddle-bfloat                  0.1.7
paddle2onnx                    1.0.0
paddlefsl                      1.1.0
paddlehub                      2.3.0
paddlenlp                      2.5.2
paddlepaddle-gpu               2.3.2.post112
tb-paddle                      0.3.6

Duplicate issues

  • I have searched the existing issues

Error description

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_3067/2457985955.py in <module>
     89     compute_metrics=paddlenlp.metrics.AccuracyAndF1,
     90 )
---> 91 trainer.train()

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/trainer/trainer.py in train(self, resume_from_checkpoint, ignore_keys_for_eval)
    714 
    715                     self.control = self.callback_handler.on_step_end(args, self.state, self.control)
--> 716                     self._maybe_log_save_evaluate(tr_loss, model, epoch, ignore_keys_for_eval, inputs=inputs)
    717                 else:
    718                     self.control = self.callback_handler.on_substep_end(args, self.state, self.control)

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/trainer/trainer.py in _maybe_log_save_evaluate(self, tr_loss, model, epoch, ignore_keys_for_eval, **kwargs)
    846                     )
    847             else:
--> 848                 metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
    849 
    850         if self.control.should_save:

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/trainer/trainer.py in evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
   1614             prediction_loss_only=True if self.compute_metrics is None else None,
   1615             ignore_keys=ignore_keys,
-> 1616             metric_key_prefix=metric_key_prefix,
   1617         )
   1618 

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/trainer/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix, max_eval_iters)
   1787         # Metrics!
   1788         if self.compute_metrics is not None and all_preds is not None and all_labels is not None:
-> 1789             metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
   1790         else:
   1791             metrics = {}

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/metrics/glue.py in __init__(self, topk, pos_label, name, *args, **kwargs)
     64         self.pos_label = pos_label
     65         self._name = name
---> 66         self.acc = Accuracy(self.topk, *args, **kwargs)
     67         self.precision = Precision(*args, **kwargs)
     68         self.recall = Recall(*args, **kwargs)

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/metric/metrics.py in __init__(self, topk, name, *args, **kwargs)
    238         super(Accuracy, self).__init__(*args, **kwargs)
    239         self.topk = topk
--> 240         self.maxk = max(topk)
    241         self._init_name(name)
    242         self.reset()

ValueError: operands could not be broadcast together with shapes (1000,) (1000,2) 
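
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: `evaluation_loop` calls `self.compute_metrics(EvalPrediction(...))`, and the very next frame is `AccuracyAndF1.__init__`. That would happen if `compute_metrics` holds the class itself, so that calling it constructs a metric and binds the `EvalPrediction` to the first parameter, `topk`. `max(topk)` would then compare the predictions with the labels, which fits the shapes `(1000,)` and `(1000,2)` in the message. The sketch below shows only the argument binding in plain Python. `EvalPredictionStub`, `MetricStub`, and `run_eval` are hypothetical stand-ins, not PaddleNLP APIs.

```python
from typing import NamedTuple, Sequence


class EvalPredictionStub(NamedTuple):
    """Hypothetical stand-in for the (predictions, label_ids) pair seen in the traceback."""
    predictions: Sequence[Sequence[float]]
    label_ids: Sequence[int]


class MetricStub:
    """Hypothetical metric class whose constructor takes `topk` as its first parameter."""

    def __init__(self, topk=(1,)):
        # Whatever is passed first positionally ends up here.
        self.topk = topk


def run_eval(compute_metrics, eval_pred):
    # Same call shape as the evaluation_loop frame: one positional argument.
    return compute_metrics(eval_pred)


eval_pred = EvalPredictionStub(predictions=[[0.2, 0.8], [0.9, 0.1]], label_ids=[1, 0])

# Passing the class (not an instance or a function) makes the call a constructor call.
result = run_eval(MetricStub, eval_pred)
print(type(result).__name__)     # the "metrics" are just a freshly built metric object
print(result.topk is eval_pred)  # the evaluation data was bound to `topk`
```

With NumPy arrays in place of the lists, taking `max` over that pair would compare a `(1000,)` array with a `(1000, 2)` array, which is where a broadcast error would be expected.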


### Steps to reproduce & code

This is a binary classification task. My data loading looks like this:

import paddle

# `tokenizer` is created elsewhere in the notebook and is not shown in this report.
def collate_fn(data):
    # Each item in `data` is indexed as (feature, label).
    feats = [d[0] for d in data]
    labels = paddle.to_tensor([d[1] for d in data], dtype='int64')
    encodings = tokenizer.encode_batch(feats)
    input_ids = paddle.to_tensor([enc.ids for enc in encodings], dtype='int64')
    attn_mask = paddle.to_tensor([enc.attention_mask for enc in encodings], dtype='int64')
    return {
        "input_ids": input_ids,
        "attention_mask": attn_mask,
        "labels": labels
    }

I used the Trainer, with the following settings:

import paddlenlp

# `model`, `opt`, and `MyDataset` are defined elsewhere in the notebook and are not shown in this report.
args = paddlenlp.trainer.TrainingArguments(
    output_dir='models',
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=32,
    evaluation_strategy='steps',
    per_device_eval_batch_size=16,
    eval_steps=10,
    save_total_limit=1,
    report_to='visualdl',
    logging_steps=50,
)

trainer = paddlenlp.trainer.Trainer(
    model=model,
    criterion=paddle.nn.CrossEntropyLoss(),
    args=args,
    data_collator=collate_fn,
    train_dataset=MyDataset('train_data/0.txt'),
    eval_dataset=MyDataset('train_data/1.txt', 'eval'),
    optimizers=[opt, paddle.optimizer.lr.NoamDecay(512, 25000, 5)],
    compute_metrics=paddlenlp.metrics.AccuracyAndF1,
)
trainer.train()
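
The `evaluation_loop` frame in the traceback shows `compute_metrics` being called with a single `EvalPrediction(predictions=..., label_ids=...)` argument. One way to match that call shape is to pass a plain function. The sketch below is a minimal, Paddle-free version under stated assumptions: `EvalPredictionStub` is a hypothetical stand-in for the real object, predictions are assumed to be per-class logits of shape (num_examples, num_classes), and class 1 is assumed to be the positive label. It is not the maintainers' fix, and the metric names in the returned dict are illustrative.

```python
from typing import Dict, NamedTuple, Sequence


class EvalPredictionStub(NamedTuple):
    """Hypothetical stand-in for the object the Trainer passes to compute_metrics."""
    predictions: Sequence[Sequence[float]]  # logits, one row per example (assumed)
    label_ids: Sequence[int]                # gold labels, one per example


def compute_metrics(eval_pred, pos_label: int = 1) -> Dict[str, float]:
    """Return accuracy, precision, recall and F1 for the positive class as a dict."""
    # Argmax over the class axis turns each row of logits into a predicted class id.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in eval_pred.predictions]
    labels = [int(y) for y in eval_pred.label_ids]

    # Confusion-matrix counts for the positive class.
    tp = sum(p == pos_label and y == pos_label for p, y in zip(preds, labels))
    fp = sum(p == pos_label and y != pos_label for p, y in zip(preds, labels))
    fn = sum(p != pos_label and y == pos_label for p, y in zip(preds, labels))
    correct = sum(p == y for p, y in zip(preds, labels))

    # Guard every division so empty inputs or a missing class yield 0.0.
    accuracy = correct / len(labels) if labels else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Tiny usage example with made-up logits and labels.
example = EvalPredictionStub(
    predictions=[[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]],
    label_ids=[1, 0, 0, 1],
)
metrics = compute_metrics(example)
print(metrics)
```

Such a function would be passed as `compute_metrics=compute_metrics` in place of the class. The same wrapper could presumably build an `AccuracyAndF1()` instance inside the function instead of counting by hand. That variant is left out here so the sketch runs without Paddle installed.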

@xiehuanyi xiehuanyi added the bug Something isn't working label Jul 5, 2023
@github-actions github-actions bot added the triage label Jul 5, 2023
@xiehuanyi xiehuanyi changed the title from "[Bug]: Computation error" to "[Bug]: Error when computing AccuracyAndF1" Jul 5, 2023
@w5688414
Contributor

w5688414 commented May 8, 2024

The formatting here is a bit messy. Please provide the data and code so that we can locate the problem quickly.

@paddle-bot paddle-bot bot closed this as completed May 13, 2025