
Conversation

@jphdotam commented May 2, 2019

Signed-off-by: James P Howard [email protected]

Fixes: Adds a label-wise accuracy option to the Accuracy metric.

Description: Accuracy() metrics constructed with is_multilabel=True can now be passed labelwise=True. When set, the metric returns a tensor with one accuracy per label instead of a single value. For example:

evaluator = create_supervised_evaluator(
    model,
    metrics={'loss': Loss(loss),
             'accuracy': Accuracy(output_transform=thresholded_output_transform, is_multilabel=True),
             'precision': Precision(output_transform=thresholded_output_transform, is_multilabel=True, average=True),
             'label_acc': Accuracy(output_transform=thresholded_output_transform, is_multilabel=True, labelwise=True)},
    device=device)

@trainer.on(Events.EPOCH_COMPLETED)
def log_training_results(engine):
    evaluator.run(train_loader)
    metrics = evaluator.state.metrics
    acc, loss, precision, label_acc = metrics['accuracy'], metrics['loss'], metrics['precision'], metrics['label_acc']
    print(f"\rEnd of epoch {engine.state.epoch:03d}")
    print(f"TRAINING Accuracy: {acc:.3f} | Loss: {loss:.3f} | Precision: {precision:.3f} | Label-wise accuracy: {label_acc}")
    writer.add_scalar("training/accuracy", acc, engine.state.epoch)

@trainer.on(Events.EPOCH_COMPLETED)
def log_validation_results(engine):
    evaluator.run(test_loader)
    metrics = evaluator.state.metrics
    acc, loss, precision, label_acc = metrics['accuracy'], metrics['loss'], metrics['precision'], metrics['label_acc']
    print(f"TESTING  Accuracy: {acc:.3f} | Loss: {loss:.3f} | Precision: {precision:.3f} | Label-wise accuracy: {label_acc}\n")
    writer.add_scalar("testing/loss", loss, engine.state.epoch)
    writer.add_scalar("testing/accuracy", acc, engine.state.epoch)

trainer.run(train_loader, max_epochs=30)

Yields:

End of epoch 001
TRAINING Accuracy: 0.753 | Loss: 0.334 | Precision: 0.212 | Label-wise accuracy: tensor([0.8662, 0.8662], device='cuda:0')
TESTING  Accuracy: 0.725 | Loss: 0.341 | Precision: 0.221 | Label-wise accuracy: tensor([0.8302, 0.8755], device='cuda:0')

End of epoch 002
TRAINING Accuracy: 0.748 | Loss: 0.363 | Precision: 0.160 | Label-wise accuracy: tensor([0.8134, 0.9022], device='cuda:0')
TESTING  Accuracy: 0.672 | Loss: 0.537 | Precision: 0.087 | Label-wise accuracy: tensor([0.7057, 0.8792], device='cuda:0') 

Checklist:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

@anmolsjoshi (Contributor) commented May 2, 2019

@jphdotam thanks for the PR! Could you add a few tests?

Have a look here; we test accuracy against scikit-learn's implementation.

Let us know if you get stuck or have any questions!

PS: It seems that Travis CI failed due to flake8 errors. See here
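For anyone following along, here is a minimal sketch of the kind of test that compares Accuracy against scikit-learn for the binary case (illustrative only; the actual test file is the one linked above):

import pytest
import torch
from sklearn.metrics import accuracy_score
from ignite.metrics import Accuracy

def test_binary_accuracy_vs_sklearn():
    acc = Accuracy()
    # Already-thresholded 0/1 predictions and targets for 100 samples
    y_pred = torch.randint(0, 2, size=(100,))
    y = torch.randint(0, 2, size=(100,))
    acc.update((y_pred, y))
    assert acc.compute() == pytest.approx(accuracy_score(y.numpy(), y_pred.numpy()))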

@jphdotam (Author) commented May 2, 2019

Thanks @anmolsjoshi - I've written some tests and hopefully fixed the flake8 errors.

Unfortunately there is no scikit-learn equivalent of label-wise accuracy, so I have written an analogous reference implementation in NumPy.

Commit: …sification & text8 clean-up.

Signed-off-by: James P Howard <[email protected]>
@vfdev-5 (Collaborator) commented May 2, 2019

@jphdotam thanks for the PR! To merge it, I think we need to discuss the API. I'm not a fan of introducing another flag. Maybe we can opt for something like the arguments of torch's nn.CrossEntropyLoss: the deprecated reduce flag and the newer reduction argument, which takes string values.
Can we generalize this PR to cover two issues: #513 and #467?
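For reference, the nn.CrossEntropyLoss pattern being alluded to looks like this (a real PyTorch API, shown only to illustrate the string-valued-argument style versus a proliferation of boolean flags):

import torch
import torch.nn as nn

logits = torch.randn(4, 3)             # (batch_size, num_classes)
targets = torch.tensor([0, 2, 1, 1])   # class indices

# `reduction` takes string values ("mean", "sum", "none") instead of boolean flags.
loss_mean = nn.CrossEntropyLoss(reduction="mean")(logits, targets)  # single scalar
loss_none = nn.CrossEntropyLoss(reduction="none")(logits, targets)  # one loss per sample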

@jphdotam could you please provide a very simple example of manually computing such a label-wise accuracy score? For example, if I have y_true = [(1, 1, 0), (0, 0, 0), (1, 1, 1)] and y_pred = [(1, 0, 1), (0, 0, 1), (0, 1, 1)], what is the score and how is it computed in detail?

@jphdotam (Author) commented May 2, 2019

Hi @vfdev-5.

Your example shows a batch size of 3 for a binary classifier with 3 labels.
The label-wise accuracy is essentially an accuracy for each position within the tuples.
If I expand your example to a batch size of 4 (just to make samples versus labels clearer), this is essentially how it works:

y_true = np.array([(1, 1, 0), (0, 0, 0), (1, 1, 1), (0, 1, 0)])
y_pred = np.array([(1, 0, 1), (0, 0, 1), (0, 1, 1), (0, 1, 0)])
correct = y_true == y_pred
correct
Out[38]: 
array([[ True, False, False],
       [ True,  True, False],
       [False,  True,  True],
       [ True,  True,  True]])
np.mean(correct, axis=0)
Out[39]: array([0.75, 0.75, 0.5 ])

So there is 75% accuracy for the first label, 75% for the second, and 50% for the third.

It's very useful if one wishes to see which label in a multi-label classifier is compromising the overall accuracy.

Re: merging, would you rather I instead create a new metric, separate from Accuracy, called MultilabelAccuracy or something? And submit it to either ignite.metrics or ignite.contrib.metrics?
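For concreteness, here is a minimal sketch of what such a standalone metric could look like, built on ignite's Metric base class (illustrative only; the class name and internals are placeholders, not the code posted in #513):

from ignite.exceptions import NotComputableError
from ignite.metrics import Metric

class LabelwiseAccuracy(Metric):
    """Per-label accuracy for multilabel (batch_size, num_labels) tensors of 0s and 1s."""

    def reset(self):
        self._num_correct = None   # per-label correct counts
        self._num_examples = 0

    def update(self, output):
        y_pred, y = output         # both of shape (batch_size, num_labels), values in {0, 1}
        correct = (y_pred == y).sum(dim=0).double()
        self._num_correct = correct if self._num_correct is None else self._num_correct + correct
        self._num_examples += y.shape[0]

    def compute(self):
        if self._num_examples == 0:
            raise NotComputableError("LabelwiseAccuracy must have at least one example before it can be computed.")
        return self._num_correct / self._num_examples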

@vfdev-5 (Collaborator) commented May 2, 2019

@jphdotam thanks for the explanation. Now it is clear that we are talking about the same computation method.

Re: merging, would you rather I instead create a new metric, separate from Accuracy, called MultilabelAccuracy or something?

Previously, we had BinaryAccuracy and CategoricalAccuracy, which we merged into a single class. Then we added multilabel support, the same as in sklearn. IMO we should keep a single class.

Let me think about the new API and I'll comment here. If you have other ideas on the API, we can discuss them.

@jphdotam (Author) commented May 3, 2019

OK, great. In the meantime I will just use it as a new class, as I've posted in #513, since that's probably easier until we decide.

@anmolsjoshi (Contributor) commented May 13, 2019

@jphdotam thanks for providing the code. In discussion with @vfdev-5, we were thinking the following:

  • Add a labelwise parameter, so the constructor would be Accuracy(is_multilabel=True, labelwise=True) (already handled).
  • Add a check that labelwise can only be True in multilabel cases, maybe with a warning rather than raising an error (already handled).
  • The same would need to be applied to Precision and Recall, as these metrics are closely related in the way they are written (see the NumPy sketch after this list).
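As a reference for the Precision/Recall point, the label-wise versions can be computed column-wise in NumPy, in the same spirit as the label-wise accuracy example earlier in this thread (a sketch only, not ignite's implementation):

import numpy as np

y_true = np.array([(1, 1, 0), (0, 0, 0), (1, 1, 1), (0, 1, 0)])
y_pred = np.array([(1, 0, 1), (0, 0, 1), (0, 1, 1), (0, 1, 0)])

tp = ((y_pred == 1) & (y_true == 1)).sum(axis=0)   # true positives per label
predicted_pos = (y_pred == 1).sum(axis=0)          # predicted positives per label
actual_pos = (y_true == 1).sum(axis=0)             # actual positives per label

precision_per_label = tp / np.maximum(predicted_pos, 1)  # guard against division by zero
recall_per_label = tp / np.maximum(actual_pos, 1)
print(precision_per_label)  # array([1.        , 1.        , 0.33333333])
print(recall_per_label)     # array([0.5       , 0.66666667, 1.        ])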

Would you be interested in continuing this PR?

We might be working towards a minor release for now, so we shouldn't make major API changes. For the next major release (0.3.0), we could introduce a new_multilabel_arg with options None (binary/multiclass), multilabel (a single accuracy value), and labelwise (one accuracy per label).

What are your thoughts?

@anmolsjoshi self-requested a review May 13, 2019 05:05
@Oktai15 commented Nov 26, 2019

Description: Accuracy() metrics constructed with is_multilabel=True can now be passed labelwise=True.

@jphdotam why do you want to add this feature only for the multilabel case? It could be useful for the multiclass case too, couldn't it?
