Evaluating Outlier Detection Models
Evaluating outlier detection models is more nuanced than evaluating traditional supervised models. Outliers are typically rare, and labels may not always be available, which limits the use of standard metrics such as accuracy. Instead, we rely on metrics suited to imbalanced datasets and binary decisions, such as precision, recall, F1-score, ROC-AUC, and confusion matrices, all of which we have used several times up to this point. When true labels are available (as in synthetic datasets), we can directly assess how well our models identify anomalous points.
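As a quick illustration of how these metrics come together, the following is a minimal sketch using scikit-learn's metric functions. The y_true, y_pred, and scores arrays here are hypothetical placeholders (not the dataset built in this recipe), with 1 marking an outlier and 0 an inlier:

import numpy as np
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             roc_auc_score, confusion_matrix)

# Hypothetical ground-truth labels, binary predictions, and anomaly scores
# (1 = outlier, 0 = inlier)
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 0, 0, 1])
y_pred = np.array([0, 0, 1, 0, 1, 0, 1, 0, 0, 0])
scores = np.array([0.1, 0.2, 0.7, 0.3, 0.9, 0.2, 0.8, 0.1, 0.3, 0.4])

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1-score: ", f1_score(y_true, y_pred))
print("ROC-AUC:  ", roc_auc_score(y_true, scores))   # uses scores, not hard labels
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))

Note that precision, recall, and F1 operate on hard 0/1 decisions, while ROC-AUC is computed from the continuous anomaly scores, which is why both kinds of model output are worth keeping around for evaluation.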
In this recipe, we’ll walk through evaluation strategies for outlier detection models using labeled data, compare model performance, and visualize the results for interpretability.
Getting ready
We’ll generate a labeled dataset with a clear distinction between inliers and outliers.
Load the libraries:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import ...
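The last import above is truncated in the original text. A common choice for producing a labeled dataset with clearly separated inliers and outliers is make_blobs from sklearn.datasets, with a small set of widely scattered points appended as outliers. The snippet below is a sketch under that assumption, not necessarily the exact generator used in this recipe:

from sklearn.datasets import make_blobs  # assumed generator; the original import is truncated

# Dense inlier clusters (label 0)
X_inliers, _ = make_blobs(n_samples=300, centers=2, cluster_std=1.0, random_state=42)

# A handful of uniformly scattered points far from the clusters (label 1)
rng = np.random.RandomState(42)
X_outliers = rng.uniform(low=-10, high=10, size=(15, 2))

X = np.vstack([X_inliers, X_outliers])
y = np.hstack([np.zeros(len(X_inliers)), np.ones(len(X_outliers))])

# Quick visual check of inliers versus outliers
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", s=20)
plt.title("Labeled inliers (0) and outliers (1)")
plt.show()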