Polish train catalog (renaming only) #3030


Merged 6 commits on Mar 21, 2019
Changes from 3 commits
15 changes: 8 additions & 7 deletions src/Microsoft.ML.Data/TrainCatalog.cs
@@ -484,21 +484,22 @@ internal MulticlassClassificationTrainers(MulticlassClassificationCatalog catalo
/// <param name="labelColumnName">The name of the label column in <paramref name="data"/>.</param>
/// <param name="scoreColumnName">The name of the score column in <paramref name="data"/>.</param>
/// <param name="predictedLabelColumnName">The name of the predicted label column in <paramref name="data"/>.</param>
/// <param name="topK">If given a positive value, the <see cref="MulticlassClassificationMetrics.TopKAccuracy"/> will be filled with
/// <param name="topPredictionCount">If given a positive value, the <see cref="MulticlassClassificationMetrics.TopKAccuracy"/> will be filled with
Member
@abgoswam Mar 20, 2019

> If given a positive value

It seems the existing behavior is to just ignore negative values. Is that correct? Should we raise an exception if the user gives a negative value? #Resolved

Member Author
@wschin Mar 20, 2019

No problem. We throw now. #Resolved

Contributor
@rogancarr Mar 20, 2019

Did you also rename TopKAccuracy? If we make this change, we won't have parallelism with the evaluation metrics. #Resolved

Member Author
@wschin Mar 20, 2019

We don't need to. TopKAccuracy is as good a name as Accuracy. The original topK is a parameter used when computing TopKAccuracy, not TopKAccuracy itself. #Resolved

Member Author
@wschin Mar 20, 2019

Discussed offline. We will do topKPredictionCount to associate it with TopKAccuracy. #Resolved

/// the top-K accuracy, that is, the accuracy assuming we consider an example with the correct class within
/// the top-K values as being classified "correctly."</param>
/// <returns>The evaluation results for these calibrated outputs.</returns>
public MulticlassClassificationMetrics Evaluate(IDataView data, string labelColumnName = DefaultColumnNames.Label, string scoreColumnName = DefaultColumnNames.Score,
string predictedLabelColumnName = DefaultColumnNames.PredictedLabel, int topK = 0)
string predictedLabelColumnName = DefaultColumnNames.PredictedLabel, int topPredictionCount = 0)
{
Environment.CheckValue(data, nameof(data));
Environment.CheckNonEmpty(labelColumnName, nameof(labelColumnName));
Environment.CheckNonEmpty(scoreColumnName, nameof(scoreColumnName));
Environment.CheckNonEmpty(predictedLabelColumnName, nameof(predictedLabelColumnName));
Environment.CheckUserArg(topPredictionCount >= 0, nameof(topPredictionCount), "Must be non-negative");

var args = new MulticlassClassificationEvaluator.Arguments() { };
if (topK > 0)
args.OutputTopKAcc = topK;
if (topPredictionCount > 0)
args.OutputTopKAcc = topPredictionCount;
var eval = new MulticlassClassificationEvaluator(Environment, args);
return eval.Evaluate(data, labelColumnName, scoreColumnName, predictedLabelColumnName);
}
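For context on the rename: `topPredictionCount` is the K used when computing `TopKAccuracy`, not the metric itself. The metric the doc comment describes can be sketched as follows. This is a plain-Python illustration of the general top-K accuracy idea, not ML.NET's implementation; the function and variable names are made up for the sketch, and tie-breaking among equal scores may differ from the real evaluator.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true class is among the k highest-scored classes.

    scores: per-example lists of per-class scores.
    labels: true class indices.
    k: how many top predictions to consider (the renamed topPredictionCount).
    """
    if k <= 0:
        # Mirrors the PR's new CheckUserArg guard against non-positive values.
        raise ValueError("k must be positive")
    hits = 0
    for class_scores, label in zip(scores, labels):
        # Indices of the k classes with the highest scores.
        top_k = sorted(range(len(class_scores)),
                       key=lambda i: class_scores[i],
                       reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)

scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
labels = [1, 1, 2]  # the second example's true class is only the 2nd-ranked score
print(top_k_accuracy(scores, labels, 1))  # 2 of 3 correct at k=1
print(top_k_accuracy(scores, labels, 2))  # all true classes are within the top 2
```

This also makes the reviewers' point concrete: K is an input to the computation, so naming the parameter after the prediction count while keeping the `TopKAccuracy` metric name is coherent.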
@@ -673,18 +674,18 @@ internal AnomalyDetectionTrainers(AnomalyDetectionCatalog catalog)
/// <param name="labelColumnName">The name of the label column in <paramref name="data"/>.</param>
/// <param name="scoreColumnName">The name of the score column in <paramref name="data"/>.</param>
/// <param name="predictedLabelColumnName">The name of the predicted label column in <paramref name="data"/>.</param>
/// <param name="k">The number of false positives to compute the <see cref="AnomalyDetectionMetrics.DetectionRateAtKFalsePositives"/> metric. </param>
/// <param name="falsePositiveCount">The number of false positives to compute the <see cref="AnomalyDetectionMetrics.DetectionRateAtKFalsePositives"/> metric. </param>
/// <returns>Evaluation results.</returns>
public AnomalyDetectionMetrics Evaluate(IDataView data, string labelColumnName = DefaultColumnNames.Label, string scoreColumnName = DefaultColumnNames.Score,
string predictedLabelColumnName = DefaultColumnNames.PredictedLabel, int k = 10)
string predictedLabelColumnName = DefaultColumnNames.PredictedLabel, int falsePositiveCount = 10)
{
Environment.CheckValue(data, nameof(data));
Environment.CheckNonEmpty(labelColumnName, nameof(labelColumnName));
Environment.CheckNonEmpty(scoreColumnName, nameof(scoreColumnName));
Environment.CheckNonEmpty(predictedLabelColumnName, nameof(predictedLabelColumnName));

var args = new AnomalyDetectionEvaluator.Arguments();
args.K = k;
args.K = falsePositiveCount;

var eval = new AnomalyDetectionEvaluator(Environment, args);
return eval.Evaluate(data, labelColumnName, scoreColumnName, predictedLabelColumnName);
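Similarly, `falsePositiveCount` is the K in `DetectionRateAtKFalsePositives`. A minimal sketch of one common way such a metric is computed: scan examples by descending anomaly score and measure recall over true anomalies at the point where K false positives have been admitted. This is a plain-Python illustration with invented names, not ML.NET's exact computation.

```python
def detection_rate_at_k_false_positives(scores, labels, k):
    """Recall over true anomalies once k false positives have been admitted,
    scanning examples from the highest anomaly score down.

    scores: anomaly scores (higher means more anomalous).
    labels: 1 for a true anomaly, 0 for a normal example.
    k: the false-positive budget (the renamed falsePositiveCount).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    detected = false_positives = 0
    for i in order:
        if labels[i] == 1:
            detected += 1
        else:
            false_positives += 1
            if false_positives > k:
                break  # budget exhausted; later detections don't count
    return detected / sum(labels)

scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1,   0,   1,   0,   1]
# 2 of the 3 anomalies are ranked above the 2nd false positive.
print(detection_rate_at_k_false_positives(scores, labels, 1))
```

Under this reading, renaming the parameter without renaming the metric leaves a mismatch, which is why the thread lands on `DetectionRateAtFalsePositiveCount` for the metric as well.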
2 changes: 1 addition & 1 deletion test/Microsoft.ML.Tests/AnomalyDetectionTests.cs
@@ -30,7 +30,7 @@ public void RandomizedPcaTrainerBaselineTest()
var transformedData = DetectAnomalyInMnistOneClass(trainPath, testPath);

// Evaluate
var metrics = ML.AnomalyDetection.Evaluate(transformedData, k: 5);
var metrics = ML.AnomalyDetection.Evaluate(transformedData, falsePositiveCount: 5);
Contributor
@Ivanidzo4ka Mar 20, 2019

> falsePositiveCount

AnomalyDetectionMetrics has DetectionRateAtKFalsePositives. If you remove K from here, I think you need to remove K from the metric class as well. #Resolved

Member Author

Let's do DetectionRateAtFalsePositiveCount.


In reply to: 267591549


Assert.Equal(0.98667, metrics.AreaUnderRocCurve, 5);
Assert.Equal(0.90000, metrics.DetectionRateAtKFalsePositives, 5);
2 changes: 1 addition & 1 deletion test/Microsoft.ML.Tests/Scenarios/Api/TestApi.cs
@@ -336,7 +336,7 @@ public void TestTrainTestSplit()

// Let's do same thing, but this time we will choose different seed.
// Stratification column should still break dataset properly without same values in both subsets.
var stratSeed = mlContext.Data.TrainTestSplit(input, samplingKeyColumnName:"Workclass", seed: 1000000);
var stratSeed = mlContext.Data.TrainTestSplit(input, samplingKeyColumnName: "Workclass", seed: 1000000);
var stratTrainWithSeedWorkclass = getWorkclass(stratSeed.TrainSet);
var stratTestWithSeedWorkClass = getWorkclass(stratSeed.TestSet);
// Let's get unique values for "Workclass" column from train subset.
@@ -84,7 +84,7 @@ public void TrainAndPredictIrisModelTest()

// Evaluate the trained pipeline
var predicted = trainedModel.Transform(testData);
var metrics = mlContext.MulticlassClassification.Evaluate(predicted, topK: 3);
var metrics = mlContext.MulticlassClassification.Evaluate(predicted, topPredictionCount: 3);

Assert.Equal(.98, metrics.MacroAccuracy);
Assert.Equal(.98, metrics.MicroAccuracy, 2);
@@ -87,7 +87,7 @@ public void TrainAndPredictIrisModelWithStringLabelTest()

// Evaluate the trained pipeline
var predicted = trainedModel.Transform(testData);
var metrics = mlContext.MulticlassClassification.Evaluate(predicted, topK: 3);
var metrics = mlContext.MulticlassClassification.Evaluate(predicted, topPredictionCount: 3);

Assert.Equal(.98, metrics.MacroAccuracy);
Assert.Equal(.98, metrics.MicroAccuracy, 2);
4 changes: 2 additions & 2 deletions test/Microsoft.ML.Tests/TrainerEstimators/SdcaTests.cs
@@ -158,7 +158,7 @@ public void SdcaMulticlassLogisticRegression()

// Step 4: Make prediction and evaluate its quality (on training set).
var prediction = model.Transform(data);
var metrics = mlContext.MulticlassClassification.Evaluate(prediction, labelColumnName: "LabelIndex", topK: 1);
var metrics = mlContext.MulticlassClassification.Evaluate(prediction, labelColumnName: "LabelIndex", topPredictionCount: 1);

// Check a few metrics to make sure the trained model is ok.
Assert.InRange(metrics.TopKAccuracy, 0.8, 1);
@@ -192,7 +192,7 @@ public void SdcaMulticlassSupportVectorMachine()

// Step 4: Make prediction and evaluate its quality (on training set).
var prediction = model.Transform(data);
var metrics = mlContext.MulticlassClassification.Evaluate(prediction, labelColumnName: "LabelIndex", topK: 1);
var metrics = mlContext.MulticlassClassification.Evaluate(prediction, labelColumnName: "LabelIndex", topPredictionCount: 1);

// Check a few metrics to make sure the trained model is ok.
Assert.InRange(metrics.TopKAccuracy, 0.8, 1);