Using PFI with AutoML, possible? #3972


Closed
famschopman opened this issue Jul 8, 2019 · 6 comments
Assignees
Labels
P2 Priority of the issue for triage purpose: Needs to be fixed at some point.

Comments

@famschopman
Playing with AutoML and so far having much fun with it.

I have a trained model and am now trying to retrieve the feature weights. None of the objects returned exposes the LastTransformer object that I need.

Code snippet:

var mlContext = new MLContext();
var _appPath = AppDomain.CurrentDomain.BaseDirectory;
var _dataPath = Path.Combine(_appPath, "Datasets", "dataset.csv");
var _modelPath = Path.Combine(_appPath, "Datasets", "TrainedModels");

ColumnInferenceResults columnInference = mlContext.Auto().InferColumns(_dataPath, LabelColumnName, groupColumns: false);
ColumnInformation columnInformation = columnInference.ColumnInformation;

TextLoader textLoader = mlContext.Data.CreateTextLoader(columnInference.TextLoaderOptions);
IDataView data = textLoader.Load(_dataPath);

DataOperationsCatalog.TrainTestData dataSplit = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);
IDataView trainData = dataSplit.TrainSet;
IDataView testData = dataSplit.TestSet;

var cts = new CancellationTokenSource();
var experimentSettings = CreateExperimentSettings(mlContext, cts);

var progressHandler = new BinaryExperimentProgressHandler();

ExperimentResult<BinaryClassificationMetrics> experimentResult = mlContext.Auto()
    .CreateBinaryClassificationExperiment(experimentSettings)
    .Execute(trainData, labelColumnName: "Attrition", progressHandler: progressHandler);

RunDetail<BinaryClassificationMetrics> bestRun = experimentResult.BestRun;
ITransformer trainedModel = bestRun.Model;
var predictions = trainedModel.Transform(testData);
var metrics = mlContext.BinaryClassification.EvaluateNonCalibrated(data: predictions, labelColumnName: "Attrition", scoreColumnName: "Score");

mlContext.Model.Save(trainedModel, trainData.Schema, _modelPath);

Then I want to get the PFI information and I get stuck. There appears to be no way to get the LastTransformer object from trainedModel.

var transformedData = trainedModel.Transform(trainData);
var linearPredictor = trainedModel.LastTransformer; // does not compile: ITransformer does not expose LastTransformer

var permutationMetrics = mlContext.BinaryClassification.PermutationFeatureImportance(
    linearPredictor, transformedData, permutationCount: 30);

Hope someone can help me with some guidance.

@jedsmallwood

I'm interested in a solution to this also. It seems like a good way to reduce the number of features if you can identify which features are important.

@justinormont
Contributor

@daholste: Do you think this simply needs to be cast into the right type which has .LastTransformer as a property?

Possibly related comic: https://blog.toggl.com/build-horse-programming/

@daholste
Contributor

First and foremost, I love that comic, @justinormont

+1, the C# segment of the comic feels apropos. If you inspect the model in the debugger GUI, you should be able to navigate to the last transformer. Through casting C# objects as you see them in the debugger, you could write lines of C# code that correspond to that navigation in the GUI.

Of course, this is terribly hacky. Off-hand, I'm not aware of an officially supported / less hacky way to do this. It could be a great area of focus for future development

@jedsmallwood

The following cast lets me access the LastTransformer; however, I cannot use it for PFI until I provide a better type for the predictor. Debugging, I can see it is of type Microsoft.ML.Data.RegressionPredictionTransformer&lt;Microsoft.ML.IPredictorProducing&gt;, but I am unable to cast to that because Microsoft.ML.IPredictorProducing is not visible, so it seems like we're still stuck.

// Setup code similar to famschopman's
RegressionExperiment experiment = mlContext.Auto().CreateRegressionExperiment(experimentSettings);

var experimentResults = experiment.Execute(split.TrainSet, split.TestSet);
var predictor = ((TransformerChain<ITransformer>)experimentResults.BestRun.Model).LastTransformer;

// This will not compile:
var permutationMetrics = mlContext.Regression.PermutationFeatureImportance(predictor, transformedData, permutationCount: 30);

The following compile error is produced.

The type arguments for method 'PermutationFeatureImportanceExtensions.PermutationFeatureImportance<TModel>(RegressionCatalog, ISingleFeaturePredictionTransformer<TModel>, IDataView, string, bool, int?, int)' cannot be inferred from the usage. Try specifying the type arguments explicitly.

@eerhardt
Member

See my analysis on #3976 as well. These two issues feel like they are the same thing.

@gvashishtha added the P2 Priority of the issue for triage purpose: Needs to be fixed at some point. label on Jan 9, 2020
@antoniovs1029
Member

antoniovs1029 commented Jun 4, 2020

The only thing needed to make this build and run was to add the (TransformerChain<ITransformer>) cast to BestRun.Model (as recommended in #3972 (comment)), and then add another cast to (ISingleFeaturePredictionTransformer<object>) for the linear predictor. That is enough to let you run PFI:

RunDetail<BinaryClassificationMetrics> bestRun = experimentResult.BestRun;
TransformerChain<ITransformer> trainedModel = (TransformerChain<ITransformer>)bestRun.Model;
var predictions = trainedModel.Transform(testData);

var linearPredictor = (ISingleFeaturePredictionTransformer<object>)trainedModel.LastTransformer;

var permutationMetrics = mlContext.BinaryClassification.PermutationFeatureImportance(
    linearPredictor, predictions, permutationCount: 30);
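Once PFI runs, the result for the binary overload is an array of BinaryClassificationMetricsStatistics, one entry per feature slot. A minimal sketch of inspecting it, assuming the "Features" column carries slot names (typical for AutoML pipelines, but not guaranteed), might look like this:

```csharp
// Sketch: report the mean change in AUC for each permuted feature slot.
// Assumes "Features" has SlotNames annotations; otherwise featureNames will be empty.
VBuffer<ReadOnlyMemory<char>> slotNames = default;
predictions.Schema["Features"].GetSlotNames(ref slotNames);
var featureNames = slotNames.DenseValues().Select(n => n.ToString()).ToArray();

for (int i = 0; i < permutationMetrics.Length; i++)
{
    var auc = permutationMetrics[i].AreaUnderRocCurve;
    Console.WriteLine($"{featureNames[i],-30} AUC delta mean: {auc.Mean:F4} (stderr {auc.StandardError:F4})");
}
```

Features whose permutation produces the largest drop in AUC are the ones the model relies on most, which also addresses the feature-reduction use case mentioned above.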

PS: There was a bug (#4517) when running PFI particularly with binary classification models, so even after getting this running, if AutoML had returned a non-calibrated binary model, running PFI would have thrown an exception. That bug was fixed in #4587, which shipped in ML.NET 1.5.0-preview2 and 1.5.0.

> See my analysis on #3976 as well. These two issues feel like they are the same thing.

The problem described there was fixed in #4262 and #4292. Still, that problem wasn't actually causing this one; the solution above would have worked even before those fixes. The problem you refer to is being unable to cast a model loaded from disk to its actual type (e.g. BinaryPredictionTransformer<ParameterMixingCalibratedModelParameters<IPredictorProducing<float>, ICalibrator>>). After that fix, users can cast to the actual type, but they could always cast to (ISingleFeaturePredictionTransformer<object>), which is more appropriate when using AutoML.NET, since users won't know in advance the actual type of the model an experiment returns. So it was always possible to use PFI with AutoML via the (ISingleFeaturePredictionTransformer<object>) cast described above.
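The same cast applies to a model reloaded from disk, which is the scenario #3976 covered. A sketch, reusing the hypothetical _modelPath and trainData from the snippets above:

```csharp
// Sketch: load a saved AutoML model and extract its predictor for PFI
// without knowing the predictor's concrete type.
var mlContext = new MLContext();
ITransformer loadedModel = mlContext.Model.Load(_modelPath, out DataViewSchema inputSchema);

// AutoML models are TransformerChains; the last transformer is the trained predictor.
var chain = (TransformerChain<ITransformer>)loadedModel;
var predictor = (ISingleFeaturePredictionTransformer<object>)chain.LastTransformer;

var transformed = loadedModel.Transform(trainData);
var pfi = mlContext.BinaryClassification.PermutationFeatureImportance(
    predictor, transformed, permutationCount: 30);
```

Note that PFI must run on data already transformed by the full pipeline, since the predictor alone expects the featurized "Features" column as input.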
