API Proposal: Update PFI API to be easier to use #5625
Comments
Changing the return type is a binary breaking change. Is it possible to make this change additive instead?
I'm not sure the old one is really worth keeping, if there is ever a good time to deprecate it... but it could be an overload instead of a replacement.
We can't take that position. Someone might have gone through the trouble to get the old one working. Is adding this new method worth breaking that customer? What if they cannot recompile the component that used this API? You can't overload on return type; we'd need to give it a different name.
It has different parameters as well: Microsoft.ML.ISingleFeaturePredictionTransformer<TModel> predictionTransformer -> Microsoft.ML.IEstimator<Microsoft.ML.ITransformer> estimator
Totally... but if we ever do a V2, is it worth keeping then?
We have some guidance now for how to better obsolete members. I'd expect us to follow that: https://github.com/dotnet/designs/blob/main/accepted/2020/better-obsoletion/better-obsoletion.md
Perfect! That allows for a compatible addition.
cc @michaelgsharp @eerhardt what do you think about the feasibility of this addition?
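I like the idea of having more convenient return information, i.e. the Dictionary<string, RegressionMetricsStatistics>. The only concern I have is changing the input to be an IEstimator<ITransformer>. Maybe an alternate solution would be to create another convenience API that gets the predictor out of a trained model, for example:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
// Option 1: extracting the predictor requires knowing its type in advance:
// var predictor = ((TransformerChain<RegressionPredictionTransformer<LightGbmRegressionModelParameters>>)mlModel).LastTransformer;
// Option 2: should always work, as long as you _know_ the predictor is the last transformer in the chain.
// The result is typed as ITransformer, so it still has to be cast before it can be passed to PFI.
var predictor = ((IEnumerable<ITransformer>)mlModel).Last();
```

That would allow users to easily get the input they need to pass into PermutationFeatureImportance.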
@eerhardt Re: convenient return information
Adding links to similar issues calling for clarification or improvements to the PFI API:
- AutoML
- Handling Features and Weights
Thanks @houghj16!!
Background and Motivation
The current PFI API is difficult to use. We've had a few issues opened to make it easier, but we can use this issue to track a proposed API.
Prior issue:
#4216
Example support issue where developers needed help using it:
dotnet/machinelearning-modelbuilder#1031 (comment)
The main issue with the API is that it returns an array indexed by feature slot, and it's not easy to get from an index back to the column name/feature name.
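To make the index problem concrete: the caller has to line the returned array up with the feature slot names by position. Below is a minimal sketch in plain C#, assuming the slot names have already been read from the Features column (in ML.NET they typically come from that column's slot-name annotation). The names featureNames, rSquaredMeans, and ToNamedResults are hypothetical, and a plain double stands in for the R-squared mean of each RegressionMetricsStatistics entry.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical inputs: slot names of the Features column, and the mean change in
// R-squared per slot, in the same order as the array PFI returns.
string[] featureNames = { "Rooms", "Age", "Distance" };
double[] rSquaredMeans = { -0.21, -0.02, -0.09 };

var byName = ToNamedResults(featureNames, rSquaredMeans);
foreach (var (name, delta) in byName)
    Console.WriteLine($"{name}: {delta}");

// Hypothetical helper: pairs each result with the feature name at the same index.
static Dictionary<string, T> ToNamedResults<T>(IReadOnlyList<string> names, IReadOnlyList<T> results)
{
    if (names.Count != results.Count)
        throw new ArgumentException("Expected exactly one result per feature slot.");

    var map = new Dictionary<string, T>(names.Count);
    for (int i = 0; i < names.Count; i++)
    {
        // Slot names may be empty or repeated (e.g. after concatenating columns),
        // so fall back to an index-qualified key for those slots.
        string key = string.IsNullOrEmpty(names[i]) || map.ContainsKey(names[i])
            ? $"{names[i]}#{i}"
            : names[i];
        map.Add(key, results[i]);
    }
    return map;
}
```

The duplicate-name handling is one possible choice. A name-keyed return type has to pick some rule for slots whose names are not unique.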
The second biggest issue (which actually comes earlier in the process :) is that it's hard to know what to pass for the ISingleFeaturePredictionTransformer argument. Perhaps this is something we can figure out how to extract for users from the training pipeline?
If we can do that, then we can just take in "Microsoft.ML.IEstimator<Microsoft.ML.ITransformer> estimator", similar to the CrossValidate APIs.
Proposed API
```diff
 namespace Microsoft.ML
 {
     public static class PermutationFeatureImportanceExtensions
     {
         public static System.Collections.Immutable.ImmutableArray<Microsoft.ML.Data.RegressionMetricsStatistics> PermutationFeatureImportance<TModel>(
             this Microsoft.ML.RegressionCatalog catalog,
             Microsoft.ML.ISingleFeaturePredictionTransformer<TModel> predictionTransformer,
             Microsoft.ML.IDataView data,
             string labelColumnName = "Label",
             bool useFeatureWeightFilter = false,
             int? numberOfExamplesToUse = default,
             int permutationCount = 1)
             where TModel : class;
+        public static System.Collections.Generic.Dictionary<string, Microsoft.ML.Data.RegressionMetricsStatistics> PermutationFeatureImportance<TModel>(
+            this Microsoft.ML.RegressionCatalog catalog,
+            Microsoft.ML.IEstimator<Microsoft.ML.ITransformer> estimator,
+            Microsoft.ML.IDataView data,
+            string labelColumnName = "Label",
+            bool useFeatureWeightFilter = false,
+            int? numberOfExamplesToUse = default,
+            int permutationCount = 1)
+            where TModel : class;
     }
 }
```
Usage Examples
This is how it works today: dotnet/machinelearning-modelbuilder#1031 (comment)
Below is how I think it should work. The key thing to note is the similarity to the CrossValidate API.
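Independently of the exact signature, a name-keyed result would let callers look features up by name and rank them directly. Here is a hedged sketch in plain C# with no ML.NET dependency. The dictionary is hypothetical sample data standing in for the proposed Dictionary<string, RegressionMetricsStatistics>, and a double stands in for the mean change in R-squared when each feature is permuted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the proposed Dictionary<string, RegressionMetricsStatistics>:
// feature name -> mean change in R-squared when that feature is permuted.
var importance = new Dictionary<string, double>
{
    ["Rooms"] = -0.21,
    ["Age"] = -0.02,
    ["Distance"] = -0.09,
};

// Look up a single feature by name instead of by array index.
Console.WriteLine($"Distance: {importance["Distance"]}");

// Rank features by the magnitude of the change: the larger the change,
// the more the model relied on that feature.
var ranked = importance
    .OrderByDescending(kv => Math.Abs(kv.Value))
    .Select(kv => kv.Key)
    .ToList();

Console.WriteLine(string.Join(", ", ranked));
```

With an index-based array, the same ranking first needs the slot names lined up by position, which is the step this proposal aims to remove.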
Alternative Designs
If there is any opposition to, or technical challenge with, giving PFI an API similar to CrossValidate, I'm open to alternatives, but I don't know the ML.NET APIs well enough to come up with other patterns.
Risks
I think the biggest risk/challenge is that folks can do a lot of things with pipelines and models that make them incompatible with PFI. PFI also gets expensive as the number of feature columns grows: each feature slot is permuted and the model re-evaluated (permutationCount times), so I believe the run time grows roughly linearly with the number of slots. Certain transforms like OneHotHash can create hundreds of columns...
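A rough cost model helps size this risk. Assume PFI scores the data once for a baseline and then once per feature slot per permutation. This is an assumption about the implementation, not a measured profile. Under that model, the number of scoring passes scales with the slot count, so an encoder that expands one column into hundreds of slots multiplies the work accordingly. The helper names and the slot and example counts below are hypothetical.

```csharp
using System;

// Hypothetical cost model: one baseline evaluation, then one evaluation per
// feature slot per permutation.
static long EvaluationPasses(long featureSlots, int permutationCount) =>
    1 + featureSlots * permutationCount;

// Each evaluation pass scores every example that is used.
static long RowsScored(long featureSlots, int permutationCount, long examples) =>
    EvaluationPasses(featureSlots, permutationCount) * examples;

// A handful of numeric columns versus a pipeline where a hashed one-hot column
// expands the feature vector to 510 slots.
Console.WriteLine(RowsScored(featureSlots: 10, permutationCount: 1, examples: 10_000));
Console.WriteLine(RowsScored(featureSlots: 510, permutationCount: 3, examples: 10_000));
```

In this model, lowering numberOfExamplesToUse or permutationCount (both parameters of the existing signature) shrinks the work proportionally. The slot count is usually the dominant factor once wide encoders are involved.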