-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Fix Recursive Parsing for PFI #6084
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hey @ericjohannsen, This issue was actually fixed just yesterday by pr #6085. You can try building the latest from main and see if that fixes your issue, or you can the latest preview package here, https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json. Closing this issue for now as it should be resolved, but please re-open the issue if it isn't resolved. |
Is there any further details on the preview package route or any estimate when a released package might be available? I added that JSON as a NuGet package source, but the latest available version of AutoML and Microsoft.ML from that source match the latest on NuGet and both have this issue as well. I had hoped to include explainability in a user group talk / workshop I'm giving Thursday on AutoML, but it looks like that might be too optimistic if I wanted folks to be able to follow along. |
@michaelgsharp @IntegerMan
|
@michaelgsharp Any word on this? |
In the preview version instead of passing in We are planning on doing a servicing release which will include this bug fix next week as well. |
@michaelgsharp I was already passing in bestRun.Model var bestModel = bestRun.Model;
var permutationFeatureImportance = mlContext
.Regression
.PermutationFeatureImportance(bestRun.Model, testDataViewWithBestScore, permutationCount: 3); |
This issue has been automatically marked |
@ericjohannsen the servicing release is out with the fix for this. Can you try with the latest 1.7.1 and let me know if it solves your issue? |
@michaelgsharp No joy with 1.7.1. If it helps: Let me know what other information might be helpful diagnosing this. |
Hmm.. Could you share a couple lines of the data you used so that I can run it on my end and see what I can find? |
@michaelgsharp Sure. I changed the headers to obfuscate the domain. Not sure whether it matters, but I'm using cross-validation. |
Alright, figured it out. There are 2 things that need to change and both are in the call to PFI. The first is that you are calling PFI in the The second is that PFI re-runs your trainer on the data. This means that the data needs to have the correct columns for the trainer. The easiest way to do this is just to use your pipeline to transform the data you pass into PFI. (Your data may already be correct. I didn't have access to see exactly what your
|
@michaelgsharp Still seeing the issue. I've attached a minimal reproducible example |
Alright @ericjohannsen. Good news is that I have figured out the issue and why I had it working for me before and how to workaround it in your case. Bad news is that it means there is another bug in how we are extracting the transformer. In my original testing I didn't have the pre-featurizer that you do because it had a Type T and I didn't know what it was so I just skipped it. This caused In your case here is how to workaround it. First remove the extra transform call on line 75. You can directly use Finally, we need to extract the transformer chain that is nested inside bestRun.Model. If you use this cast it will get it correctly. All in all the final lines should look like this:
Can you try this and make sure its working for you? In this example PFI should run pretty quick, but with lots of features it does take some time to run so just be prepared for that. |
Success! @michaelgsharp Thank you for sticking with this, and for the detailed explanation. Side note: The PFI calculation took quite a while on my Ryzen 9 3900 system, and I noticed only one CPU core is utilized. |
Only 1 core is used? I haven't looked too much into how we are actually doing the calculations, but it seems like there is potential for some serious optimizations there. I'm going to rename this issue to track fixing the recursive parsing issue for PFI. I'm glad we were able to get this solved! |
@michaelgsharp are we okay to close this issue? |
System Information (please complete the following information):
Describe the bug
ArgumentNullException while attempting to retrieve PFI
The model provided does not have a compatible predictor (Parameter 'lastTransformer')
Stack trace
at Microsoft.ML.Runtime.Contracts.CheckValue[T](IExceptionContext ctx, T val, String paramName, String msg)
at Microsoft.ML.PermutationFeatureImportanceExtensions.PermutationFeatureImportance[TMetric,TResult](IHostEnvironment env, ITransformer model, IDataView data, Func
1 resultInitializer, Func
2 evaluationFunc, Func3 deltaFunc, Int32 permutationCount, Boolean useFeatureWeightFilter, Nullable
1 numberOfExamplesToUse)at Microsoft.ML.PermutationFeatureImportanceExtensions.PermutationFeatureImportance(RegressionCatalog catalog, ITransformer model, IDataView data, String labelColumnName, Boolean useFeatureWeightFilter, Nullable`1 numberOfExamplesToUse, Int32 permutationCount)
To Reproduce
Not sure how best to create a minimal reproducible example including data, but here's the core of what I did (based on #5934)
The
LastTransformer
ofbestRun.Model
isBinaryPredictionTransformer<Microsoft.ML.Calibrators.CalibratedModelParametersBase<Microsoft.ML.Trainers.lightGbm.LightGbmBinaryModelParameters, ...>>
Expected behavior
Retrieve the PFI information.
The text was updated successfully, but these errors were encountered: