Need for a sample or clarification on how to use PFI with AutoML in ML.NET #19006
Comments
As shown more recently here: dotnet/machinelearning#5247 (comment), it seems that extracting the [...] As shown there, the steps are still similar to the ones I've mentioned in my post above, and the goal is still finding the PredictionTransformer to cast it to [...]
@antoniovs1029 thanks for providing additional examples / context. Documenting this can certainly go a long way towards unblocking users. Would it also be good to start a thread/issue on how the overall PFI experience for users could be improved on the API side, since it's not as straightforward?
Hi, @luisquintanilla. Thanks for your suggestion. I agree that ML.NET's API for PFI isn't the best, and that this is an area of opportunity for ML.NET, but there's no need to open new threads/issues on ML.NET's repository, as there have been these 2 issues open for quite some time now:
I plan to bring this up to @harishsk and see if we can prioritize these feature requests. But in the meantime, and given that it's been a recurrent problem for over a year, I think that it would be helpful to include some sort of general recommendations in the docs on how to extract the [...]
Sounds good. Thanks
Here are a couple of tricks to ease working with PFI on AutoML output.

    // ctx (MLContext), trainView (IDataView) and labelCol (string) are assumed to be defined earlier
    let modelFile = @"..\LightGbmBinary.bin"
    let mutable schema = null
    let model = ctx.Model.Load(modelFile, &schema)
    let scored = model.Transform(trainView)
    let lastTx = (model :?> TransformerChain<ITransformer>).LastTransformer
    // concrete type of lastTx (in my case) is:
    // BinaryPredictionTransformer<Calibrators.CalibratedModelParametersBase<Trainers.LightGbm.LightGbmBinaryModelParameters, Calibrators.PlattCalibrator>>

    //*** Trick #1 - use a generic function to perform duck typing and avoid knowing the concrete type
    let applyPfi<'t when 't : not struct> (model: ITransformer) scored =
        let m = model :?> ISingleFeaturePredictionTransformer<'t>
        ctx.BinaryClassification.PermutationFeatureImportance(m, scored, labelColumnName = labelCol, permutationCount = 5)

    let metrics = applyPfi<_> lastTx scored

    //*** Trick #2 - get the columns under the "Features" column from the GetSlotNames(...) method
    let slotNames (dataView: IDataView) (col: string) =
        let mutable vbuffer = new VBuffer<System.ReadOnlyMemory<char>>()
        dataView.Schema.[col].GetSlotNames(&vbuffer)
        vbuffer.DenseValues() |> Seq.map string |> Seq.toArray

    let ftrCols = slotNames scored "Features"
    // pair each PFI result with the name of the feature slot it belongs to
    let paired = Seq.zip metrics ftrCols |> Seq.toArray

    // print out in order of importance
    paired
    |> Array.sortBy (fun (x, _) -> x.AreaUnderRocCurve.Mean)
    |> Array.iter (fun (x, n) -> printfn "%s - %f" n x.AreaUnderRocCurve.Mean)
Thanks for your suggestion, @fwaris! 😄
Thanks @fwaris!
I've noticed that users of ML.NET have continually had problems when using the PermutationFeatureImportance API with models created with AutoML. For example, these issues are related to this problem:
dotnet/machinelearning#5247
dotnet/machinelearning#3972
dotnet/machinelearning#4227
dotnet/machinelearning#4196
dotnet/machinelearning#3976
All of them are caused by the fact that users don't know how to "extract" the linearPredictor (needed for PFI) from a model retrieved from AutoML. Doing it would typically look like this: (as explained here).

What confuses users is the need for the casts of (TransformerChain<ITransformer>) followed by (ISingleFeaturePredictionTransformer<object>)... and it's understandable since, in my personal experience, the cast to ISingleFeaturePredictionTransformer is only used with PFI, and it's somewhat uncommon, so users are not aware of its existence or use (and I believe it's never mentioned in our docs). Users end up wondering if there's something wrong with their pipeline, with ML.NET, or whether we even support using PFI with AutoML.

Aside from using these casts with AutoML, they might also be needed in other cases when working with PFI (such as after loading models from disk, or when calling PFI inside a method that receives the model as an ITransformer), but the open issues mainly refer to AutoML.

I'm not sure if a whole new article would be needed, but I think that at least some mention of this would be useful in docs such as this one:
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/explain-machine-learning-model-permutation-feature-importance-ml-net#explain-the-model-with-permutation-feature-importance-pfi
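As a rough illustration of the two casts discussed here, a C# sketch might look like the following. This is not the snippet from the linked explanation. The class name, method name, and parameters are hypothetical, and it assumes a regression model whose last transformer in the chain is the prediction transformer, which may not hold for every AutoML pipeline (see dotnet/machinelearning#5247).

```csharp
using System.Collections.Immutable;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class PfiCastSketch
{
    // Hypothetical helper: 'model' is an ITransformer such as the one returned by
    // an AutoML run or by loading a model from disk; 'data' is the raw training data.
    public static ImmutableArray<RegressionMetricsStatistics> ComputePfi(
        MLContext mlContext, ITransformer model, IDataView data, string labelColumnName)
    {
        // PFI expects data that already contains the Features column,
        // so run the whole model over the input first.
        IDataView transformed = model.Transform(data);

        // Cast #1: treat the model as a chain so the last transformer can be reached.
        var chain = (TransformerChain<ITransformer>)model;

        // Cast #2: assumes the last transformer is the predictor; the cast throws
        // InvalidCastException if it is some other kind of transformer.
        var predictor = (ISingleFeaturePredictionTransformer<object>)chain.LastTransformer;

        return mlContext.Regression.PermutationFeatureImportance(
            predictor, transformed, labelColumnName: labelColumnName, permutationCount: 5);
    }
}
```

For binary or multiclass models the same two casts would apply, with the call going through mlContext.BinaryClassification or mlContext.MulticlassClassification instead.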
Thanks!