Need for a sample or clarification on how to use PFI with AutoML in ML.NET #19006

antoniovs1029 · 2020-06-18T19:09:05Z

I've noticed that the users of ML.NET continuously have had problems when using the PermutationFeatureImportance API with models created with AutoML. For example, we have these issues that are related to this problem:

dotnet/machinelearning#5247
dotnet/machinelearning#3972
dotnet/machinelearning#4227
dotnet/machinelearning#4196
dotnet/machinelearning#3976

All of them are caused by the fact that users don't know how to "extract" the linearPredictor (needed for PFI) from a model retrieved from AutoML. Doing it would typically look like this (as explained here):

            // Get BestRun from AutoML experiment:
            RunDetail<BinaryClassificationMetrics> bestRun = experimentResult.BestRun;

            // Extract the predictor: the model is a chain of transformers,
            // and the predictor is (typically) its last step.
            var lastTransformer = ((TransformerChain<ITransformer>)bestRun.Model).LastTransformer;
            var linearPredictor = (ISingleFeaturePredictionTransformer<object>)lastTransformer;

            // Compute the permutation metrics for the linear model using the
            // normalized data. The catalog must match the experiment type
            // (binary classification here, as in the RunDetail above).
            var permutationMetrics = mlContext.BinaryClassification
                .PermutationFeatureImportance(linearPredictor, transformedData,
                permutationCount: 30);

What confuses users is the need for the cast to (TransformerChain<ITransformer>) followed by the cast to (ISingleFeaturePredictionTransformer<object>). That's understandable: in my personal experience, the cast to ISingleFeaturePredictionTransformer is only used with PFI and is somewhat uncommon, so users are not aware of its existence or use (and I believe it's never mentioned in our docs). Users end up wondering if there's something wrong with their pipeline, with ML.NET, or if we even support using PFI with AutoML.

Aside from AutoML, these casts might also be needed in other cases when working with PFI (such as after loading models from disk, or when calling PFI inside a method that received the model as an ITransformer), but the open issues mainly refer to AutoML.

I'm not sure if a whole new article would be needed, but I think that at least some mention of this would be useful in docs such as this one:
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/explain-machine-learning-model-permutation-feature-importance-ml-net#explain-the-model-with-permutation-feature-importance-pfi

Thanks!

antoniovs1029 · 2020-06-19T18:04:48Z

As shown more recently here: dotnet/machinelearning#5247 (comment), it seems that extracting the linearPredictor from a model (especially one returned by AutoML) might not be that straightforward.

As shown there, the steps are still similar to the ones I mentioned in my post above, and the goal is still to find the PredictionTransformer and cast it to (ISingleFeaturePredictionTransformer<object>), but the user might need to inspect the structure of the model to know where to get the prediction transformer from. So I think this realization should also be included in the docs addressing this issue. Thanks!
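A minimal sketch of that inspection in C#, assuming the model is a (possibly nested) TransformerChain and that the predictor implements ISingleFeaturePredictionTransformer. `PfiHelpers` and `FindPredictor` are hypothetical names for illustration, not part of ML.NET:

```csharp
using System.Collections.Generic;
using Microsoft.ML;

public static class PfiHelpers
{
    // Hypothetical helper (not an ML.NET API): walks a possibly nested
    // transformer chain and returns the last step that can be used with PFI,
    // or null if no such step is found.
    public static ISingleFeaturePredictionTransformer<object> FindPredictor(ITransformer model)
    {
        // The cast to <object> works because the interface is covariant
        // in its model type parameter.
        if (model is ISingleFeaturePredictionTransformer<object> predictor)
            return predictor;

        ISingleFeaturePredictionTransformer<object> found = null;

        // TransformerChain<T> implements IEnumerable<ITransformer>,
        // so nested chains can be searched recursively.
        if (model is IEnumerable<ITransformer> chain)
        {
            foreach (ITransformer step in chain)
                found = FindPredictor(step) ?? found; // keep the last match
        }

        return found;
    }
}
```

With a helper like this, `PfiHelpers.FindPredictor(bestRun.Model)` could replace the two explicit casts from the first post. It returns null rather than throwing, so the caller can still fall back to inspecting the model by hand.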

luisquintanilla · 2020-06-19T19:34:13Z

@antoniovs1029 thanks for providing additional examples / context. Documenting this can certainly go a long way towards unblocking users. Would it also be good to start a thread/issue on how the overall PFI experience for users could be improved on the API side since it's not as straightforward?

antoniovs1029 · 2020-06-19T20:10:16Z

Hi, @luisquintanilla. Thanks for your suggestion. I agree that ML.NET's API for PFI isn't the best, and that this is an area of opportunity for ML.NET, but there's no need to open new threads/issues on ML.NET's repository, as these 2 issues have been open for quite some time now:

"PFI (Permutation Feature Importance) API needs to be simpler to use" (machinelearning#4216), where one of the suggested changes is to make PFI more user friendly and not require so many casts.
"[AutoML] feature request Include PFI statistics" (machinelearning#3783), where the user is pretty much asking to integrate PFI itself into AutoML so it will automatically report the results of PFI.

I plan to bring this up to @harishsk and see if we can prioritize these feature requests. But in the meantime, and given that it's been a recurrent problem for over a year, I think that it would be helpful to include some sort of general recommendations in the docs on how to extract the linearPredictor manually.

luisquintanilla · 2020-06-19T20:17:39Z

Sounds good. Thanks

fwaris · 2020-06-21T03:53:02Z

Here are a couple of tricks to ease working with PFI with AutoML output.
The code is F# but equivalent C# should be easy to create.

    let modelFile = @"..\LightGbmBinary.bin"
    let mutable schema = null
    let model = ctx.Model.Load(modelFile, &schema) 
    let scored = model.Transform(trainView)

    let lastTx = (model :?> TransformerChain<ITransformer>).LastTransformer

    //concrete type of lastTx (in my case) is:
    //BinaryPredictionTransformer<Calibrators.CalibratedModelParametersBase<Trainers.LightGbm.LightGbmBinaryModelParameters, Calibrators.PlattCalibrator>>    
     
    //*** Trick #1 - use a generic function to perform duck typing and avoid knowing the concrete type
    let applyPfi<'t when 't : not struct>  (model:ITransformer) scored  =
        let m  = model :?> ISingleFeaturePredictionTransformer<'t>
        ctx.BinaryClassification.PermutationFeatureImportance(m,scored,labelColumnName=labelCol, permutationCount=5)

    let metrics = applyPfi<_> lastTx scored  

    //*** Trick #2 - get the columns under the "Features" column from the GetSlotNames(...) method
    let slotNames (dataView:IDataView) (col:string) = 
        let mutable vbuffer = new VBuffer<System.ReadOnlyMemory<char>>()
        dataView.Schema.[col].GetSlotNames(&vbuffer)
        vbuffer.DenseValues() |> Seq.map string |> Seq.toArray

    let ftrCols = slotNames scored "Features" 

    let paired = Seq.zip metrics ftrCols |> Seq.toArray

    //print out in order of importance
    paired 
        |> Array.sortBy(fun (x,n)->x.AreaUnderRocCurve.Mean) 
        |> Array.iter (fun (x,n) -> printfn "%s - %f" n x.AreaUnderRocCurve.Mean)
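For C# users, Trick #2 might look roughly like the sketch below. It assumes the column (for example "Features") carries slot-name annotations, which is not guaranteed for every pipeline. `SlotNameHelpers` is a hypothetical name used only for illustration:

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class SlotNameHelpers
{
    // Hypothetical helper: reads the slot names of a vector column, i.e. the
    // names of the source columns that were concatenated into it.
    public static string[] GetSlotNames(IDataView data, string columnName)
    {
        VBuffer<ReadOnlyMemory<char>> slots = default;

        // GetSlotNames is an extension method from Microsoft.ML.Data; it is
        // expected to fail if the column has no slot-name annotation.
        data.Schema[columnName].GetSlotNames(ref slots);

        return slots.DenseValues().Select(s => s.ToString()).ToArray();
    }
}
```

The returned names line up by index with the array returned by PermutationFeatureImportance, so the two can be paired with `Zip` and sorted in the same way as the F# code above.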

antoniovs1029 · 2020-06-22T16:49:24Z

Thanks for your suggestion, @fwaris ! 😄

luisquintanilla · 2020-06-23T14:29:23Z

Thanks @fwaris!

jacobthamblett mentioned this issue Jun 18, 2020

PermutationFeatureImportance not working with AutoML API dotnet/machinelearning#5247

Closed

fwaris mentioned this issue Jun 21, 2020

It is not possible to use PermutationFeatureImportance from a model loaded from disk in F# dotnet/machinelearning#3976

Closed

antoniovs1029 mentioned this issue Aug 20, 2020

The first example is misleading. dotnet/machinelearning#5356

Closed

houghj16 mentioned this issue May 25, 2021

API Proposal: Update PFI API to be easier to use dotnet/machinelearning#5625

Closed
