Need for a sample or clarification on how to use PFI with AutoML in ML.NET #19006

Open
antoniovs1029 opened this issue Jun 18, 2020 · 7 comments
Assignees
Labels
doc-enhancement Improve the current content [org][type][category] dotnet-ml/svc Pri3

@antoniovs1029
Member

I've noticed that ML.NET users have repeatedly had problems when using the PermutationFeatureImportance API with models created with AutoML. For example, these issues are all related to this problem:

dotnet/machinelearning#5247
dotnet/machinelearning#3972
dotnet/machinelearning#4227
dotnet/machinelearning#4196
dotnet/machinelearning#3976

All of them are caused by the fact that users don't know how to "extract" the linearPredictor (needed for PFI) from a model retrieved from AutoML. Doing so typically looks like this (as explained here):

    // Get BestRun from the AutoML experiment:
    RunDetail<BinaryClassificationMetrics> bestRun = experimentResult.BestRun;

    // Extract the predictor:
    var lastTransformer = ((TransformerChain<ITransformer>)bestRun.Model).LastTransformer;
    var linearPredictor = (ISingleFeaturePredictionTransformer<object>)lastTransformer;

    // Compute the permutation metrics for the linear model using the
    // normalized data. The catalog must match the experiment's task
    // (binary classification here, since bestRun holds BinaryClassificationMetrics).
    var permutationMetrics = mlContext.BinaryClassification
        .PermutationFeatureImportance(linearPredictor, transformedData,
        permutationCount: 30);

What confuses users is the need for the cast to (TransformerChain<ITransformer>) followed by the cast to (ISingleFeaturePredictionTransformer<object>). That's understandable: in my experience, the cast to ISingleFeaturePredictionTransformer is only used with PFI and is somewhat uncommon, so users aren't aware of its existence or use (and I believe it's never mentioned in our docs). As a result, users end up wondering whether there's something wrong with their pipeline or with ML.NET, or whether we even support using PFI with AutoML.

Aside from AutoML, these casts might also be needed in other cases when working with PFI (such as after loading a model from disk, or when calling PFI inside a method that received the model as an ITransformer), but the open issues mainly refer to AutoML.
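
To illustrate the "method that received the model as an ITransformer" case, here is a minimal sketch of wrapping both casts in one helper. `PfiHelpers` and `ExtractPredictor` are hypothetical names, not part of ML.NET, and the sketch assumes the prediction transformer is the last element of the chain, which may not hold for every model.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class PfiHelpers
{
    // Hypothetical helper (not an ML.NET API). Assumes the model is either a
    // TransformerChain<ITransformer> whose last transformer is the prediction
    // transformer, or the prediction transformer itself.
    public static ISingleFeaturePredictionTransformer<object> ExtractPredictor(ITransformer model)
    {
        // Unwrap the chain if there is one; otherwise use the model as-is.
        ITransformer last = model is TransformerChain<ITransformer> chain
            ? chain.LastTransformer
            : model;

        // The interface is covariant in its model type, so casting to
        // <object> works regardless of the concrete model parameters type.
        return last as ISingleFeaturePredictionTransformer<object>
            ?? throw new InvalidOperationException(
                $"{last.GetType().Name} is not a single-feature prediction transformer.");
    }
}
```

With something like this, the call site would reduce to `PfiHelpers.ExtractPredictor(bestRun.Model)` followed by the usual PermutationFeatureImportance call.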

I'm not sure whether a whole new article is needed, but I think that at least some mention of this would be useful in docs such as this one:
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/explain-machine-learning-model-permutation-feature-importance-ml-net#explain-the-model-with-permutation-feature-importance-pfi

Thanks!

@antoniovs1029
Member Author

antoniovs1029 commented Jun 19, 2020

As shown more recently here: dotnet/machinelearning#5247 (comment), it seems that extracting the linearPredictor from a model (especially one returned by AutoML) might not be as straightforward as in my post above.

As shown there, the steps are still similar to the ones I mentioned in my post above, and the goal is still to find the PredictionTransformer and cast it to (ISingleFeaturePredictionTransformer<object>), but the user might need to inspect the structure of the model to know where to get the prediction transformer from. So I think this point should also be included in the docs addressing this issue. Thanks!
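
One possible way to inspect the structure of a model is sketched below. It is only a sketch: `ModelInspection`, `Flatten`, and `FindPredictor` are hypothetical names, and it relies on TransformerChain implementing IEnumerable<ITransformer>, so nested chains can be walked recursively. It prints each transformer's type and returns the last one that can be cast to ISingleFeaturePredictionTransformer<object>, or null if there is none.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

public static class ModelInspection
{
    // Hypothetical helper: flattens possibly nested chains into their
    // individual transformers, in pipeline order.
    public static IEnumerable<ITransformer> Flatten(ITransformer model)
    {
        if (model is IEnumerable<ITransformer> chain)
        {
            foreach (var inner in chain)
                foreach (var leaf in Flatten(inner))
                    yield return leaf;
        }
        else
        {
            yield return model;
        }
    }

    // Hypothetical helper: prints the model structure and returns the last
    // transformer that is a single-feature prediction transformer (or null).
    public static ISingleFeaturePredictionTransformer<object> FindPredictor(ITransformer model)
    {
        var leaves = Flatten(model).ToList();

        foreach (var transformer in leaves)
            Console.WriteLine(transformer.GetType().Name);

        return leaves
            .OfType<ISingleFeaturePredictionTransformer<object>>()
            .LastOrDefault();
    }
}
```

The printed type names show where the prediction transformer sits in the model, which is the information needed to pick the right element by hand.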

@luisquintanilla
Contributor

luisquintanilla commented Jun 19, 2020

@antoniovs1029 thanks for providing additional examples / context. Documenting this can certainly go a long way towards unblocking users. Would it also be good to start a thread/issue on how the overall PFI experience for users could be improved on the API side since it's not as straightforward?

@antoniovs1029
Member Author

Hi, @luisquintanilla. Thanks for your suggestion. I agree that ML.NET's API for PFI isn't the best, and that this is an area of opportunity for ML.NET, but there's no need to open new threads/issues on ML.NET's repository, as these two issues have been open for quite some time now:

  1. "PFI (Permutation Feature Importance) API needs to be simpler to use" (machinelearning#4216), where one of the suggested changes is to make PFI more user friendly and not require so many casts.

  2. "[AutoML] feature request Include PFI statistics" (machinelearning#3783), where the user is essentially asking to integrate PFI into AutoML so that it automatically reports the PFI results.

I plan to bring this up with @harishsk and see if we can prioritize these feature requests. But in the meantime, and given that it's been a recurring problem for over a year, I think it would be helpful to include some general recommendations in the docs on how to extract the linearPredictor manually.

@luisquintanilla
Contributor

Sounds good. Thanks

@fwaris

fwaris commented Jun 21, 2020

Here are a couple of tricks to ease working with PFI with AutoML output.
The code is F# but equivalent C# should be easy to create.

    let modelFile = @"..\LightGbmBinary.bin"
    let mutable schema = null
    let model = ctx.Model.Load(modelFile, &schema) 
    let scored = model.Transform(trainView)

    let lastTx = (model :?> TransformerChain<ITransformer>).LastTransformer

    //concrete type of lastTx (in my case) is:
    //BinaryPredictionTransformer<Calibrators.CalibratedModelParametersBase<Trainers.LightGbm.LightGbmBinaryModelParameters, Calibrators.PlattCalibrator>>    
     
    //*** Trick #1 - use a generic function to perform duck typing and avoid knowing the concrete type
    let applyPfi<'t when 't : not struct>  (model:ITransformer) scored  =
        let m  = model :?> ISingleFeaturePredictionTransformer<'t>
        ctx.BinaryClassification.PermutationFeatureImportance(m,scored,labelColumnName=labelCol, permutationCount=5)

    let metrics = applyPfi<_> lastTx scored  

    //*** Trick #2 - get the columns under the "Features" column from the GetSlotNames(...) method
    let slotNames (dataView:IDataView) (col:string) = 
        let mutable vbuffer = new VBuffer<System.ReadOnlyMemory<char>>()
        dataView.Schema.[col].GetSlotNames(&vbuffer)
        vbuffer.DenseValues() |> Seq.map string |> Seq.toArray

    let ftrCols = slotNames scored "Features" 

    let paired = Seq.zip metrics ftrCols |> Seq.toArray

    //print out in order of importance
    paired 
        |> Array.sortBy(fun (x,n)->x.AreaUnderRocCurve.Mean) 
        |> Array.iter (fun (x,n) -> printfn "%s - %f" n x.AreaUnderRocCurve.Mean)
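
For readers working in C#, a rough equivalent of Trick #2 might look like the sketch below. `SlotNameHelpers`, `FeatureSlotNames`, and `PrintByImportance` are hypothetical names; the sketch assumes the column carries slot-name annotations and that the PFI metrics come back in the same order as the slots, as the F# code above also assumes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class SlotNameHelpers
{
    // Hypothetical helper: reads the slot names stored on a vector column
    // (for example "Features") of a transformed IDataView.
    public static string[] FeatureSlotNames(IDataView dataView, string columnName)
    {
        VBuffer<ReadOnlyMemory<char>> slotNames = default;
        dataView.Schema[columnName].GetSlotNames(ref slotNames);
        return slotNames.DenseValues().Select(name => name.ToString()).ToArray();
    }

    // Hypothetical helper: pairs each PFI result with its slot name and
    // prints them sorted by the mean change in AUC, ascending (largest
    // drop first), matching the F# sortBy above.
    public static void PrintByImportance(
        IReadOnlyList<BinaryClassificationMetricsStatistics> metrics,
        string[] names)
    {
        var paired = metrics
            .Zip(names, (m, n) => (Name: n, Mean: m.AreaUnderRocCurve.Mean))
            .OrderBy(p => p.Mean);

        foreach (var p in paired)
            Console.WriteLine($"{p.Name} - {p.Mean:F6}");
    }
}
```

The ImmutableArray returned by PermutationFeatureImportance can be passed as the `metrics` argument, since it implements IReadOnlyList.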

@antoniovs1029
Member Author

Thanks for your suggestion, @fwaris ! 😄

@luisquintanilla
Contributor

Thanks @fwaris!

7 participants