PFI (Permutation Feature Importance) API needs to be simpler to use #4216

CESARDELATORRE · 2019-09-15T22:09:36Z

PFI (Permutation Feature Importance) API needs to be simpler to use

1. First, it is awkward to have to access the LastTransformer member of the model (the chain of transformers). In addition, if you split training, evaluation and the PFI calculation into separate methods and pass the model around as an ITransformer (the usual way), you need to cast it back to the concrete transformer chain type (such as TransformerChain<RegressionPredictionTransformer<LightGbmRegressionModelParameters>>), which requires a hard reference to the type of algorithm used.

This is the code to calculate the PFI metrics:

// Make predictions (Transform the dataset)
IDataView transformedData = trainedModel.Transform(trainingDataView);

// Extract the trainer (last transformer in the model)
var singleLightGbmModel = (trainedModel as TransformerChain<RegressionPredictionTransformer<LightGbmRegressionModelParameters>>).LastTransformer;

// or simpler if the trainedModel was 'var' right after the call to Fit(): 
// var singleLightGbmModel = trainedModel.LastTransformer;

// Calculate permutation feature importance
ImmutableArray<RegressionMetricsStatistics> permutationMetrics =
    mlContext.Regression.PermutationFeatureImportance(
        predictionTransformer: singleLightGbmModel,
        data: transformedData,
        labelColumnName: "fare_amount",
        numberOfExamplesToUse: 100,
        permutationCount: 1);

Having to extract and pass only the last transformer feels convoluted.
The API should be simpler to use here and make this step transparent to the user.

2. Second, once you get the permutation metrics (such as ImmutableArray<RegressionMetricsStatistics> permutationMetrics), you only get the values by index, without the names of the input columns. Correlating them with the input column names is not straightforward: the same indexes have to be used across two separate arrays, and if either array has been sorted beforehand, the indexes no longer match.

You need to do something like the following or comparable loops driven by the indexes in the permutationMetrics array:

First, obtain all the column names used in the PFI process and exclude the ones not used:

var usedColumnNamesInPFI = dataView.Schema
    .Where(col => (col.Name != "SamplingKeyColumn") && (col.Name != "Features") && (col.Name != "Score"))
    .Select(col => col.Name);

Then you need to correlate and find the column names based on the indexes in the permutationMetrics:

var results = usedColumnNamesInPFI
    .Select((t, i) => new FeatureImportance
    {
        Name = t,
        RSquaredMean = Math.Abs(permutationMetrics[i].RSquared.Mean)
    })
    .OrderByDescending(x => x.RSquaredMean);

This should be provided directly by the API, so that you can simply access it and display it.
The current code feels very convoluted.
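
To make the index-matching concern in point 2 concrete, here is a minimal sketch that pairs each name with its value by position first and only then sorts the pairs. It is plain C# with no ML.NET dependency: the double array is a stand-in for permutationMetrics[i].RSquared.Mean, and the class name PfiReport, the method Rank and the sample column names and values are hypothetical, chosen only for illustration. It also assumes the names are already in the same order as the metrics, which is exactly the guarantee the API does not give you today.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ML.NET.
public static class PfiReport
{
    // Pairs each feature name with its metric by position, then sorts the pairs,
    // so a name cannot drift away from its own value during sorting.
    public static List<(string Name, double Importance)> Rank(
        IReadOnlyList<string> featureNames,
        IReadOnlyList<double> rSquaredMeans)
    {
        if (featureNames.Count != rSquaredMeans.Count)
            throw new ArgumentException("Names and metrics must have the same length.");

        return featureNames
            .Select((name, i) => (Name: name, Importance: Math.Abs(rSquaredMeans[i])))
            .OrderByDescending(pair => pair.Importance)
            .ToList();
    }

    public static void Main()
    {
        // Made-up values standing in for permutationMetrics[i].RSquared.Mean.
        var names = new[] { "trip_distance", "passenger_count", "trip_time_in_secs" };
        var means = new[] { -0.42, -0.01, -0.17 };

        foreach (var (name, importance) in Rank(names, means))
            Console.WriteLine($"{name}: {importance:F2}");
    }
}
```

Sorting either input array on its own before calling Rank would silently attach values to the wrong names, which is why the pairing has to happen before any ordering.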


antoniovs1029 · 2019-09-16T23:45:18Z

Hi. I am just starting to learn about PFI in ML.NET, and I was confused by what you state in the first part of this issue.

I am actually able to successfully run the example given in the docs for PFI for Regression. I was also able to run the PFI test you mentioned without any failure. I stepped through that test with the debugger to check that everything went as expected, and it appears to work just fine.

In general, I think the following is right (and it is pretty much what is done in the documentation and in the test case):

var model = pipeline.Fit(data);

var transformedData = model.Transform(data);

var featureImportance = context.Regression.PermutationFeatureImportance(model.LastTransformer, transformedData);

I believe it's right because accessing model.LastTransformer is not a problem when 'model' is returned directly by the .Fit() method: 'model' is then already of type TransformerChain<TLastTransformer>, so LastTransformer is directly accessible without any casting.

However, I do see that the problem you mention appears when passing the model as an ITransformer parameter of a method (for example, as done in one of the samples) and so I understand the need of a simpler API in such cases.

In short, I wanted to point out that I believe that the documentation and test you cited actually work; and so I wanted to ask you if I am missing something to fully understand and replicate the problems you described about them.

Thanks.

CESARDELATORRE · 2019-09-17T01:44:37Z

@antoniovs1029 - You're right, I thought I also tested it directly in the same root method, but I just tried now and it works with no issues.
So, the issue only happens when passing the model as an ITransformer parameter of a method, which is the normal case (instead of the specific TransformerChain type). I'm updating my comment above. Thanks for checking this out! 👍

cyberkoolman · 2019-09-17T01:50:18Z

I especially agree with Cesar's point #2. Most of the PFI samples I found use only numeric feature columns, so you can map back to the original feature columns after PFI using the index values. Not perfect, but still okay. However, most real-world datasets include both numerical and categorical features, and if you apply OneHotEncoding to the categorical columns, the complexity increases drastically. I had to find a way by debugging through the runtime and examining the Slots, and the resulting code felt unnecessarily complex.

nighotatul · 2019-09-17T14:59:51Z

Hi. The same problem occurs with AutoML.
I have a trained model and am now trying to retrieve the feature weights. None of the objects returned expose a LastTransformer, so when I want to get the PFI information I get stuck. There appears to be no way to get the LastTransformer object from the trainedModel.

The following cast lets me access the LastTransformer, however I cannot use it for PFI until I provide a better type for predictor. Debugging I can see it is of type Microsoft.ML.Data.RegressionPredictionTransformer<Microsoft.ML.IPredictorProducing> but I am unable to cast to that because Microsoft.ML.IPredictorProducing is not visible, so it seems like we're still stuck.

//setup code similar to famschopman
RegressionExperiment experiment = mlContext.Auto().CreateRegressionExperiment(experimentSettings);

var experimentResults = experiment.Execute(split.TrainSet, split.TestSet);
var predictor = ((TransformerChain)experimentResults.BestRun.Model).LastTransformer;

//this will not compile.
var permutationMetrics = mlContext.Regression.PermutationFeatureImportance(predictor, transformedData, permutationCount: 30);

The following compile error is produced.

The type arguments for method 'PermutationFeatureImportanceExtensions.PermutationFeatureImportance(RegressionCatalog, ISingleFeaturePredictionTransformer, IDataView, string, bool, int?, int)' cannot be inferred from the usage. Try specifying the type arguments explicitly.

How can we get the bias and weights using PFI?

artemiusgreat · 2019-12-23T06:44:30Z

The easiest way to fix it would probably be to add a method that knows how to extract the appropriate transformer from a TransformerChain, aka the model.

#1 Working example with Multiclass.LightGbm and a model produced by fitting the trainer estimator at run time

IDataView data = _context.Data.LoadFromTextFile<MyInputModel>();
IEstimator<ITransformer> estimator = _context
        .MulticlassClassification
        .Trainers
        .LightGbm(labelColumnName: "Label", featureColumnName: "Features")
        .Append(_context.Transforms.Conversion.MapKeyToValue(new[] { new InputOutputColumnPair("Prediction", "PredictedLabel") }));
ITransformer model = estimator.Fit(modifications);
var permutations = _context
        .MulticlassClassification
        .PermutationFeatureImportance(model, data, permutationCount: 3);

#2 Broken example with previously saved model

IDataView data = _context.Data.LoadFromTextFile<MyInputModel>();
ITransformer model = _context.Model.Load("C:/Model.zip", out var schema);
var permutations = _context
        .MulticlassClassification
        .PermutationFeatureImportance(model, data, permutationCount: 3);

The type arguments for method 'PermutationFeatureImportanceExtensions.PermutationFeatureImportance(RegressionCatalog, ISingleFeaturePredictionTransformer, IDataView, string, bool, int?, int)' cannot be inferred from the usage. Try specifying the type arguments explicitly.

Hacky solution - replace model with a predictor of a specific type

// My model (TransformerChain) contains 2 transformers - LightGbmTrainer and MapKeyToValue 
// PFI requires the first one, but I wouldn't like to hardcode things like First(), Last(), LastTransformer 
// So, turn model to IEnumerable<dynamic>, filter by OfType<TModel>, get First relevant type

IDataView data = _context.Data.LoadFromTextFile<MyInputModel>();
ITransformer model = _context.Model.Load("C:/Model.zip", out var schema);
var predictor = (model as IEnumerable<dynamic>).OfType<MulticlassPredictionTransformer<OneVersusAllModelParameters>>().FirstOrDefault(); // would be good to move this call to an appropriate extension method
var permutations = _context
        .MulticlassClassification
        .PermutationFeatureImportance(predictor, data, permutationCount: 3);

R0Wi · 2021-03-22T08:06:06Z

@artemiusgreat the expression

var predictor = (model as IEnumerable<dynamic>).OfType<MulticlassPredictionTransformer<OneVersusAllModelParameters>>().FirstOrDefault();

can be simplified to

var predictor = model.OfType<MulticlassPredictionTransformer<OneVersusAllModelParameters>>().FirstOrDefault();

but of course the problem remains that we have to know exactly which training algorithm was used inside the trained model. I think something more dynamic is needed here to infer the concrete type parameters at runtime, so generics might not be the right approach for this scenario.

michaelgsharp · 2021-10-27T19:12:37Z

Alright, this should be resolved now. I added a new API with PR #5934, based on issue #5625. I'm going to close this issue for now, but if it's not addressed by the PR, please let us know.

CESARDELATORRE added enhancement New feature or request explainability labels Sep 15, 2019

codemzs assigned antoniovs1029 Oct 2, 2019

antoniovs1029 removed the explainability label Dec 19, 2019

ganik added the P2 Priority of the issue for triage purpose: Needs to be fixed at some point. label Jan 10, 2020

antoniovs1029 removed their assignment Jan 10, 2020

antoniovs1029 added the API Issues pertaining the friendly API label Jan 10, 2020

antoniovs1029 mentioned this issue Jun 19, 2020

Need for a sample or clarification on how to use PFI with AutoML in ML.NET dotnet/docs#19006

Open

JakeRadMSFT mentioned this issue Feb 16, 2021

API Proposal: Update PFI API to be easier to use #5625

Closed

michaelgsharp closed this as completed Oct 27, 2021

dotnet locked as resolved and limited conversation to collaborators Mar 20, 2022
