PFI (Permutation Feature Importance) API needs to be simpler to use #4216

Closed
CESARDELATORRE opened this issue Sep 15, 2019 · 7 comments
Labels
API Issues pertaining the friendly API enhancement New feature or request P2 Priority of the issue for triage purpose: Needs to be fixed at some point.

Comments

@CESARDELATORRE
Contributor

CESARDELATORRE commented Sep 15, 2019

PFI (Permutation Feature Importance) API needs to be simpler to use

1. First, it is awkward to have to access the LastTransformer property of the model (a chain of transformers). In addition, if you structure your training, evaluation, and PFI calculation into separate methods and pass the model as an ITransformer (the usual way), you need to cast it back to the concrete transformer chain type (such as TransformerChain<RegressionPredictionTransformer<LightGbmRegressionModelParameters>>), which then requires a hard reference to the type of algorithm used.

This is the code to calculate the PFI metrics:

// Make predictions (Transform the dataset)
IDataView transformedData = trainedModel.Transform(trainingDataView);

// Extract the trainer (last transformer in the model)
var singleLightGbmModel = (trainedModel as TransformerChain<RegressionPredictionTransformer<LightGbmRegressionModelParameters>>).LastTransformer;

// or simpler if the trainedModel was 'var' right after the call to Fit(): 
// var singleLightGbmModel = trainedModel.LastTransformer;

// Calculate permutation feature importance
ImmutableArray<RegressionMetricsStatistics> permutationMetrics =
    mlContext.Regression.PermutationFeatureImportance(
        predictionTransformer: singleLightGbmModel,
        data: transformedData,
        labelColumnName: "fare_amount",
        numberOfExamplesToUse: 100,
        permutationCount: 1);

Having to extract and pass only the last transformer feels convoluted.
Shouldn't the API be simpler to use here and make this step transparent to the user?

2. Second, once you get the permutation metrics (such as ImmutableArray<RegressionMetricsStatistics> permutationMetrics), you only get values addressed by index, without the names of the input columns. Correlating them with the input column names is not straightforward: the same indexes have to be used across two separate arrays, and if either array has been sorted beforehand, the indexes no longer match.

You need to do something like the following or comparable loops driven by the indexes in the permutationMetrics array:

First, obtain all the column names used in the PFI process and exclude the ones not used:

var usedColumnNamesInPFI = dataView.Schema
    .Where(col => (col.Name != "SamplingKeyColumn") && (col.Name != "Features") && (col.Name != "Score"))
    .Select(col => col.Name);

Then you need to correlate and find the column names based on the indexes in the permutationMetrics:

var results = usedColumnNamesInPFI
    .Select((t, i) => new FeatureImportance
    {
        Name = t,
        RSquaredMean = Math.Abs(permutationMetrics[i].RSquared.Mean)
    })
    .OrderByDescending(x => x.RSquaredMean);

This should be provided directly by the API, so that you only need to read it and display it.
The current code feels very convoluted.
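
The index-alignment pitfall from point 2 can be illustrated without ML.NET. The following is only a minimal sketch: plain arrays stand in for the column names and for the `permutationMetrics[i].RSquared.Mean` values, and the feature names and numbers in `Main` are made up for illustration. The point is that each name is paired with its score by index first, and sorting happens only afterwards.

```csharp
using System;
using System.Linq;

public static class PfiRanking
{
    // Pairs each feature name with the score at the same index, then sorts.
    // Pairing must happen before any sorting, otherwise the indexes no longer line up.
    public static (string Name, double Score)[] Rank(string[] featureNames, double[] scores)
    {
        if (featureNames.Length != scores.Length)
            throw new ArgumentException("Names and scores must have the same length.");

        return featureNames
            .Zip(scores, (name, score) => (Name: name, Score: Math.Abs(score)))
            .OrderByDescending(x => x.Score)
            .ToArray();
    }

    public static void Main()
    {
        // Hypothetical feature names and R-squared changes, for illustration only.
        var names = new[] { "trip_distance", "passenger_count", "trip_time" };
        var deltas = new[] { -0.40, -0.01, -0.15 };

        foreach (var (name, score) in Rank(names, deltas))
            Console.WriteLine($"{name}: {score:F2}");
    }
}
```

Because the result carries the name alongside the score, it can be re-sorted or filtered later without losing the association.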

@CESARDELATORRE CESARDELATORRE added enhancement New feature or request explainability labels Sep 15, 2019
@antoniovs1029
Member

antoniovs1029 commented Sep 16, 2019

Hi. I was just starting to learn about PFI in ML.NET, and I was confused by what you state in the first part of this issue.

I am actually able to successfully run the example given in the docs of PFI for Regression. I was also able to run the PFI test you mentioned without any failure, and I used the debugger to check that everything went as expected in that test. It appears to me that it works just fine.

In general, I think the following is right (and it is pretty much what is done in the documentation and in the test case):

var model = pipeline.Fit(data);

var transformedData = model.Transform(data);

var featureImportance = context.Regression.PermutationFeatureImportance(model.LastTransformer, transformedData);

I believe it's right because there is no problem accessing the model.LastTransformer property in that case: when 'model' is returned by the .Fit() method, it is already of type TransformerChain<TLastTransformer>, so it has direct access to LastTransformer without any casting.

However, I do see that the problem you mention appears when passing the model as an ITransformer parameter of a method (for example, as done in one of the samples) and so I understand the need of a simpler API in such cases.

In short, I wanted to point out that I believe that the documentation and test you cited actually work; and so I wanted to ask you if I am missing something to fully understand and replicate the problems you described about them.

Thanks.

@CESARDELATORRE
Contributor Author

@antoniovs1029 - You're right. I thought I had also tested it directly in the same root method, but I just tried it now and it works with no issues.
So, the issue only happens when passing the model as an ITransformer parameter of a method, which is the normal case (instead of the specific chain of transformers). I'm updating my comment above. Thanks for checking this out! 👍

@cyberkoolman

I especially agree with Cesar's point #2. Most of the PFI samples I found use only numeric feature columns, so you can map the results back to the original feature columns by index after the PFI permutation. Not perfect, but still okay. However, most real-world datasets include both numerical and categorical features, and if you use OneHotEncoding for the categorical columns, the complexity increases drastically. I had to find a way by debugging through the runtime and examining the Slots, and the resulting code felt unnecessarily complex.
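
For the slot lookup described above, one possible approach is to read the slot names that ML.NET stores as annotations on the feature vector column. This is a sketch only, under these assumptions: the feature column is named "Features", it carries slot-name annotations (as produced by Concatenate or OneHotEncoding), and the PFI result has one entry per slot. The class and method names (`PfiSlotNames`, `GetFeatureNames`, `Rank`) are hypothetical helpers, not part of ML.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class PfiSlotNames
{
    // Reads the slot names stored as annotations on the feature vector column.
    // Assumption: the column exists and has slot-name annotations.
    public static string[] GetFeatureNames(IDataView transformedData, string featureColumnName = "Features")
    {
        VBuffer<ReadOnlyMemory<char>> slotNames = default;
        transformedData.Schema[featureColumnName].GetSlotNames(ref slotNames);
        return slotNames.DenseValues().Select(n => n.ToString()).ToArray();
    }

    // Pairs each slot name with the PFI entry at the same index, then sorts
    // so that the features whose permutation hurts R-squared most come first.
    public static IEnumerable<(string Feature, double RSquaredChange)> Rank(
        IDataView transformedData,
        ImmutableArray<RegressionMetricsStatistics> permutationMetrics)
    {
        string[] names = GetFeatureNames(transformedData);
        if (names.Length != permutationMetrics.Length)
            throw new InvalidOperationException("Slot count and PFI result count differ.");

        return names
            .Select((name, i) => (Feature: name, RSquaredChange: permutationMetrics[i].RSquared.Mean))
            .OrderBy(x => x.RSquaredChange);
    }
}
```

With one-hot encoded columns, each slot corresponds to a single category value, so several entries in the ranking can trace back to one original input column.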

@nighotatul

Hi. The same problem shows up with AutoML.
I have a trained model and am now trying to retrieve the feature weights. None of the returned objects exposes a LastTransformer, so when I try to get the PFI information I get stuck. There appears to be no way to get the LastTransformer object from the trainedModel.

The following cast lets me access the LastTransformer, however I cannot use it for PFI until I provide a better type for predictor. Debugging I can see it is of type Microsoft.ML.Data.RegressionPredictionTransformer<Microsoft.ML.IPredictorProducing> but I am unable to cast to that because Microsoft.ML.IPredictorProducing is not visible, so it seems like we're still stuck.

//setup code similar to famschopman
RegressionExperiment experiment = mlContext.Auto().CreateRegressionExperiment(experimentSettings);

var experimentResults = experiment.Execute(split.TrainSet, split.TestSet);
var predictor = ((TransformerChain)experimentResults.BestRun.Model).LastTransformer;

//this will not compile.
var permutationMetrics = mlContext.Regression.PermutationFeatureImportance(predictor, transformedData, permutationCount: 30);

The following compile error is produced.

The type arguments for method 'PermutationFeatureImportanceExtensions.PermutationFeatureImportance<TModel>(RegressionCatalog, ISingleFeaturePredictionTransformer<TModel>, IDataView, string, bool, int?, int)' cannot be inferred from the usage. Try specifying the type arguments explicitly.

How can we get the bias and weights using PFI?

@artemiusgreat

artemiusgreat commented Dec 23, 2019

The easiest fix would probably be to add a method that knows how to extract the appropriate transformer from a TransformerChain, a.k.a. the model.

#1 Working example with Multiclass.LightGbm and a model produced by the trainer estimator at run time

IDataView data = _context.Data.LoadFromTextFile<MyInputModel>();
IEstimator<ITransformer> estimator = _context
        .MulticlassClassification
        .Trainers
        .LightGbm(labelColumnName: "Label", featureColumnName: "Features")
        .Append(_context.Transforms.Conversion.MapKeyToValue(new[] { new InputOutputColumnPair("Prediction", "PredictedLabel") }));
ITransformer model = estimator.Fit(modifications);
var permutations = _context
        .MulticlassClassification
        .PermutationFeatureImportance(model, data, permutationCount: 3);

#2 Broken example with previously saved model

IDataView data = _context.Data.LoadFromTextFile<MyInputModel>();
ITransformer model = _context.Model.Load("C:/Model.zip", out var schema);
var permutations = _context
        .MulticlassClassification
        .PermutationFeatureImportance(model, data, permutationCount: 3);

The type arguments for method 'PermutationFeatureImportanceExtensions.PermutationFeatureImportance<TModel>(RegressionCatalog, ISingleFeaturePredictionTransformer<TModel>, IDataView, string, bool, int?, int)' cannot be inferred from the usage. Try specifying the type arguments explicitly.

Hacky solution - replace model with a predictor of a specific type

// My model (TransformerChain) contains 2 transformers - LightGbmTrainer and MapKeyToValue 
// PFI requires the first one, but I wouldn't like to hardcode things like First(), Last(), LastTransformer 
// So, turn model to IEnumerable<dynamic>, filter by OfType<TModel>, get First relevant type

IDataView data = _context.Data.LoadFromTextFile<MyInputModel>();
ITransformer model = _context.Model.Load("C:/Model.zip", out var schema);
var predictor = (model as IEnumerable<dynamic>).OfType<MulticlassPredictionTransformer<OneVersusAllModelParameters>>().FirstOrDefault(); // would be good to move this call to the appropriate extension
var permutations = _context
        .MulticlassClassification
        .PermutationFeatureImportance(predictor, data, permutationCount: 3);

@ganik ganik added the P2 Priority of the issue for triage purpose: Needs to be fixed at some point. label Jan 10, 2020
@antoniovs1029 antoniovs1029 removed their assignment Jan 10, 2020
@antoniovs1029 antoniovs1029 added the API Issues pertaining the friendly API label Jan 10, 2020
@R0Wi
Contributor

R0Wi commented Mar 22, 2021

@artemiusgreat the expression

var predictor = (model as IEnumerable<dynamic>).OfType<MulticlassPredictionTransformer<OneVersusAllModelParameters>>().FirstOrDefault(); 

can be simplified to

 var predictor = model.OfType<MulticlassPredictionTransformer<OneVersusAllModelParameters>>().FirstOrDefault();

But of course the problem remains: we still have to know exactly which training algorithm was used inside the trained model. I think in this case something more dynamic has to be used to infer the concrete type parameters at runtime, so generics might not be the right way to go in that scenario?
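
To make the idea of inferring type parameters at runtime concrete, here is a minimal reflection sketch. It does not use ML.NET at all: `IPredictor<TModel>`, `Predictor<TModel>`, `LinearModel`, `TreeModel`, and `RuntimeDispatch` are hypothetical stand-ins for a generic prediction transformer and a generic PFI-style method. It only shows the general technique of discovering the type argument from the runtime type and closing a generic method over it.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins for a generic predictor interface and two model types.
public interface IPredictor<TModel> { TModel Model { get; } }
public sealed class LinearModel { }
public sealed class TreeModel { }

public sealed class Predictor<TModel> : IPredictor<TModel>
{
    public Predictor(TModel model) => Model = model;
    public TModel Model { get; }
}

public static class RuntimeDispatch
{
    // A generic method whose type argument normally has to be known at compile time.
    public static string Describe<TModel>(IPredictor<TModel> predictor)
        => $"importance computed for {typeof(TModel).Name}";

    // Discovers TModel from the runtime type and closes the generic method over it.
    public static string DescribeAny(object predictor)
    {
        Type iface = predictor.GetType().GetInterfaces()
            .FirstOrDefault(i => i.IsGenericType
                                 && i.GetGenericTypeDefinition() == typeof(IPredictor<>))
            ?? throw new ArgumentException("Object is not an IPredictor<TModel>.", nameof(predictor));

        MethodInfo open = typeof(RuntimeDispatch).GetMethod(nameof(Describe));
        MethodInfo closed = open.MakeGenericMethod(iface.GetGenericArguments()[0]);
        return (string)closed.Invoke(null, new[] { predictor });
    }

    public static void Main()
    {
        // The caller only holds an object, as with a model loaded from disk.
        object unknown = new Predictor<TreeModel>(new TreeModel());
        Console.WriteLine(DescribeAny(unknown)); // prints "importance computed for TreeModel"
    }
}
```

The trade-off is that type errors move from compile time to run time, which is why the sketch throws an ArgumentException when the object does not implement the expected interface.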

@michaelgsharp
Member

Alright, this should be resolved now. I added a new API in PR #5934, based on issue #5625. I'm going to close this issue for now, but if it's not addressed by the PR, please let us know.

@dotnet dotnet locked as resolved and limited conversation to collaborators Mar 20, 2022