Fix Recursive Parsing for PFI #6084

ericjohannsen · 2022-02-14T23:44:58Z

System Information (please complete the following information):

Windows 11
ML.NET Version 1.7, AutoML 0.19.0
.NET Version: .NET Core 3.1

Describe the bug
ArgumentNullException while attempting to retrieve PFI
The model provided does not have a compatible predictor (Parameter 'lastTransformer')

Stack trace
at Microsoft.ML.Runtime.Contracts.CheckValue[T](IExceptionContext ctx, T val, String paramName, String msg)
at Microsoft.ML.PermutationFeatureImportanceExtensions.PermutationFeatureImportance[TMetric,TResult](IHostEnvironment env, ITransformer model, IDataView data, Func1 resultInitializer, Func2 evaluationFunc, Func3 deltaFunc, Int32 permutationCount, Boolean useFeatureWeightFilter, Nullable1 numberOfExamplesToUse)
at Microsoft.ML.PermutationFeatureImportanceExtensions.PermutationFeatureImportance(RegressionCatalog catalog, ITransformer model, IDataView data, String labelColumnName, Boolean useFeatureWeightFilter, Nullable`1 numberOfExamplesToUse, Int32 permutationCount)

To Reproduce

Not sure how best to create a minimal reproducible example including data, but here's the core of what I did (based on #5934)

MLContext mlContext = new MLContext();

ColumnInferenceResults columnInference = mlContext.Auto().InferColumns(trainDataPath, "Label", groupColumns: false);

TextLoader textLoader = mlContext.Data.CreateTextLoader(columnInference.TextLoaderOptions);
IDataView trainDataView = textLoader.Load(trainDataPath);
IDataView testDataView = textLoader.Load(testDataPath);

IEstimator<ITransformer> preFeaturizer = 
	mlContext.Transforms.Categorical.OneHotEncoding(
		new T().GetOneHotInputColumnNames().Select(_ => new InputOutputColumnPair(_)).ToArray()
		);

ColumnInformation columnInformation = columnInference.ColumnInformation;
columnInformation.IgnoredColumnNames.AddIfMissing("Foo");
columnInformation.CategoricalColumnNames.Remove("Foo");
columnInformation.NumericColumnNames.Remove("Foo");

BinaryExperimentSettings experimentSettings = new BinaryExperimentSettings()
{
	MaxExperimentTimeInSeconds = experimentTime,
	OptimizingMetric = BinaryClassificationMetric.F1Score
};

var experiment = mlContext.Auto().CreateBinaryClassificationExperiment(experimentSettings);

ExperimentResult<BinaryClassificationMetrics> experimentResult = experiment.Execute(trainDataView, columnInformation, preFeaturizer, progress);

RunDetail<BinaryClassificationMetrics> bestRun = experimentResult.BestRun;

// Exception thrown here
var permutationFeatureImportance = mlContext
	   .Regression
	   .PermutationFeatureImportance(bestRun.Model, testDataViewWithBestScore, permutationCount: 3);

The LastTransformer of bestRun.Model is BinaryPredictionTransformer<Microsoft.ML.Calibrators.CalibratedModelParametersBase<Microsoft.ML.Trainers.lightGbm.LightGbmBinaryModelParameters, ...>>

Expected behavior
Retrieve the PFI information.

The text was updated successfully, but these errors were encountered:

michaelgsharp · 2022-02-15T18:29:07Z

Hey @ericjohannsen,

This issue was actually fixed just yesterday by pr #6085. You can try building the latest from main and see if that fixes your issue, or you can the latest preview package here, https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json.

Closing this issue for now as it should be resolved, but please re-open the issue if it isn't resolved.

IntegerMan · 2022-02-20T20:03:13Z

Is there any further details on the preview package route or any estimate when a released package might be available? I added that JSON as a NuGet package source, but the latest available version of AutoML and Microsoft.ML from that source match the latest on NuGet and both have this issue as well.

I had hoped to include explainability in a user group talk / workshop I'm giving Thursday on AutoML, but it looks like that might be too optimistic if I wanted folks to be able to follow along.

ericjohannsen · 2022-02-20T20:22:58Z

@michaelgsharp @IntegerMan
I was able to reference Microsoft.ML 2.0.0-preview.22115.2 and Microsoft.ML.AutoML 0.20.0-preview.22115.2 using the source on Azure, but still encounter the same problem.

<PackageReference Include="Microsoft.ML" Version="2.0.0-preview.22115.2" />
<PackageReference Include="Microsoft.ML.AutoML" Version="0.20.0-preview.22115.2" />

ericjohannsen · 2022-03-01T04:44:43Z

@michaelgsharp Any word on this?

michaelgsharp · 2022-03-02T18:11:16Z

In the preview version instead of passing in bestRun.Model.LastTransformer, you should just pass in bestRun.Model. The PFI code should automatically pull out the transformer that it needs. Can you try doing that and let me know what happens? I'll repoen this issue for now.

We are planning on doing a servicing release which will include this bug fix next week as well.

ericjohannsen · 2022-03-05T18:14:31Z

@michaelgsharp I was already passing in bestRun.Model

var bestModel = bestRun.Model;

var permutationFeatureImportance = mlContext
	   .Regression
	   .PermutationFeatureImportance(bestRun.Model, testDataViewWithBestScore, permutationCount: 3);

ghost · 2022-03-19T21:00:54Z

This issue has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

michaelgsharp · 2022-03-22T21:24:42Z

@ericjohannsen the servicing release is out with the fix for this. Can you try with the latest 1.7.1 and let me know if it solves your issue?

ericjohannsen · 2022-03-23T03:35:54Z

@michaelgsharp No joy with 1.7.1. If it helps:

Let me know what other information might be helpful diagnosing this.

michaelgsharp · 2022-03-24T00:11:28Z

Hmm.. Could you share a couple lines of the data you used so that I can run it on my end and see what I can find?

ericjohannsen · 2022-03-24T01:10:21Z

@michaelgsharp Sure. I changed the headers to obfuscate the domain. Not sure whether it matters, but I'm using cross-validation.

TheData.zip

michaelgsharp · 2022-03-28T21:23:11Z

Alright, figured it out. There are 2 things that need to change and both are in the call to PFI.

The first is that you are calling PFI in the Regression catalog, but you are using a Binary trainer. You need to make sure the catalog you use for PFI and the trainer are the same. In this case, PFI in the binary catalog has a slightly different name.

The second is that PFI re-runs your trainer on the data. This means that the data needs to have the correct columns for the trainer. The easiest way to do this is just to use your pipeline to transform the data you pass into PFI. (Your data may already be correct. I didn't have access to see exactly what your testDataViewWithBestScore actually is, so you can try it without it to see, but if it doesn't work you will have to transform it).

var permutationFeatureImportance = mlContext
	   .BinaryClassification
	   .PermutationFeatureImportanceNonCalibrated(bestRun.Model, bestRun.Model.Transform(testDataViewWithBestScore), permutationCount: 3);
	   ```

Try that out and let me know how it goes.

ericjohannsen · 2022-03-29T22:25:04Z

@michaelgsharp Still seeing the issue. I've attached a minimal reproducible example
MlMre.zip
.

michaelgsharp · 2022-03-29T23:25:06Z

Alright @ericjohannsen. Good news is that I have figured out the issue and why I had it working for me before and how to workaround it in your case. Bad news is that it means there is another bug in how we are extracting the transformer.

In my original testing I didn't have the pre-featurizer that you do because it had a Type T and I didn't know what it was so I just skipped it. This caused bestRun.Model to return a single transformer chain of normal transformers. In your code and the new example you sent there is a pre-featurizer. AutoML is concatenating the pre-featurizer pipeline with the model they generate. This causes bestRun.Model to return a transformer chain but this time the last "transformer" in the chain is another transformer chain instead of a normal transformer. We aren't doing recursive checking to find the transformer (100% my mistake as I didn't even think of this scenario), so its missing the transformer. I'll get this fixed.

In your case here is how to workaround it.

First remove the extra transform call on line 75. You can directly use testDataViewWithBestScores and don't need to do this
var transformedBestScore = bestRun.Model.Transform(testDataViewWithBestScore);.

Finally, we need to extract the transformer chain that is nested inside bestRun.Model. If you use this cast it will get it correctly. (bestRun.Model as TransformerChain<ITransformer>).LastTransformer

All in all the final lines should look like this:

// Evaluate test data

IDataView testDataViewWithBestScore = bestRun.Model.Transform(testDataView);

// Feature Importance (PFI)  https://github.com/dotnet/machinelearning/issues/6084

Console.WriteLine("Get most important features");

var permutationFeatureImportance = mlContext
    .BinaryClassification
    .PermutationFeatureImportanceNonCalibrated((bestRun.Model as TransformerChain<ITransformer>).LastTransformer, testDataViewWithBestScore, permutationCount: 3);

Can you try this and make sure its working for you? In this example PFI should run pretty quick, but with lots of features it does take some time to run so just be prepared for that.

ericjohannsen · 2022-03-30T01:16:39Z

Success!

@michaelgsharp Thank you for sticking with this, and for the detailed explanation.

Side note: The PFI calculation took quite a while on my Ryzen 9 3900 system, and I noticed only one CPU core is utilized.

michaelgsharp · 2022-03-30T23:57:54Z

Only 1 core is used? I haven't looked too much into how we are actually doing the calculations, but it seems like there is potential for some serious optimizations there.

I'm going to rename this issue to track fixing the recursive parsing issue for PFI. I'm glad we were able to get this solved!

luisquintanilla · 2022-09-23T18:18:22Z

@michaelgsharp are we okay to close this issue?

michaelgsharp closed this as completed Feb 15, 2022

michaelgsharp added the needs-author-action label Mar 2, 2022

michaelgsharp reopened this Mar 2, 2022

ghost added the no-recent-activity label Mar 19, 2022

ghost removed the no-recent-activity label Mar 22, 2022

ghost added needs-further-triage and removed needs-author-action labels Mar 23, 2022

michaelgsharp added bug Something isn't working P2 Priority of the issue for triage purpose: Needs to be fixed at some point. and removed needs-further-triage labels Mar 30, 2022

michaelgsharp added this to the ML.NET 2.0 milestone Mar 30, 2022

michaelgsharp changed the title ~~Exception Retrieving PFI from AutoML Model~~ Fix Recursive Parsing for PFI Feb 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Recursive Parsing for PFI #6084

Fix Recursive Parsing for PFI #6084

ericjohannsen commented Feb 14, 2022

michaelgsharp commented Feb 15, 2022

IntegerMan commented Feb 20, 2022

ericjohannsen commented Feb 20, 2022

ericjohannsen commented Mar 1, 2022

michaelgsharp commented Mar 2, 2022

ericjohannsen commented Mar 5, 2022

ghost commented Mar 19, 2022

michaelgsharp commented Mar 22, 2022

ericjohannsen commented Mar 23, 2022

michaelgsharp commented Mar 24, 2022

ericjohannsen commented Mar 24, 2022

michaelgsharp commented Mar 28, 2022

ericjohannsen commented Mar 29, 2022

michaelgsharp commented Mar 29, 2022

ericjohannsen commented Mar 30, 2022 •

edited

michaelgsharp commented Mar 30, 2022

luisquintanilla commented Sep 23, 2022

Fix Recursive Parsing for PFI #6084

Fix Recursive Parsing for PFI #6084

Comments

ericjohannsen commented Feb 14, 2022

michaelgsharp commented Feb 15, 2022

IntegerMan commented Feb 20, 2022

ericjohannsen commented Feb 20, 2022

ericjohannsen commented Mar 1, 2022

michaelgsharp commented Mar 2, 2022

ericjohannsen commented Mar 5, 2022

ghost commented Mar 19, 2022

michaelgsharp commented Mar 22, 2022

ericjohannsen commented Mar 23, 2022

michaelgsharp commented Mar 24, 2022

ericjohannsen commented Mar 24, 2022

michaelgsharp commented Mar 28, 2022

ericjohannsen commented Mar 29, 2022

michaelgsharp commented Mar 29, 2022

ericjohannsen commented Mar 30, 2022 • edited

michaelgsharp commented Mar 30, 2022

luisquintanilla commented Sep 23, 2022

ericjohannsen commented Mar 30, 2022 •

edited