It is not possible to use PermutationFeatureImportance from a model loaded from disk in F# #3976

Closed
fwaris opened this issue Jul 9, 2019 · 14 comments
Labels: F# (Support of F# language), lightgbm (Bugs related lightgbm), loadsave (Bugs related loading and saving data or models), need info (This issue needs more info before triage), P1 (Priority of the issue for triage purpose: needs to be fixed soon), regression (Bugs related regression tasks)

fwaris commented Jul 9, 2019

I am trying to use PermutationFeatureImportance (PFI) from F#, but the F# type system does not resolve ITransformer to ISingleFeaturePredictionTransformer, which PFI requires.

I believe it is due to IPredictorProducing (and related interfaces) being marked as "internal".

F# supports explicit interfaces and maybe that is the reason for this issue.

Here is a snippet of code that shows what I am trying to do
(I am using the latest bits - v 1.2.0 at the time of this post)

let mutable schema = null
let mdl = ctx.Model.Load(@"F:\fwaris\data\t\analysis\model_cv_LightGbmBinary.bin", &schema) 
let mdlt =  mdl :?> TransformerChain<ITransformer>
let m1 =  mdlt.LastTransformer //debugger shows it is Microsoft.ML.Data.BinaryPredictionTransformer<Microsoft.ML.IPredictorProducing<float>>
let scored = mdl.Transform(trainView)
scored.Preview()
ctx.BinaryClassification.PermutationFeatureImportance(m1 :?> _,scored)

@dsyme

fwaris (Author) commented Jul 12, 2019

The workaround for now is to use the C# helper given below. But really, if an interface (IPredictorProducing) is going to be exposed via another public interface, it should not be marked internal.

public static class MLHelper<T> where T : class
{
    public static System.Collections.Immutable.ImmutableArray<BinaryClassificationMetricsStatistics> PFI_BinaryClassification(
        MLContext ctx,
        ITransformer model,
        IDataView data,
        string labelColumnName = "Label",
        bool useFeatureWeightFilter = false,
        int? numberOfExamplesToUse = null,
        int permutationCount = 1)
    {
        return ctx.BinaryClassification.PermutationFeatureImportance(
            model as ISingleFeaturePredictionTransformer<T>,
            data,
            labelColumnName: labelColumnName,
            useFeatureWeightFilter: useFeatureWeightFilter,
            numberOfExamplesToUse: numberOfExamplesToUse,
            permutationCount: permutationCount);
    }
}

eerhardt (Member) commented Aug 2, 2019

@fwaris - I just ran into this issue as well. I don't understand how your workaround works. What T is getting passed into MLHelper<T>?

@codemzs - this is the same issue as we were discussing today. I don't think it is possible to use PermutationFeatureImportance once a model is saved to disk.

This is an issue because if you use AutoML, it always saves the model to disk in order to save on memory.

The problem is this code:

internal static class BinaryPredictionTransformer
{
    public const string LoaderSignature = "BinaryPredXfer";

    public static BinaryPredictionTransformer<IPredictorProducing<float>> Create(IHostEnvironment env, ModelLoadContext ctx)
        => new BinaryPredictionTransformer<IPredictorProducing<float>>(env, ctx);
}

Whenever you load a prediction transformer from a model stream, it always creates an instance of a new BinaryPredictionTransformer<IPredictorProducing<float>>. This object cannot be cast to the ISingleFeaturePredictionTransformer<TModel> that is necessary for calling PermutationFeatureImportance, because the T in this case (IPredictorProducing<float>) is internal.
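
In C#, the failing cast looks roughly like this (a minimal sketch; the file path is illustrative, and the LightGbm calibrated model type is just the example used below):

using Microsoft.ML;
using Microsoft.ML.Calibrators;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers.LightGbm;

var mlContext = new MLContext();

// Load a previously trained and saved binary classification model.
var loadedModel = mlContext.Model.Load("model.zip", out var schema);
var chain = (TransformerChain<ITransformer>)loadedModel;

// At runtime this is BinaryPredictionTransformer<IPredictorProducing<float>>, and
// IPredictorProducing<float> is internal, so no public generic instantiation matches it.
var lastTransformer = chain.LastTransformer;

// The cast that PermutationFeatureImportance needs therefore yields null:
var predictor = lastTransformer as ISingleFeaturePredictionTransformer<
    CalibratedModelParametersBase<LightGbmBinaryModelParameters, PlattCalibrator>>;
// predictor == null here, so PFI cannot be called on the loaded model.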

We need to change the above code to save off the right type into the model, and create an instance of BinaryPredictionTransformer<TModel>, where TModel is the type that was originally used when training the pipeline before saving to disk - for example, BinaryPredictionTransformer<CalibratedModelParametersBase<LightGbmBinaryModelParameters, PlattCalibrator>> when using LightGbm.

/cc @Dmitry-A @justinormont

eerhardt changed the title from "IPredictorProducing 'internal' is causing issues with F# type resolution" to "It is not possible to use PermutationFeatureImportance from a model loaded from disk" (Aug 2, 2019)
fwaris (Author) commented Aug 3, 2019

@eerhardt, it seems you can punt on the type resolution in F# by using an underscore; i.e. the following trick seems to work (I tested again just to make sure):

let metrics = MLHelper<_>.PFI_BinaryClassification(mlctx, model, labelColumnName="Label")

The 'model' variable is of the concrete type (from debugger):
Microsoft.ML.Data.BinaryPredictionTransformer<Microsoft.ML.IPredictorProducing<float>>

However, I agree with you that this area requires rework to make it easier to use.

antoniovs1029 added a commit that referenced this issue Oct 2, 2019
* Address the issue of using PFI with a model loaded from disk.

* Provided working tests and samples for using PFI with a model loaded from disk for the cases of Ranking, Regression, and Multiclass prediction transformers. No tests or samples provided for Binary classification, for reasons that will be addressed in a future issue.

* Also modified LbfgsTests so that it uses the appropriate casts now that the PredictionTransformers have been updated.
antoniovs1029 (Member) commented

Hi. So PRs #4262 and #4306 fixed the problem Eric pointed out in his comment in this thread.

So please let us know if this has been fixed for you. In particular, those PRs were only tested with ML.NET on C#, so I would appreciate feedback from the F# side. I will rename and tag this issue as F#-specific, since that was your original problem.

antoniovs1029 added the "F#" (Support of F# language) label (Dec 20, 2019)
antoniovs1029 changed the title from "It is not possible to use PermutationFeatureImportance from a model loaded from disk" to "It is not possible to use PermutationFeatureImportance from a model loaded from disk in F#" (Dec 20, 2019)
antoniovs1029 added the "need info" (This issue needs more info before triage) label (Dec 20, 2019)
artemiusgreat commented Dec 26, 2019

This is not fixed yet.
There are 2 ways to save the model.

1. As a pipeline + estimator
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/save-load-machine-learning-models-ml-net

var pipeline = context.Transforms.Conversion.MapValueToKey("Label", "X")
    .Append(context.Transforms.Concatenate("Features", "X1", "X2"));
var estimator = context.MulticlassClassification.Trainers.LightGbm();
var model = pipeline.Append(estimator).Fit(dataView);
context.Model.Save(model, dataView.Schema, "C:/model.zip");

2. As an estimator, without pipeline
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/explain-machine-learning-model-permutation-feature-importance-ml-net#train-the-model

var estimator = context.MulticlassClassification.Trainers.LightGbm();
var transformedData = pipeline.Fit(dataView).Transform(dataView);
var model = estimator.Fit(transformedData);
context.Model.Save(model, dataView.Schema, "C:/model.zip");

Then, load it from disk:

var model = context.Model.Load("C:/model.zip", out var schema);
var engine = context.Model.CreatePredictionEngine<InputModel, OutputModel>(model);

1. As a pipeline + estimator - the model contains only the pipeline transformers, including MapValueToKey and Concatenate, so there is no way to get the actual trainer/estimator and use it for PFI. The LastTransformer property will return the Concatenate transformer, but PFI requires an estimator, e.g. LightGbm or Regression.

2. As an estimator without a pipeline - now I see the LightGbm trainer in the TransformerChain, but CreatePredictionEngine raises an exception that the "Features" column is not defined, because in this case the model was saved as a pure estimator, without the pipeline.

artemiusgreat commented Dec 27, 2019

Correction.
It's possible to extract the trainer from the model, although the code is not that great.
LastTransformer still returns a pipeline transformer instead of the actual trainer.
ML.NET 1.5.0-preview in NuGet.

var model = context.Model.Load("C:/model.zip", out var schema);
var trainer = (model as IEnumerable<ITransformer>)
        .SelectMany(o => (o as IEnumerable<ITransformer>)
        .OfType<MulticlassPredictionTransformer<OneVersusAllModelParameters>>())
        .FirstOrDefault();
var importance = context
        .MulticlassClassification
        .PermutationFeatureImportance(trainer, pipeline.Fit(dataView).Transform(dataView), permutationCount: 3);

codemzs (Member) commented Dec 27, 2019

How are you finding the 1.5.0-preview NuGet? We literally just released it today.

artemiusgreat commented Dec 27, 2019

@codemzs "Show pre-release" checkbox in Nuget package manager

codemzs (Member) commented Dec 27, 2019

Ha! Of course. I meant: how has your experience been with it so far? Does it fix any of your issues?

artemiusgreat commented

The only thing I needed was to run PFI using a model loaded from a file. As long as that works, I'm happy.

antoniovs1029 (Member) commented Dec 31, 2019

Hi, @artemiusgreat. So I am not sure: is your problem solved or not?

I believe it should be possible to access the LastTransformer directly from the model you saved to disk in the "1. As a pipeline + estimator" case, simply by using:

var predictor = (loadedModel as TransformerChain<ITransformer>).LastTransformer as MulticlassPredictionTransformer<OneVersusAllModelParameters>;

I am not sure why you would need to use the .SelectMany(...) method you mentioned.

"but PFI requires an estimator, e.g. LightGbm or Regression"

PFI doesn't require an estimator, but a Prediction Transformer. So, in your example, the LightGbm trainer is also an estimator, and once it is trained (with .Fit()) it returns a Prediction Transformer of type MulticlassPredictionTransformer<OneVersusAllModelParameters>. You should pass this last transformer to PFI, and not the trainer or estimator:

var pfi = context.MulticlassClassification.PermutationFeatureImportance(predictor, data);

If you are still facing problems, please share with us the complete code and dataset you're using, so that I can take a closer look. Thanks.
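
Putting those fragments together, a rough end-to-end sketch in C# (the data file, InputModel schema, and column names are illustrative, and MulticlassPredictionTransformer<OneVersusAllModelParameters> is assumed to be the loaded model type, as in the earlier comments):

using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

var context = new MLContext();

// Illustrative data: "X" is the label, "X1" and "X2" are the features.
var dataView = context.Data.LoadFromTextFile<InputModel>("data.tsv", hasHeader: true);

// Save the full pipeline + trainer, as in case 1 above.
var pipeline = context.Transforms.Conversion.MapValueToKey("Label", "X")
    .Append(context.Transforms.Concatenate("Features", "X1", "X2"))
    .Append(context.MulticlassClassification.Trainers.LightGbm());

var model = pipeline.Fit(dataView);
context.Model.Save(model, dataView.Schema, "model.zip");

// Reload and pull the prediction transformer (not the estimator) out of the chain.
var loadedModel = context.Model.Load("model.zip", out var schema);
var chain = loadedModel as TransformerChain<ITransformer>;
var predictor = chain.LastTransformer as MulticlassPredictionTransformer<OneVersusAllModelParameters>;

// PFI needs data that has already gone through the featurization steps.
var transformedData = loadedModel.Transform(dataView);
var pfi = context.MulticlassClassification.PermutationFeatureImportance(
    predictor, transformedData, permutationCount: 3);

public class InputModel
{
    [LoadColumn(0)] public string X { get; set; }
    [LoadColumn(1)] public float X1 { get; set; }
    [LoadColumn(2)] public float X2 { get; set; }
}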

antoniovs1029 self-assigned this (Jan 2, 2020)
antoniovs1029 added the "P1" (Priority of the issue for triage purpose: needs to be fixed soon) label (Jan 9, 2020)
artemiusgreat commented

@antoniovs1029 Sorry, I missed your comment. Yes, it was fixed. Thanks.

harishsk added the "loadsave" (Bugs related loading and saving data or models), "regression" (Bugs related regression tasks), and "lightgbm" (Bugs related lightgbm) labels (Apr 29, 2020)
antoniovs1029 (Member) commented

So I've just tested the original scenario of this issue in F#, and now it works, so it was indeed fixed by PRs #4262 and #4306.

fwaris (Author) commented Jun 21, 2020

Also, confirming that it works.

See this issue comment for some tricks that help when working with AutoML outputs: dotnet/docs#19006 (comment)

Note: The fix works in a compiled F# project but not in F# interactive (fsi) because the current fsi is bound to older libraries. I expect that it will work in the new preview version of fsi but I have not tested that yet.

ghost locked as resolved and limited conversation to collaborators (Mar 21, 2022)