
FEA Add variable importance to linear models #21170

Open
lorentzenchr opened this issue Sep 27, 2021 · 8 comments

Comments

@lorentzenchr
Member

lorentzenchr commented Sep 27, 2021

Describe the workflow you want to enable

I'd like to have a feature importance method native to linear models (without L1 penalty) that is calculated on the training set:

clf = LogisticRegression(with_importance=True)
clf.fit(X, y)
clf.feature_importances_  # or some nice plot thereof

Describe your proposed solution

New proposal

Evaluate whether the LMG measure (Lindeman, Merenda and Gold, see [1, 2]) is applicable and feasible for L2 penalized regression and for GLMs. Otherwise, consider the other measures of [1, 2].

In short, LMG is a Shapley value decomposition of the R² among the features.

References:

1. Grömping, U. (2007). "Estimators of Relative Importance in Linear Regression Based on Variance Decomposition." The American Statistician, 61(2), 139-147.
2. Grömping, U. (2015). "Variable Importance in Regression Models." WIREs Computational Statistics, 7(2), 137-152.
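
A brute-force sketch of what the LMG computation could look like for unpenalized OLS; lmg_importance and r2_of_subset are hypothetical names, not an existing scikit-learn API, and the cost grows exponentially with the number of features:

from itertools import combinations
from math import factorial

import numpy as np
from sklearn.linear_model import LinearRegression

def r2_of_subset(X, y, subset):
    """R² of an OLS fit restricted to a feature subset (empty subset -> 0)."""
    if not subset:
        return 0.0
    cols = list(subset)
    return LinearRegression().fit(X[:, cols], y).score(X[:, cols], y)

def lmg_importance(X, y):
    """Shapley value decomposition of the full-model R² over the features."""
    n_features = X.shape[1]
    importances = np.zeros(n_features)
    for j in range(n_features):
        others = [k for k in range(n_features) if k != j]
        for size in range(n_features):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n_features - 1 - size) / factorial(n_features)
            for subset in combinations(others, size):
                gain = r2_of_subset(X, y, subset + (j,)) - r2_of_subset(X, y, subset)
                importances[j] += weight * gain
    return importances  # entries sum to the full-model R²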

Original proposal

Compute the t-statistic of the coefficients

t[j] = coef[j] / std(coef[j])

and use its absolute value, i.e. |t|, as a measure of (in-sample) importance. For GLMs like logistic regression, see section 5.3 of https://arxiv.org/pdf/1509.09169.pdf for a formula for Var[coef].
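
A minimal sketch of this computation for unpenalized OLS with an intercept, assuming homoscedastic Gaussian noise; abs_t_statistics is a hypothetical helper name:

import numpy as np
from sklearn.linear_model import LinearRegression

def abs_t_statistics(X, y):
    """Return |t[j]| = |coef[j] / std(coef[j])| for an OLS fit with intercept."""
    n_samples, n_features = X.shape
    model = LinearRegression().fit(X, y)
    residuals = y - model.predict(X)
    # Unbiased noise variance estimate; the extra -1 accounts for the intercept.
    sigma2 = residuals @ residuals / (n_samples - n_features - 1)
    # Cov[coef] = sigma² * (Xc' Xc)^{-1}, with X centered to absorb the intercept.
    Xc = X - X.mean(axis=0)
    var_coef = sigma2 * np.diag(np.linalg.pinv(Xc.T @ Xc))
    return np.abs(model.coef_ / np.sqrt(var_coef))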

Describe alternatives you've considered, if relevant

Any general importance measure (permutation importance, SHAP values, ...) also works.
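
For reference, permutation importance already works with linear models through the existing scikit-learn API, e.g. on a held-out set:

from sklearn.datasets import load_diabetes
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = Ridge().fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean)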

Additional context

Given the great and legitimate need for interpretability, I would favor having a native importance measure for linear models. Random Forests have their own native feature_importances_ with the warning

impurity-based feature importances can be misleading for high cardinality features (many unique values).

We could add a similar warning for collinear features like

feature importances can be misleading for collinear or high-dimensional features.

I guess, in the end, this is true for all feature importance measures, even for SHAP (see also our multicollinear example).

Prior discussions like #16802, #6773 and #13048 focused on p-values, which seem out of scope for scikit-learn for different reasons. I hope we can circumvent these reasons by focusing on feature importance only and not considering p-values.

@lorentzenchr
Member Author

@GaelVaroquaux @rth @NicolasHug @TomDLT friendly ping in case of interest as you've been involved in earlier issues.

@GaelVaroquaux
Member

I think that this is a very slippery slope: t-statistics are not well controlled outside of maximum-likelihood estimates.

Either people know what they are doing, and it's trivial to compute the above, or they don't, and they will misinterpret it (that's true of much of the model interpretation literature, that's been going around in circles for years because it is trying to give simple answers to problems that do not have a good solution in statistics).

I'm -1 on this line

@lorentzenchr
Member Author

We don't have to call it "t-statistic", just "native linear model feature importance".

@GaelVaroquaux Your arguments could be used against random forest feature importance, or even any feature importance measure. What do you propose instead for answering: "How important is feature X in your model? Could we drop it (for whatever good reasons, maybe it costs money)?"

I think we should have answers for the most simple, most taught and most trusted model class: linear models. I also think that the recent focus on model interpretation was very important for building trust in ML and for showing that predictive performance is not necessarily the most important thing. Admittedly, though, model interpretation might be a hard nut to crack on its own.

@glemaitre
Member

From the analysis that we did with the tree-based models, it seems we came to the understanding that there is no single good feature importance, but rather several feature importance methods, each with its pros and cons. I assume that the same can be said about linear models, e.g. permutation importance vs. weight-based importance.

Adding a default feature_importances_ to the linear models means that, by default, we legitimize a single method. I am not sure that this is the right thing to do, since it is somehow what we would like to move away from in the tree-based models.

So there is probably a choice of API to think about: native importance vs. helper functions to compute the importance. If we want to avoid legitimizing a particular feature importance, each model should provide a parameter or a method to compute a specific type of importance; however, we would still have some default feature importance. If we instead use helper functions, the choice of importance is user-specified. The issue, in this case, will be the integration with estimators that rely on the coef_ and feature_importances_ attributes, e.g. feature selector estimators. I think that we can build some machinery such that these methods take a model and a feature importance function.
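
For what it's worth, SelectFromModel already accepts a user-supplied importance_getter callable, which is close to the machinery sketched above (the lambda below is just an illustrative choice of importance):

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)
selector = SelectFromModel(
    Ridge(),
    importance_getter=lambda est: np.abs(est.coef_),  # user-chosen importance
)
selector.fit(X, y)
print(selector.get_support())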

@GaelVaroquaux
Member

GaelVaroquaux commented Sep 28, 2021 via email

@adrinjalali
Member

I think adding something specific to linear models to the inspection module would be a good way to not have it as easily accessible as .feature_importances_ and yet easy enough for people who want to use it.

@lorentzenchr lorentzenchr changed the title FEA Add variable importance to linear models via t-statistic FEA Add variable importance to linear models Oct 24, 2021
@ogrisel
Member

ogrisel commented Oct 28, 2021

I think part of the problem is providing a utility with a generic name such as "feature importance", which could imply that what we propose is "The Way" to assess the contributions of input features to a model.

Some of this problem would go away if we provide more specific names for different methods to compute local (per sample) and global (per dataset) "explanations" of model decisions.

For instance, we could provide a utility function to compute "feature effects" for linear models, decomposing the decision function of individual predictions as follows:

intercept                # baseline
+ coef_0 * X[i, 0]       # feature effect of feature 0 in the context of sample X[i]
+ coef_1 * X[i, 1]       # feature effect of feature 1 in the context of sample X[i]

This same function could then be aggregated across a dataset to compute a feature effect plot such as:

https://christophm.github.io/interpretable-ml-book/limo.html#effect-plot
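
A minimal sketch of such a utility for a fitted linear regressor; feature_effects is a hypothetical name, not an existing API:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

def feature_effects(model, X):
    """Per-sample effects with prediction(X[i]) == model.intercept_ + effects[i].sum()."""
    return np.asarray(X) * model.coef_  # shape (n_samples, n_features)

X, y = load_diabetes(return_X_y=True)
effects = feature_effects(Ridge().fit(X, y), X)
# A global effect plot would then summarize each column, e.g. one boxplot per feature.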

This would be similar to the request in #19294 to implement decision_path for (H)GBRT, which would allow us to compute individual and aggregate feature effects for those models and then present the results using plots such as:

feature effects / impacts for a decision on an individual sample (local explanation)


from: https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211

feature effect of a given feature computed on a test set (global explanation)


from: https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211

If we want this utility to reflect the uncertainty caused by the sampling of the training set and by the training procedure, we could cross_validate() the model with return_estimator=True and use the resulting set of models and their predictions on the respective validation sets to compute the above plots, using a dedicated from_cv_results method as is currently being drafted in #21211 in the context of calibration curves.
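
For instance, something along these lines (plotting omitted):

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

X, y = load_diabetes(return_X_y=True)
cv_results = cross_validate(Ridge(), X, y, cv=5, return_estimator=True)
# The spread of the coefficients across folds reflects the sampling uncertainty.
coefs = np.array([est.coef_ for est in cv_results["estimator"]])
print(coefs.mean(axis=0), coefs.std(axis=0))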

@lorentzenchr
Member Author

lorentzenchr commented Dec 6, 2022

As someone said elsewhere on a different topic

... is very classic and used in many communities. People understand the meaning at a glance (even if the understanding is limited). I think that it is important that we support it.

I think the same applies here. 😏
