ENH: increase transparency of background dataset sub-sampling #3461

Open · 3 of 4 tasks
jyliuu opened this issue Jan 18, 2024 · 6 comments · May be fixed by #3650
Labels
enhancement (Indicates new feature requests), good first issue (This is a fix that might be easier for someone to do as a first contribution)

Comments

jyliuu commented Jan 18, 2024

Issue Description

Given a sample $x$ that we wish to explain, we can compute its Shapley values with respect to a single background sample $x^b$. When the Explainer class is given background data, it should compute the Shapley values against each sample in the background data and then take the average, which is an approximation of the interventional SHAP values.

This averaging means that if I split my background data into two halves, A and B, I should be able to call the explainer on A and on B to obtain the averaged SHAP values a and b for each half. Taking (a + b)/2 should then equal the result of calling SHAP on the entire background dataset.
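In symbols, writing $\phi_j(x; D)$ for the interventional SHAP value of feature $j$ for sample $x$ computed against background dataset $D$, and splitting $D$ into two disjoint, equally sized halves $A$ and $B$, the identity I expect is

$$
\phi_j(x; D) = \frac{1}{|D|}\sum_{x^b \in D} \phi_j(x; x^b) = \frac{1}{2}\bigl(\phi_j(x; A) + \phi_j(x; B)\bigr)
$$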

From my experimentation, it seems that once the background dataset exceeds 100 samples this becomes inconsistent, i.e. (a + b)/2 no longer equals the interventional approximation computed on the entire background dataset. The identity does hold for background datasets of 100 samples or fewer.

Minimal Reproducible Example

import numpy as np
import pandas as pd
import xgboost

import shap

rng = np.random.default_rng(42)
N = 1000
M = 2

X = rng.standard_normal(size=(N, M))
X[:, 0] = 0.2*X[:, 1] + X[:, 0]
y = -2*X[:, 0] + X[:, 1] + 0.5*X[:, 0]*X[:, 1]

X = pd.DataFrame(X, columns=["X1", "X2"])


model = xgboost.XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=3)
model.fit(X, y)


def get_shap_values(model, X, sample):
    explainer = shap.TreeExplainer(
        model,
        X,
        feature_perturbation="interventional",
    )
    explanation = explainer(sample)

    expected_value = explanation.base_values[0]
    shap_values = explanation.values[0]
    return shap_values, expected_value

# Consistent when the background data has 100 or fewer samples


for i in range(50, 53):  # i is the number of samples in each half
    midpoint = i
    double_mid = midpoint * 2
    # shap on two halves
    shap_values1, expected_value1 = get_shap_values(model, X.loc[1:midpoint, :], X.loc[[0], :])
    shap_values2, expected_value2 = get_shap_values(model, X.loc[(midpoint+1):double_mid, :], X.loc[[0], :])
    # Shap on full background data
    shap_values, expected_value = get_shap_values(model, X.loc[1:double_mid, :], X.loc[[0], :])

    print(len(X.loc[1:midpoint, :]), len(X.loc[(midpoint+1):double_mid, :]), len(X.loc[1:double_mid, :]))
    print(shap_values, (shap_values1 + shap_values2) / 2) # inconsistent here when i > 50

Traceback

No response

Expected Behavior

In the for loop, shap_values should equal (shap_values1 + shap_values2) / 2 for every i.

Bug report checklist

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest release of shap.
  • I have confirmed this bug exists on the master branch of shap.
  • I'd be interested in making a PR to fix this bug

Installed Versions

0.44.0

jyliuu added the bug label on Jan 18, 2024
CloseChoice (Collaborator) commented Jan 20, 2024

Thanks for the report and your effort to investigate this. Your description is absolutely accurate; the reason for this is the default max_samples of the tabular masker, which sub-samples background datasets larger than 100 rows.

Here is an issue where this problem was already discussed including workaround: #3174.

We probably should throw at least a warning if max_samples < len(X). What do you think, @connortann? This issue keeps coming up and confuses users.
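For reference, the workaround discussed in #3174 boils down to constructing the background masker yourself with a larger max_samples, so that no sub-sampling takes place. A minimal sketch, reusing the model and data from the example above and assuming your shap version accepts a maskers.Independent instance in place of the raw DataFrame:

import shap

# Sketch: set max_samples to the full background size so the masker does not
# sub-sample it (by default the background is capped, which triggers sampling).
background = X.loc[1:double_mid, :]
masker = shap.maskers.Independent(background, max_samples=len(background))

explainer = shap.TreeExplainer(model, masker, feature_perturbation="interventional")
explanation = explainer(X.loc[[0], :])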

connortann (Collaborator) commented Jan 23, 2024

I agree with your analysis; this seems to be a consequence of sampling. I'll remove the bug label, as I think this is intended behaviour.

We probably should throw at least a warning if max_samples < len(X)

I'm not sure if I agree. To me, warnings are generally used to indicate undesirable situations in which the user should probably update their code to fix the warning. In this case I think for the majority of users the subsampling is expected and desirable behaviour. Many parts of shap are sampling-based and only offer approximate results.

Would log.info() be more appropriate?

connortann added the enhancement and question labels and removed the bug label on Jan 23, 2024
CloseChoice (Collaborator) commented:
logging.info is fine with me. I would also be fine with a print, just to make sure that users do not have to spend a couple of hours investigating to find the reason for the inconsistency between the values and the theory.

connortann changed the title from "BUG: Computation of interventional SHAP is inconsistent with theory" to "ENH: increase transparency of background dataset sub-sampling" on Jan 24, 2024
connortann added this to the 0.45.0 milestone on Jan 24, 2024
connortann (Collaborator) commented:
I would much prefer logging over print statements, as prints are much harder to configure and disable. I think adding a print would risk annoying a large majority of shap users.

I've renamed the title accordingly to reflect the plan.
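For illustration, this is roughly what the configurability argument looks like from the user's side. A sketch only: the logger name "shap" is an assumption here, not shap's actual implementation.

import logging

# Assumed sketch: if shap emitted the sub-sampling notice via logging.info on a
# "shap" logger, users could surface or silence it without touching library code.
logging.basicConfig(level=logging.INFO)              # show informational messages
logging.getLogger("shap").setLevel(logging.WARNING)  # or silence shap's messages alone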

CloseChoice added the good first issue label and removed the question label on Feb 15, 2024
connortann removed this from the 0.45.0 milestone on Mar 6, 2024
jcoding2022 commented:
I am also confused about the background dataset and would like to ask a follow-up question, if I may.

Suppose I use shap.TreeExplainer to explain predictions from my LightGBM model for a classification task. I am interested in model_output="probability", so according to the documentation I need to set feature_perturbation="interventional" and specify a background dataset. Given that I have training, validation, and test data, which of them should I draw the background dataset from? The documentation says that "anywhere from 100 to 1000 random background samples are good sizes to use"; how should I pick those samples? Should I fix the random samples so that the background dataset stays the same regardless of which dataset (train, validation, test) I am explaining?

CloseChoice (Collaborator) commented:
This is not strictly on topic, so if you have follow-up questions to my answer, please open a discussion or search for one of the topics where this has already been discussed.

First, I do not believe there is a definitive answer to your question; there is no real backtesting one can do for SHAP values. So one just has to take various considerations into account:

  • Do you want deterministic SHAP values? If so, fixing the background dataset makes sense.
  • The background dataset is just used to calculate the baseline, so any size at which this average is seen to converge is sufficiently large. You can test whether this is the case by keeping the dataset to explain constant, changing the background dataset, and checking how large the differences in the SHAP values (or, even simpler, just in the expected value) are; a sketch of this check follows below. For i.i.d. sampling and a sufficiently diverse background dataset, 100 to 1000 samples should suffice, and I wouldn't expect much difference between train, test, or validation. If there is a noticeable difference, I would rather check whether your split is chosen correctly.
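A minimal sketch of that convergence check, assuming a fitted model, a training frame X_train to draw backgrounds from, and a fixed frame X_explain to explain (these names come from your setup, not from shap's API):

import numpy as np
import shap

# Sketch: keep the samples to explain fixed, resample the background several
# times, and look at the spread of the resulting expected values.
base_values = []
for seed in range(5):
    bg = X_train.sample(100, random_state=seed)  # 100 rows, so no sub-sampling kicks in
    explainer = shap.TreeExplainer(model, bg, feature_perturbation="interventional")
    base_values.append(float(np.mean(explainer(X_explain).base_values)))

# A small spread across resampled backgrounds suggests the background size is
# sufficient; a large spread suggests using more (or more diverse) samples.
print(np.ptp(base_values))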

CloseChoice linked a pull request (#3650) on May 11, 2024 that will close this issue