partial_dependence() computes conditional partial dependence #27441
Comments
Thanks @markusloecher for the detailed notes. Reading the tests, it seems that this is expected behaviour, but it should not be this way. It makes me think about the discussion in SHAP regarding the feature perturbation that can be set to "interventional" or "tree_path_dependent" (https://shap.readthedocs.io/en/latest/generated/shap.TreeExplainer.html#shap.TreeExplainer). It boils down to the same issue. I think that it would make sense to always use the interventional approach and fix the current behaviour. In the future, we could think about proposing the conditional approach as well, both in terms of API and, even more, in terms of the implications for interpreting such results. @markusloecher It looks like you have a fix in mind in the notes that you wrote. Do you feel like contributing a bug fix?
Thanks for the super quick reply! Yes, indeed, these issues are quite similar to the "interventional" versus "tree_path_dependent" SHAP discussions! While I did sketch out a conceptual fix, I am unfortunately nowhere close to implementing it in code, especially not at the rigorous sklearn quality level. Thanks,
Just following up one more time on this.
be new? I am not certain (yet) that the differences are caused by highly unlikely samples.
If you have a fix for the recursive tree method, even if very rough, we'd be interested! The fix is the hard part; the polishing is the easy part.
Will post one very soon!
@mayer79 Do you have any insights or recommendations here? |
@lorentzenchr: the two methods are different, and they estimate different things (just like TreeSHAP and permutation SHAP). I think it would be good to mention this in the docs, but of course it is not a bug.
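To spell out the distinction (my summary, not part of the original comment): the brute method averages the model over the marginal distribution of the complement features, $\frac{1}{n}\sum_{i=1}^{n} f(x_S, x_C^{(i)}) \approx E[f(x_S, X_C)]$, while the tree-recursion weighting follows training-sample fractions down the tree and therefore approximates the conditional $E[f(x_S, X_C) \mid X_S = x_s]$. The two coincide only when $X_S$ and $X_C$ are independent.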
Well, of course both methods have their place. I would still consider the current setup a bug, since the assumption is that, for trees, the recursive method yields the same results as brute force; this is exemplified e.g. by this internal sklearn code:
We should definitely document this. PR very welcome. |
Describe the bug
For the case of correlated predictors (clearly a very common situation), the `sklearn.inspection.partial_dependence()` function gives different answers for `method="recursion"` and `method="brute"`; see my post for elaborate examples. I do not believe that this is intentional, and it should be fixed. Alternatively, it should be communicated clearly in the documentation that (i) the two methods are not equivalent for tree-based algorithms, and (ii) that `method="recursion"` actually computes the conditional $E[f(x_S, X_C) \mid X_S = x_s]$ instead of the (desired) interventional $E[f(x_S, X_C) \mid \mathbf{do}(X_S = x_s)]$.

Steps/Code to Reproduce
Expected Results
We would like the two methods to yield the same PDP values, so the last line should yield
array([[True, True]])
Actual Results
Instead they are different. The print statements yield
X1 brute (interventional): [[0.6 0.4]]
X1 recursion (conditional): [[0.42 0.22]]
and the last line yields
array([[False, False]])
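The gap between the two sets of numbers can be checked by hand: the interventional average forces $X_1 = v$ for every row (which is exactly what brute does), while the conditional average restricts to rows where $X_1 = v$ already holds, so the correlated $X_2$ values are never broken apart from $X_1$. A self-contained sketch under the same assumed data-generating process as above (names and data are mine, not the author's):

```python
# By-hand computation of the interventional vs. conditional estimands,
# using assumed correlated binary data (not the author's exact data).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
n = 1000
x1 = rng.binomial(1, 0.5, n)
x2 = np.where(rng.rand(n) < 0.9, x1, 1 - x1)  # strongly correlated with x1
X = np.column_stack([x1, x2]).astype(float)
y = 3.0 * x1 + x2 + rng.normal(scale=0.1, size=n)

tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

v = 1.0
# Interventional E[f(v, X_C)]: overwrite X1 with v for *all* rows, average.
X_forced = X.copy()
X_forced[:, 0] = v
interventional = tree.predict(X_forced).mean()

# Conditional E[f(X1, X2) | X1 = v]: average only over rows where X1 == v.
conditional = tree.predict(X[X[:, 0] == v]).mean()

print("interventional:", interventional)
print("conditional:   ", conditional)
```

With positively correlated features, conditioning on $X_1 = 1$ also pushes $X_2$ toward 1, so the conditional average exceeds the interventional one here, mirroring the mismatch reported above.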
Versions