Enhancement for partial dependence plot #14969
Comments
Is there literature on using bootstrap samples to calculate partial dependence? This is kind of similar to https://github.com/AustinRochford/PyCEbox, which shows all curves (before the mean). |
Nope, it was just inspired by what seaborn exposes when you make plots. |
This one is ICE, which plots one curve per sample from the dataset. This is something that we could implement as well, but it should be a separate function. |
For ICE discussion see #14126. |
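To make the distinction in the comments above concrete, here is a minimal sketch of how ICE curves relate to the partial dependence curve, plus one possible way to get a bootstrap band. It is not scikit-learn's implementation. The helper `ice_curves`, the synthetic data, and the choice to resample rows with a fixed fitted model (rather than refitting on each bootstrap sample) are all assumptions made for illustration; the thread does not say which bootstrap scheme was intended.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
n_samples = 200
X = rng.normal(size=(n_samples, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n_samples)

model = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)


def ice_curves(model, X, feature, grid):
    """Hypothetical helper: one prediction curve per sample (ICE)."""
    curves = np.empty((X.shape[0], grid.shape[0]))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value  # force the feature to the grid value
        curves[:, j] = model.predict(X_mod)
    return curves


grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)
ice = ice_curves(model, X, feature=0, grid=grid)

# Partial dependence is the average of the ICE curves over the samples.
pd_curve = ice.mean(axis=0)

# Assumed bootstrap scheme: resample the rows being averaged over,
# keeping the fitted model fixed, then take percentile bands.
n_boot = 100
idx = rng.randint(0, n_samples, size=(n_boot, n_samples))
boot_curves = ice[idx].mean(axis=1)  # shape (n_boot, n_grid)
lower, upper = np.percentile(boot_curves, [2.5, 97.5], axis=0)
```

Plotting every row of `ice` gives the PyCEbox-style picture; plotting `pd_curve` with the `lower`/`upper` band gives something closer to the seaborn-style confidence interval mentioned above.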
@glemaitre I would like to contribute to "Add support for categorical columns (bar plot)".
|
I was recently thinking about this feature, and it is not yet clear in my mind. I would like to assume that someone is passing a pipeline containing a
To be honest, I have little faith in 1. and 2. (even if I would like 1. to work :)).
In case that the data have been preprocessed, then it starts to be even trickier. Maybe we could use the @jnothman @NicolasHug @thomasjpfan would you have any wise advice? |
Here are my thoughts, following the notation from our UG:
For example:
X = ...  # features 0 and 3 are categorical, features 1, 2, 4 are continuous
ct = ColumnTransformer([('ohe', OneHotEncoder(), [0, 3])], remainder='passthrough')
lr = LinearRegression()
pipe = make_pipeline(ct, lr)
pipe.fit(X, y)
plot_partial_dependence(pipe, X, features=(0, 2), is_categorical=(True, False))
# result: creates one bar plot for 0 and one continuous plot for 2
# Note: we don't want to support this use-case, i.e. passing OHEd data:
plot_partial_dependence(lr, ct.transform(X), features=???, is_categorical=???) |
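The values that the proposed bar plot would display can be sketched by hand. The snippet below is only an illustration under stated assumptions: `is_categorical` is the parameter proposed in the comment above, not something used here, and `categorical_partial_dependence` is a hypothetical helper, not a scikit-learn function. It works on the raw (pre-encoding) column and lets the pipeline do the one-hot encoding, which is the use-case the comment wants to support.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.RandomState(0)
n_samples = 200
X = np.column_stack([
    rng.randint(0, 3, size=n_samples),  # feature 0: categorical, 3 levels
    rng.normal(size=n_samples),         # feature 1: continuous
])
# Synthetic, noise-free target with a known effect per category level.
level_effects = np.array([0.0, 2.0, -1.0])
y = level_effects[X[:, 0].astype(int)] + 0.5 * X[:, 1]

ct = ColumnTransformer([("ohe", OneHotEncoder(), [0])], remainder="passthrough")
pipe = make_pipeline(ct, LinearRegression()).fit(X, y)


def categorical_partial_dependence(model, X, feature):
    """Hypothetical helper: average prediction with `feature` forced
    to each observed category in turn (one bar per category)."""
    categories = np.unique(X[:, feature])
    averages = []
    for category in categories:
        X_mod = X.copy()
        X_mod[:, feature] = category  # set the raw column, not the OHE columns
        averages.append(model.predict(X_mod).mean())
    return categories, np.array(averages)


categories, pd_values = categorical_partial_dependence(pipe, X, feature=0)
```

Passing `categories` and `pd_values` to something like `matplotlib.pyplot.bar` would give one bar per level. Because the grid is the set of observed categories rather than percentiles of a continuous range, the same averaging logic as for continuous features applies unchanged.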
OK this looks neat as well. |
I agree, this specification looks neat. I suppose the default value for is_categorical can be None, in which case we assume the old behaviour, which is that all features are continuous.
Even when the dataset is OHEd, we can still calculate partial dependence for individual binary columns resulting from OHEing. Will send a PR soon, if you guys are happy with that. |
I will be happy to review :)
|
@glemaitre Pull request that is still WIP: #18298 |
Thanks I will put it in my review-to-do list :)
|
Is there any literature about this? This feels more like an intervention and less like partial dependence. |
Inferring |
Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Closes #14969
The partial dependence plot function could be improved:
feature_names optional when X is a dataframe. (ENH get column names by default in PDP when passing data… #15429)