Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX HistGradientBoosting raising ValueError with monotonic_cst and categorical_feature #28925

Merged
merged 7 commits into from May 3, 2024

Conversation

yuanx749
Copy link
Contributor

@yuanx749 yuanx749 commented May 1, 2024

Reference Issues/PRs

Fixes #28898

What does this implement/fix? Explain your changes.

HistGradientBoosting uses ColumnTransformer to preprocess the input X, which places the categorical features at the beginning of X. This PR adjusts the feature order in monotonic_cst correspondingly, to avoid the error when calling TreeGrower.

Any other comments?

Copy link

github-actions bot commented May 1, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: ec90944. Link to the linter CI: here

@ogrisel
Copy link
Member

ogrisel commented May 2, 2024

Thanks for the PR @yuanx749, the fix and the test look good. Can you please add an entry to the change log (doc/whats_new/1.5.rst)?

@yuanx749
Copy link
Contributor Author

yuanx749 commented May 2, 2024

Thank you for reviewing @ogrisel. I added the entry to whats_new.

@ogrisel
Copy link
Member

ogrisel commented May 2, 2024

Actually, I think it would be worth making the test stronger by expanding test_predictions to insert some extra columns f_cat_a = rng.randint(low=0, high=9, size=n_samples) (and similarly f_b and f_c, before, between and after f_0 and f_1 and no monotonicity constraint on those columns (and no impact on y either).

This way we could better check the correctness of the monotonicity constraint remapping.

@ogrisel
Copy link
Member

ogrisel commented May 2, 2024

/cc @thomasjpfan.

@ogrisel ogrisel added this to the 1.5 milestone May 2, 2024
@yuanx749
Copy link
Contributor Author

yuanx749 commented May 3, 2024

Actually, I think it would be worth making the test stronger by expanding test_predictions to insert some extra columns f_cat_a = rng.randint(low=0, high=9, size=n_samples) (and similarly f_b and f_c, before, between and after f_0 and f_1 and no monotonicity constraint on those columns (and no impact on y either).

This way we could better check the correctness of the monotonicity constraint remapping.

Good point. I updated test_predictions, and all tests pass locally.

Then I think the additional non-regression test I added before can be removed.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @yuanx749 ! The fix and additional tests looks good to me.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's delete the redundant test but otherwise LGTM! Thanks for the fix.

@ogrisel ogrisel enabled auto-merge (squash) May 3, 2024 14:40
@ogrisel ogrisel merged commit 7bc1dbb into scikit-learn:main May 3, 2024
28 checks passed
@yuanx749 yuanx749 deleted the hist-mono branch May 5, 2024 06:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HistGradientBoostingClassifier raise error with monotonic constraints and categorical features
3 participants