Handle categorical split in model histogram and dataframe. #7065

Merged
trivialfis merged 10 commits into dmlc:master from cat-disable-split-histogram on Jul 2, 2021

trivialfis
Member

  • Throw an error in the histogram function if the selected feature is categorical.
  • Parse categorical splits in the model-to-dataframe function, which now has an added column, Category.

@trivialfis trivialfis mentioned this pull request Jun 25, 2021
67 tasks
@trivialfis
Member Author

trivialfis commented Jun 25, 2021

I investigated the last failing test, the categorical test in the GPU updater. A minor difference in one histogram bin leads to a different tree, which can change the prediction by up to abs(0.5). This is what I got in the evaluation kernel:

tid: 3, nidx: 12, feat: 6, max: 89.427734375, bg: -638.998825073, bh: 29.000000000, mg: -0.000015259, mh: 0.000000000, pg: -662.094177246, ph: 31.000000000
tid: 2, nidx: 12, feat: 1, max: 89.426757812, bg: -638.998832703, bh: 29.000000000, mg: -0.000007629, mh: 0.000000000, pg: -662.094177246, ph: 31.000000000

There's a difference of about 1e-5 in bg (the histogram bin gradient), so XGBoost selected a different feature to split on. I relaxed that test a bit more and added a short note, but I don't know how to make the test more robust while keeping the accuracy.
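
To make the sensitivity concrete, here is a minimal sketch that recomputes the split gain from the two logged rows. It rests on assumptions not stated in the log: that `max` is the loss change of the candidate split, that `bg`/`bh` are the gradient and hessian sums of one child, that `pg`/`ph` are the parent sums, and that the L2 regularization term is 1. The helper names `node_gain` and `split_gain` are hypothetical and are not XGBoost functions.

```python
def node_gain(grad, hess, reg_lambda=1.0):
    """Structure score G^2 / (H + lambda) for one node (assumed formula)."""
    return grad * grad / (hess + reg_lambda)


def split_gain(bg, bh, pg, ph, reg_lambda=1.0):
    """Loss change of a split: one child + the other child - parent."""
    left = node_gain(bg, bh, reg_lambda)
    # The other child gets whatever the parent has left over.
    right = node_gain(pg - bg, ph - bh, reg_lambda)
    parent = node_gain(pg, ph, reg_lambda)
    return left + right - parent


# Values copied from the two logged rows (feat 6 and feat 1).
gain_feat6 = split_gain(bg=-638.998825073, bh=29.0, pg=-662.094177246, ph=31.0)
gain_feat1 = split_gain(bg=-638.998832703, bh=29.0, pg=-662.094177246, ph=31.0)
gap = abs(gain_feat6 - gain_feat1)

# Both gains land near the logged `max` values (about 89.43),
# and the gap between the two candidates is well under 1e-3.
print(f"feat 6: {gain_feat6:.6f}")
print(f"feat 1: {gain_feat1:.6f}")
print(f"gap:    {gap:.2e}")
```

Under these assumptions the two candidates are effectively tied, so which feature wins depends on rounding. The two logged `max` values differ by about 0.000977, which is the spacing between adjacent single-precision floats for magnitudes between 8192 and 16384, the range the individual gain terms fall in here. That is consistent with the pick being decided by a single rounding step, if the kernel accumulates in float32.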

@trivialfis trivialfis merged commit a5d222f into dmlc:master Jul 2, 2021
@trivialfis trivialfis deleted the cat-disable-split-histogram branch July 2, 2021 05:10