Handle categorical split in model histogram and dataframe. #7065

trivialfis · 2021-06-25T12:07:58Z

Throw an error in the histogram function if the selected feature is categorical.
Parse categorical splits in the model-to-dataframe function. Added a `Category` column.
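The two changes described above can be sketched roughly as follows. This is a minimal illustration, not XGBoost's actual code: the helper names `check_histogram_feature` and `split_to_row`, the `"c"` marker for a categorical feature type, the error message, and the row layout are all assumptions made for the example.

```python
import math

# Hypothetical helpers for illustration only; not XGBoost's implementation.
CATEGORICAL = "c"  # assumed marker for a categorical feature type


def check_histogram_feature(feature_types, fidx):
    """Raise if the feature selected for the split histogram is categorical."""
    if feature_types is not None and feature_types[fidx] == CATEGORICAL:
        # Assumed message; a categorical split has no numeric threshold to bin.
        raise ValueError("Split value histogram is undefined for a categorical feature.")


def split_to_row(fidx, condition, is_categorical):
    """Build one dataframe-style row (as a dict) for a split node."""
    if is_categorical:
        # No numeric split value; the matching categories go in `Category`.
        return {"Feature": f"f{fidx}", "Split": math.nan, "Category": list(condition)}
    # Numerical split: keep the threshold and leave `Category` empty.
    return {"Feature": f"f{fidx}", "Split": float(condition), "Category": math.nan}
```

Keeping `Split` numeric and putting the category set in a separate column means existing code that reads `Split` as a float keeps working for numerical nodes.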

trivialfis · 2021-06-25T18:55:03Z

I looked into the last failing test, the categorical test in the GPU updater. A minor difference in one histogram bin leads to a different tree, which can change the prediction by up to abs(0.5). This is what I got from the evaluation kernel:

tid: 3, nidx: 12, feat: 6, max: 89.427734375, bg: -638.998825073, bh: 29.000000000, mg: -0.000015259, mh: 0.000000000, pg: -662.094177246, ph: 31.000000000
tid: 2, nidx: 12, feat: 1, max: 89.426757812, bg: -638.998832703, bh: 29.000000000, mg: -0.000007629, mh: 0.000000000, pg: -662.094177246, ph: 31.000000000

There's a difference of about 1e-5 in bg (the histogram bin gradient), after which XGBoost selected a different feature to split on. I relaxed that test a bit more and added a short note, but I don't know how to make the test more robust while keeping the accuracy.
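To make the size of the gap concrete, here is a small sketch that parses the two kernel log lines quoted above and prints the differences. It assumes `max` is the best gain each thread found and `mg` is a second gradient sum reported by the kernel; the parsing helper `parse_line` is written just for this example. The two `max` values differ by roughly 1e-3 in absolute terms, which is about 1e-5 relative to the gain itself.

```python
# The two log lines from the evaluation kernel, copied verbatim.
LOG = """\
tid: 3, nidx: 12, feat: 6, max: 89.427734375, bg: -638.998825073, bh: 29.000000000, mg: -0.000015259, mh: 0.000000000, pg: -662.094177246, ph: 31.000000000
tid: 2, nidx: 12, feat: 1, max: 89.426757812, bg: -638.998832703, bh: 29.000000000, mg: -0.000007629, mh: 0.000000000, pg: -662.094177246, ph: 31.000000000
"""


def parse_line(line):
    """Turn 'key: value, key: value, ...' into a dict of floats."""
    fields = {}
    for item in line.split(", "):
        key, value = item.split(": ")
        fields[key] = float(value)
    return fields


a, b = (parse_line(line) for line in LOG.strip().splitlines())

# Absolute gaps between the two candidate splits on node 12.
gap = {key: a[key] - b[key] for key in ("max", "bg", "mg")}

# Gap in `max` relative to its magnitude (assumed to be the split gain).
rel_gain_gap = gap["max"] / a["max"]

print(gap)
print(rel_gain_gap)
```

With candidates this close, any perturbation of the bin sums on the order of the printed gaps is enough to flip which feature wins, which is why tightening the test tolerance does not help here.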

trivialfis added 4 commits June 25, 2021 18:46

Error on get_split_value_histogram when feature is categorical.

4c04f40

Support df.

346d940

Singular.

672cd2b

cleanup.

70eb077

trivialfis mentioned this pull request Jun 25, 2021

Categorical data support. #6503

Closed

67 tasks

trivialfis added 4 commits June 25, 2021 20:16

fix test.

862909c

Unused code.

1bf0731

Fix test.

7c76d74

Note about the flaky test.

d13d334

trivialfis added 2 commits June 26, 2021 16:03

full range.

49ae2c4

Small fix to test.

4703806

RAMitchell approved these changes Jul 1, 2021

trivialfis merged commit a5d222f into dmlc:master Jul 2, 2021

trivialfis deleted the cat-disable-split-histogram branch July 2, 2021 05:10
