Improve JSON format for categorical features. #6128
trivialfis commented on Sep 17, 2020
- This PR stores the saved categories in a single vector instead of one vector per node.
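One way to picture the change is to replace the per-node category vectors with one shared vector, where each node records the segment of that vector it owns. The sketch below is a minimal illustration under that assumption. The types `CategorySegment` and `FlatCategories` and the functions `FlattenCategories` and `NodeCategories` are hypothetical names, not XGBoost's actual fields or JSON keys.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: one shared vector holds every node's categories, and
// each node records where its segment begins and how many entries it has.
struct CategorySegment {
  std::size_t beg;
  std::size_t size;
};

struct FlatCategories {
  std::vector<std::int64_t> categories;   // all nodes' categories, concatenated
  std::vector<CategorySegment> segments;  // one entry per node
};

// Concatenate per-node category lists into a single vector.
// Nodes without a categorical split simply get an empty segment.
FlatCategories FlattenCategories(
    std::vector<std::vector<std::int64_t>> const& per_node) {
  FlatCategories out;
  out.segments.reserve(per_node.size());
  for (auto const& node_cats : per_node) {
    out.segments.push_back({out.categories.size(), node_cats.size()});
    out.categories.insert(out.categories.end(), node_cats.begin(), node_cats.end());
  }
  return out;
}

// Recover the categories belonging to node `nidx` from the flat layout.
std::vector<std::int64_t> NodeCategories(FlatCategories const& flat, std::size_t nidx) {
  assert(nidx < flat.segments.size());
  CategorySegment const& seg = flat.segments[nidx];
  auto first = flat.categories.begin() + static_cast<std::ptrdiff_t>(seg.beg);
  return std::vector<std::int64_t>(first, first + static_cast<std::ptrdiff_t>(seg.size));
}
```

For example, nodes holding `{1, 3}`, nothing, and `{0, 2, 5}` flatten to `[1, 3, 0, 2, 5]` with segments `(0, 2)`, `(2, 0)` and `(2, 3)`.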
Codecov Report
@@ Coverage Diff @@
## master #6128 +/- ##
=======================================
Coverage 78.52% 78.52%
=======================================
Files 12 12
Lines 3069 3069
=======================================
Hits 2410 2410
Misses 659 659
@trivialfis Do you have performance numbers? How much does this PR improve serialization performance?
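common::KCatBitField const cat_bits(node_categories);  // view over this node's category bitmap
for (size_t i = 0; i < cat_bits.Size(); ++i) {
  if (cat_bits.Check(i)) {  // bit i set: category i is part of this node's split
    categories.emplace_back(static_cast<Integer::Int>(i));  // save the category index itself
  }
}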
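The reviewed loop walks a node's category bitmap and records the index of every set bit. Below is a self-contained sketch of that conversion and of its inverse, which a loader would need. `SimpleBitField` is a hypothetical stand-in for `common::KCatBitField`. The sketch assumes `Size()` returns the number of bits and `Check(i)` tests bit `i`. It also assumes a layout of 32-bit words, with bit `i` at position `i % 32` of word `i / 32`. This layout is an assumption, not XGBoost's documented format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for common::KCatBitField: a read-only view over
// 32-bit words where bit i lives in word i / 32 at position i % 32.
class SimpleBitField {
 public:
  explicit SimpleBitField(std::vector<std::uint32_t> const& words) : words_(words) {}
  std::size_t Size() const { return words_.size() * 32; }  // number of bits
  bool Check(std::size_t i) const { return (words_[i / 32] >> (i % 32)) & 1U; }

 private:
  std::vector<std::uint32_t> const& words_;
};

// Save: list the index of every set bit, mirroring the reviewed loop.
std::vector<std::int64_t> BitsToCategories(std::vector<std::uint32_t> const& words) {
  SimpleBitField const bits(words);
  std::vector<std::int64_t> categories;
  for (std::size_t i = 0; i < bits.Size(); ++i) {
    if (bits.Check(i)) {
      categories.emplace_back(static_cast<std::int64_t>(i));
    }
  }
  return categories;
}

// Load: rebuild the bitmap from a saved category list, growing it only as
// far as the largest category requires.
std::vector<std::uint32_t> CategoriesToBits(std::vector<std::int64_t> const& categories) {
  std::vector<std::uint32_t> words;
  for (std::int64_t cat : categories) {
    assert(cat >= 0);
    auto const idx = static_cast<std::size_t>(cat);
    if (words.size() <= idx / 32) {
      words.resize(idx / 32 + 1, 0);
    }
    words[idx / 32] |= (1U << (idx % 32));
  }
  return words;
}
```

With words `{0xA, 0, 1}` (bits 1, 3 and 64 set), `BitsToCategories` yields `[1, 3, 64]`, and `CategoriesToBits` rebuilds the same three words. The inverse stops at the largest saved category, so trailing all-zero words in the original bitmap are not reproduced.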
@trivialfis What's the performance implication of saving individual categories? Is it better than saving bitmaps directly?
No, it's not. If saving the bitmap is preferred, I can make the change.
No. If saving individual categories gives acceptable performance, let's keep it; it's easier for a human to read.
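The thread reports no measurements, so this sketch only shows the shape of each encoding. It uses one hypothetical split on categories 1, 3 and 64, and it assumes a layout where bit `i` is stored at position `i % 32` of word `i / 32`. `ToJsonArray` and `BitmapWords` are hypothetical helpers. Their output is plain JSON text, not XGBoost's model schema.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Render a list of integers as a compact JSON array, e.g. [1,3,64].
template <typename T>
std::string ToJsonArray(std::vector<T> const& values) {
  std::string out = "[";
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i != 0) {
      out += ",";
    }
    out += std::to_string(values[i]);
  }
  out += "]";
  return out;
}

// Number of 32-bit words a bitmap needs to hold category indices up to
// `max_category` (assumed layout: bit i lives in word i / 32).
std::size_t BitmapWords(std::int64_t max_category) {
  assert(max_category >= 0);
  return static_cast<std::size_t>(max_category) / 32 + 1;
}

// The same hypothetical split, encoded both ways.
std::vector<std::int64_t> const as_list{1, 3, 64};
std::vector<std::uint32_t> const as_bitmap{0xAU, 0U, 1U};  // bits 1, 3 and 64 set
```

The list prints as `[1,3,64]` and names each category directly, which is what makes it easier to read. The bitmap prints as `[10,0,1]`. The list's length tracks how many categories the split uses. The bitmap's length tracks the largest category index, so a split on category 1000 alone needs `BitmapWords(1000) == 32` words.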