Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve JSON format for categorical features. #6128

Merged
merged 5 commits into from Sep 21, 2020

Conversation

trivialfis
Copy link
Member

  • This PR reduces the saved categories to a single vector instead of vector per-node.

@codecov-commenter
Copy link

codecov-commenter commented Sep 17, 2020

Codecov Report

Merging #6128 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #6128   +/-   ##
=======================================
  Coverage   78.52%   78.52%           
=======================================
  Files          12       12           
  Lines        3069     3069           
=======================================
  Hits         2410     2410           
  Misses        659      659           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9e955fb...be8aa27. Read the comment docs.

@hcho3
Copy link
Collaborator

hcho3 commented Sep 17, 2020

@trivialfis Do you have performance numbers? How much does this PR improve serialization performance?

common::KCatBitField const cat_bits(node_categories);
for (size_t i = 0; i < cat_bits.Size(); ++i) {
if (cat_bits.Check(i)) {
categories.emplace_back(static_cast<Integer::Int>(i));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trivialfis What's the performance implication of saving individual categories? Is it better than saving bitmaps directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's not. If saving bitmap is preferred I can make the change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, if saving individual categories produces acceptable performance, let's keep it. It's easier to parse by a human.

@trivialfis trivialfis merged commit 7065779 into dmlc:master Sep 21, 2020
@trivialfis trivialfis deleted the categories-format branch September 21, 2020 07:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants