
Feature weights #5962

Merged
trivialfis merged 9 commits into dmlc:master from feature-weights on Aug 18, 2020


trivialfis (Member) commented Jul 30, 2020

Closes #3754, closes #5308. Currently supported tree methods: exact, hist, and gpu_hist.

I spent some time looking through the hist codebase and want to unify it with approx and grow_local_histmaker in the coming release. For now, only the three tree methods that use the column sampler are supported.
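As a rough illustration of what weighting the column sampler means, the sketch below draws features without replacement, with a higher chance of selection for heavier weights. It assumes the Efraimidis-Spirakis key method for weighted sampling; this PR does not state which algorithm `src/common/random.h` uses, the helper name is hypothetical, and skipping zero-weight features is an assumption of the sketch.

```python
import math
import random
from typing import List, Sequence


def weighted_sample_without_replacement(
    features: Sequence[int],
    weights: Sequence[float],
    n: int,
    rng: random.Random,
) -> List[int]:
    """Hypothetical helper: pick up to n distinct features, favouring heavy weights.

    Efraimidis-Spirakis key method (an assumption, not necessarily XGBoost's
    code): each feature with weight w > 0 gets the key u ** (1 / w) with
    u ~ Uniform(0, 1], and the n largest keys are kept. Comparing
    log(u) / w gives the same ordering without underflow for small weights.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("feature weights must be non-negative")
    keyed = []
    for feature, w in zip(features, weights):
        if w == 0:
            continue  # assumption: zero-weight features are never selected
        u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u is in (0, 1]
        keyed.append((math.log(u) / w, feature))
    keyed.sort(reverse=True)  # largest key (closest to 0) first
    return sorted(feature for _, feature in keyed[:n])
```

With `n = 1`, feature `i` is chosen with probability `w_i / sum(w)`, which is the usual reading of "feature weights" for column sampling; larger `n` keeps that bias while guaranteeing distinct features.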

  • High-level tests. Column sampling is hard to test precisely; I still need to find a better way to test it.

The API also accepts four data types: f32, f64, uint32, and uint64. It can be extended further, but more importantly, I think we should move toward a more general data backend; otherwise we will make an extra copy every time the user passes a type other than f32 (numpy's default float type is f64).
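To make the copy concern concrete: if the backend only stores f32, an f64 numpy weight array must be converted, which allocates a new buffer, while an f32 array can be passed through as-is. The snippet shows this with plain numpy; `as_float32` is a hypothetical helper, not part of the XGBoost API.

```python
import numpy as np


def as_float32(weights):
    """Hypothetical helper: return weights as float32, copying only if needed.

    np.asarray returns the input array itself when it already has the
    requested dtype; any other dtype forces a converted copy.
    """
    return np.asarray(weights, dtype=np.float32)


w64 = np.array([1.0, 2.0, 0.5])  # numpy's default float dtype is float64
w32 = w64.astype(np.float32)

print(w64.dtype, np.shares_memory(as_float32(w64), w64))  # float64 False
print(w32.dtype, np.shares_memory(as_float32(w32), w32))  # float32 True
```

The same applies to uint32 and uint64 input: each conversion to f32 costs one extra allocation and pass over the data, which a dtype-aware backend could avoid.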


codecov-commenter commented Jul 31, 2020

Codecov Report

Merging #5962 into master will increase coverage by 0.10%.
The diff coverage is 95.45%.


@@            Coverage Diff             @@
##           master    #5962      +/-   ##
==========================================
+ Coverage   79.04%   79.14%   +0.10%     
==========================================
  Files          12       12              
  Lines        3025     3040      +15     
==========================================
+ Hits         2391     2406      +15     
  Misses        634      634              
| Impacted Files | Coverage Δ |
|---|---|
| python-package/xgboost/data.py | 59.40% <93.75%> (+0.85%) ⬆️ |
| python-package/xgboost/core.py | 78.22% <100.00%> (+0.08%) ⬆️ |
| python-package/xgboost/sklearn.py | 91.44% <100.00%> (+0.06%) ⬆️ |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a418278...0d23093.

src/c_api/c_api.cc (outdated, resolved)
src/common/random.h (resolved)
python-package/xgboost/core.py (outdated, resolved)
RAMitchell (Member):

Can you point to other implementations of column weighting in other libraries or papers?

trivialfis (Member Author):

> Can you point to other implementations of column weighting in other libraries or papers?

I cannot. It's a simple heuristic for column sampling, so I just wrote it myself.

trivialfis force-pushed the feature-weights branch 2 times, most recently from 2cd1792 to b9fc867, August 14, 2020 05:37

trivialfis (Member Author):

Note to self: revise the C docs after all tests pass.

src/data/data.cu (resolved)

trivialfis (Member Author) left a review comment:

@RAMitchell there's a validation method in meta info, which is called during training.

trivialfis (Member Author):

I will add a reference for the sampling algorithm itself later, but it is a general sampling algorithm, not one specific to column sampling in machine learning.

trivialfis merged commit 4d99c58 into dmlc:master Aug 18, 2020
trivialfis deleted the feature-weights branch August 18, 2020 11:55
trivialfis mentioned this pull request Aug 18, 2020
anusharamesh pushed a commit to AppLovin/xgboost that referenced this pull request May 11, 2022
anusharamesh added a commit to AppLovin/xgboost that referenced this pull request May 12, 2022
* Update BoosterParams.scala

* fix scala checkstyle error

* fix whitespace checkstyle error

* fix type cast error

* fix conversion issue

* update version for 1.2.5-al

* Feature weights (dmlc#5962)

* update version

* version update 2

Co-authored-by: Oscar Pan <oscar.pan@applovin.com>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
3 participants