Feature weights #5962
Note: try to reproduce it: https://ci.appveyor.com/project/tqchen/xgboost/builds/34397694/job/wnsuvk9nb4okyf7y
Codecov Report
@@ Coverage Diff @@
## master #5962 +/- ##
==========================================
+ Coverage 79.04% 79.14% +0.10%
==========================================
Files 12 12
Lines 3025 3040 +15
==========================================
+ Hits 2391 2406 +15
Misses 634 634
Can you point to other implementations of column weighting in other libraries or papers?
I cannot. It's a simple heuristic for column sampling, so I just rolled it out myself.
Note to myself: Revise C doc after all tests pass.
@RAMitchell there's a validation method in meta info, which is called during training.
I will reference the sampling algorithm itself later, but it's not related to column sampling as used in machine learning.
* Update BoosterParams.scala
* fix scala checkstyle error
* fix whitespace checkstyle error
* fix type cast error
* fix conversion issue
* update version for 1.2.5-al
* Feature weights (dmlc#5962)
* update version
* version update 2

Co-authored-by: Oscar Pan <oscar.pan@applovin.com>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
Closes #3754, closes #5308.

Currently supported tree methods: `exact`, `hist`, `gpu_hist`. I spent some time looking through the code base in hist, and want to unify it with `approx` and `grow_local_histmaker` in the coming release. So right now only the 3 tree methods that use the column sampler are supported.

The API also handles 4 different data types: f32, f64, uint32, and uint64. It can be extended further, but the important point is that I think we should move toward a more general data backend; otherwise we will be making an extra copy every time the user specifies a type other than f32 (the default type for numpy is f64).