Rewrite sparse dmatrix using callbacks. #7092
For example usage, see the C demo in the original PR.
Codecov Report
@@ Coverage Diff @@
## master #7092 +/- ##
=======================================
Coverage 81.59% 81.59%
=======================================
Files 13 13
Lines 3901 3901
=======================================
Hits 3183 3183
Misses 718 718
Note: The rewrite is about the following things:
@RAMitchell @hcho3 we still need to run some more tests before this is useful. For example, we need to test whether a page might go out of scope in the predictor before a possibly asynchronous prediction has finished. I will defer those to future PRs.
4aa7d48 to d8bbf8b
Looks good. We will now have a much more flexible external memory implementation, one that supports data iterators in other languages and makes it easier to extend internal data structures to work with external memory. Good to see lots of tests as well.
This reverts commit dd2c8a9.
This reverts commit 3d5f319.
d8bbf8b to f54b36a
(gpu_)page_size is removed. The size of each binary block is now determined entirely by the batch size provided by the user.

Part of #7070. This PR handles the internal implementation of external memory; the functionality is not exposed to Python yet. The high-level tests are written with custom iterators that do not use the dmlc-core parser, so they remain in the original PR.
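To illustrate the point about block size following the user's batch size, here is a minimal sketch of a callback-driven iterator. It is not XGBoost's API: `BatchIterator`, `build_pages`, and the `next`/`reset` callback names are hypothetical, and a plain Python list stands in for a binary page.

```python
from typing import Callable, List

Batch = List[List[float]]  # one batch = a list of rows of feature values


class BatchIterator:
    """Hypothetical user-side iterator driven by next/reset callbacks."""

    def __init__(self, batches: List[Batch]) -> None:
        self._batches = batches
        self._pos = 0

    def next(self, consume: Callable[[Batch], None]) -> bool:
        """Hand one batch to `consume`; return False once exhausted."""
        if self._pos == len(self._batches):
            return False
        consume(self._batches[self._pos])
        self._pos += 1
        return True

    def reset(self) -> None:
        """Rewind so the data can be traversed again."""
        self._pos = 0


def build_pages(it: BatchIterator) -> List[Batch]:
    """Assumed consumer: store each batch as exactly one page.

    There is no separate page-size setting here, so the number of rows
    per page is whatever the user-provided iterator yields.
    """
    pages: List[Batch] = []
    it.reset()
    while it.next(pages.append):
        pass
    return pages


batches = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
pages = build_pages(BatchIterator(batches))
page_sizes = [len(page) for page in pages]
```

In this sketch the two user batches of 2 rows and 1 row become two pages of the same sizes; changing the batches the iterator yields is the only way to change the page sizes.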