New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make QuantileDMatrix
default to sklearn esitmators.
#8220
Conversation
When the tree method is selected to be |
7fadeb6
to
cff8513
Compare
Start working on sparse data. Fix race. Remove check. Merge functions. Cleanup. Cleanup. Start writing tests. Fix. comp column. Python test. lint. lint. Fix. Cleanup. Avoid binary search. Use quantile dmatrix by default in sklearn interface. dask as well. Fix max_bin. Fix empty dmatrix for CPU. Fix GPU version. Fix empty DMatrix. pylint.
cff8513
to
cc90c2b
Compare
QuantileDMatrix
default to sklearn esitmators.QuantileDMatrix
default to sklearn esitmators.
client: Optional["distributed.Client"], | ||
tree_method: Optional[str], | ||
max_bin: Optional[int], | ||
**kwargs: Any, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears that tree_method
and max_bin
will no longer be in the kwargs
dictionary. Can you make sure this will not cause undesirable behavior? For example, is kwargs
passed to xgb.train
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Training parameters are obtained via params = self.get_xgb_params()
, which you can find in the async def _fit_async
function.
return QuantileDMatrix( | ||
**kwargs, ref=ref, nthread=self.n_jobs, max_bin=self.max_bin | ||
) | ||
except TypeError: # `QuantileDMatrix` supports lesser types than DMatrix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which cases are not supported out of curiosity? Just wondering if this is going to run into the exception regularly in some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datatable, CSC, arrow. Also, the DMatrix has a dispatcher to convert unknown types to scipy csr.
src/data/iterative_dmatrix.cc
Outdated
} | ||
|
||
batch_param_ = BatchParam{d, max_bin}; | ||
batch_param_.sparse_thresh = 0.2; // default from TrainParam |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change the default, this hardcoded value will be forgotten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to obtain the value from train param.
* Prepare for improving Windows networking compatibility. * Include dmlc filesystem indirectly as dmlc/filesystem.h includes windows.h, which conflicts with winsock2.h * Define `NOMINMAX` conditionally. * Link the winsock library when mysys32 is used. * Add config file for read the doc.
Start working on sparse data. Fix race. Remove check. Merge functions. Cleanup. Cleanup. Start writing tests. Fix. comp column. Python test. lint. lint. Fix. Cleanup. Avoid binary search. Use quantile dmatrix by default in sklearn interface. dask as well. Fix max_bin. Fix empty dmatrix for CPU. Fix GPU version. Fix empty DMatrix. pylint.
…-quantile-dmatrix
Stacked on #8215 .