New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pyspark] Use quantile dmatrix. #8284
Conversation
@WeichenXu123 @wbo4958 Please take a look when you are available. |
@wbo4958 Could you please take another look? |
LGTM |
Apologies for the new changes. For some reason, the pytest mark doesn't work with test cases derived from the python unittest module. |
python-package/xgboost/spark/data.py
Outdated
|
||
Parameters | ||
---------- | ||
iterator : | ||
Pyspark partition iterator. | ||
feature_cols: | ||
A sequence of feqture names, used only when rapids plugin is enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feqture -> feature. this parameter can be used even without rapid plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't really done any test for it and it will likely trigger an assert error. We have DMatrix
and QuantileDMatrix
to support, I will leave that to the next release.
@@ -228,6 +260,10 @@ def append_dqm(part: pd.DataFrame, name: str, is_valid: bool) -> None: | |||
|
|||
def make(values: Dict[str, List[np.ndarray]], kwargs: Dict[str, Any]) -> DMatrix: | |||
if len(values) == 0: | |||
get_logger("XGBoostPySpark").warning( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously, the warning is only printed for empty training data. while this PR also prints it for validation data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still good to have a warning, the support is "best effort" and is not yet extensively tested. If something goes wrong at least users will have some clues to debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM. only some minor nits.
Close #8083 .