[pyspark] Use quantile dmatrix. #8284

trivialfis · 2022-09-28T07:44:37Z

Close #8083.

trivialfis · 2022-10-04T04:54:25Z

@WeichenXu123 @wbo4958 Please take a look when you are available.

python-package/xgboost/spark/data.py

trivialfis · 2022-10-09T07:52:32Z

@wbo4958 Could you please take another look?

wbo4958 · 2022-10-10T01:20:50Z

LGTM

trivialfis · 2022-10-11T06:58:22Z

Apologies for the new changes. For some reason, the pytest mark doesn't work with test cases derived from the python unittest module.

wbo4958 · 2022-10-12T09:07:45Z

python-package/xgboost/spark/data.py


    Parameters
    ----------
    iterator :
        Pyspark partition iterator.
+    feature_cols:
+        A sequence of feqture names, used only when rapids plugin is enabled.


Typo: feqture -> feature. Also, this parameter can be used even without the rapids plugin.

I haven't really tested that, and it will likely trigger an assertion error. We have both DMatrix and QuantileDMatrix to support, so I will leave that to the next release.

wbo4958 · 2022-10-12T09:11:10Z

python-package/xgboost/spark/data.py

@@ -228,6 +260,10 @@ def append_dqm(part: pd.DataFrame, name: str, is_valid: bool) -> None:

    def make(values: Dict[str, List[np.ndarray]], kwargs: Dict[str, Any]) -> DMatrix:
        if len(values) == 0:
+            get_logger("XGBoostPySpark").warning(


Previously, the warning was only printed for empty training data, while this PR also prints it for validation data.

It's still good to have a warning: the support is "best effort" and is not yet extensively tested. If something goes wrong, at least users will have some clues to debug.
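
The behaviour discussed in this thread can be sketched roughly as follows. This is only an illustration, not the code from `data.py`: the helper name `warn_if_empty`, its `is_valid` flag, and the message text are assumptions, and it uses the standard `logging` module in place of the `get_logger` helper shown in the diff.

```python
import logging
from typing import Any, Dict, List

# Logger name taken from the diff; everything else here is hypothetical.
logger = logging.getLogger("XGBoostPySpark")


def warn_if_empty(values: Dict[str, List[Any]], is_valid: bool) -> bool:
    """Log a warning when a partition holds no data.

    Returns True when the partition is empty so the caller can decide how
    to proceed. The real `make` function is not reproduced here.
    """
    if len(values) == 0:
        kind = "validation" if is_valid else "training"
        logger.warning("Detected an empty partition in the %s data.", kind)
        return True
    return False
```

Naming the affected dataset in the message is one way to keep the warning useful now that it fires for both training and validation partitions.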

wbo4958

Overall, LGTM. Only some minor nits.

[pyspark] Use quantile dmatrix.

ecd19df

trivialfis added this to In progress in 1.7 Roadmap via automation Sep 28, 2022

trivialfis added 10 commits September 28, 2022 18:57

Pass parameters.

e8247a8

lint.

d9889ed

Split params.

bdec7f6

Prevent accidental error.

7ed1d22

cleanup.

a8c66de

Merge branch 'master' into pyspark-qdm

e75eb8a

cleanup.

f89404d

Fix.

2aacb2e

Workaround.

739b7fa

Merge branch 'master' into pyspark-qdm

7f3e864

trivialfis changed the title ~~[WIP] [pyspark] Use quantile dmatrix.~~ [pyspark] Use quantile dmatrix. Sep 29, 2022

trivialfis marked this pull request as ready for review September 29, 2022 14:11

Merge branch 'master' into pyspark-dqm

0a2251b

wbo4958 reviewed Oct 8, 2022

python-package/xgboost/spark/data.py (outdated, resolved)

Additional assert.

e37ace4

wbo4958 reviewed Oct 8, 2022

python-package/xgboost/spark/data.py (resolved)

python-package/xgboost/spark/data.py (resolved)

Reviewer's comments, tests.

8e079d6

RAMitchell approved these changes Oct 10, 2022

1.7 Roadmap automation moved this from In progress to Reviewer approved Oct 10, 2022

trivialfis added 5 commits October 10, 2022 20:09

unittest conflicts with pytest.

a761a2c

Merge branch 'master' into pyspark-qdm

2438b08

mypy.

943eb5f

pylint.

a74dba0

Fixes.

715f193

One more.

f849a48

wbo4958 reviewed Oct 12, 2022

typo.

84d5bdd

trivialfis merged commit 97a5b08 into dmlc:master Oct 12, 2022

1.7 Roadmap automation moved this from Reviewer approved to Done Oct 12, 2022

trivialfis deleted the pyspark-qdm branch October 12, 2022 12:38
