[jvm-packages] [pyspark] Make QDM optional based on cuDF check #8471

Merged
Merged 6 commits into dmlc:master on Nov 27, 2022


@WeichenXu123 (Contributor) commented Nov 16, 2022

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>

Closes #8467
Closes #8469

If cuDF is installed and the hist or gpu_hist tree method is used, build a QuantileDMatrix (QDM); otherwise, build a DMatrix.
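The selection rule above can be sketched as a small pure function. This is an illustration only: `choose_dmatrix_class` is a hypothetical helper (not part of XGBoost) that returns the class name as a string so it runs without XGBoost installed, and the PR's actual code in the files it touches may differ in detail.

```python
from typing import Optional

# Histogram-based tree methods for which QDM is used when cuDF is present.
_HIST_METHODS = ("hist", "gpu_hist")


def choose_dmatrix_class(tree_method: Optional[str], cudf_available: bool) -> str:
    """Hypothetical helper: name the DMatrix type to build for training.

    Follows the rule stated in the PR: QuantileDMatrix only when cuDF is
    installed and a histogram-based tree method is selected; otherwise DMatrix.
    """
    use_hist = tree_method in _HIST_METHODS
    use_qdm = use_hist and cudf_available
    return "QuantileDMatrix" if use_qdm else "DMatrix"


if __name__ == "__main__":
    print(choose_dmatrix_class("gpu_hist", cudf_available=True))   # QuantileDMatrix
    print(choose_dmatrix_class("gpu_hist", cudf_available=False))  # DMatrix
    print(choose_dmatrix_class("approx", cudf_available=True))     # DMatrix
```

Keeping the cuDF check as a boolean argument separates the decision from the environment probe, which matches the reviewer's point that no extra user-facing parameter is needed: the choice follows from the input and the installed packages.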

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@WeichenXu123 (Contributor Author) commented

CC @trivialfis @wbo4958 @hcho3

@trivialfis (Member) left a comment

I don't think an independent parameter is needed. XGBoost should be able to decide what data structure should be used based on input.

python-package/xgboost/compat.py (outdated, resolved)
python-package/xgboost/spark/core.py (outdated, resolved)
@WeichenXu123 WeichenXu123 changed the title [jvm-packages] [pyspark] Add use_quantile_dmatrix param and make cuDF optional in PySpark package [jvm-packages] [pyspark] Make QDM optional based on cuDF check Nov 23, 2022
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@WeichenXu123 (Contributor Author) commented

@trivialfis PR updated! :)

@WeichenXu123 WeichenXu123 re-requested review from wbo4958, hcho3 and trivialfis (November 23, 2022, 02:38), then again from hcho3 and trivialfis (04:39)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
python-package/xgboost/compat.py (outdated, resolved)
python-package/xgboost/spark/data.py (outdated, resolved)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Comment on lines +81 to +82
    if importlib.util.find_spec("cudf") is None:
        return False
@WeichenXu123 (Contributor Author) commented

This avoids printing an import error when cuDF is not installed (specifically on Databricks Runtime).
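A sketch of how a guard like the one above can sit inside an availability check, assuming a hypothetical helper named `is_cudf_available()` (the enclosing function in the PR is not shown here). `importlib.util.find_spec` only locates a top-level module without executing it, so when cuDF is absent the function returns `False` before any `import cudf` is attempted.

```python
import importlib.util


def is_cudf_available() -> bool:
    """Hypothetical helper: report whether cuDF can be imported.

    find_spec() only looks the module up on the import path; it does not
    run it, so nothing fails or gets logged when cuDF is missing.
    """
    if importlib.util.find_spec("cudf") is None:
        return False
    try:
        import cudf  # noqa: F401  # imported only to confirm it loads
    except ImportError:
        # Found on the path but fails to import: treat as unavailable.
        return False
    return True


if __name__ == "__main__":
    print(is_cudf_available())
```

In an environment without cuDF this returns `False` through the early `find_spec` check; the `try`/`except` only matters when the package is present but cannot be loaded.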

@trivialfis trivialfis merged commit 67ea1c3 into dmlc:master Nov 27, 2022
@hcho3 hcho3 mentioned this pull request Nov 29, 2022
@trivialfis trivialfis added this to To be backported in 1.7.2 Patch Release. Nov 29, 2022
trivialfis pushed a commit to trivialfis/xgboost that referenced this pull request Dec 6, 2022
trivialfis added a commit that referenced this pull request Dec 6, 2022
Co-authored-by: WeichenXu <weichen.xu@databricks.com>
Labels: none
Projects: 1.7.2 Patch Release. (To be backported)
Development: successfully merging this pull request may close these issues:

Xgboost regressor training with GPU does not work when python environment does not have "cudf" package