Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pyspark] Add param validation for "objective" and "eval_metric" param, and remove invalid booster params #8173

Merged
merged 12 commits into from Aug 24, 2022

Conversation

WeichenXu123
Copy link
Contributor

@WeichenXu123 WeichenXu123 commented Aug 14, 2022

  • Added param validation for "objective" and "eval_metric" param
  • Removed invalid booster params passed to xgboost.train
  • Allow regressor setting other objective value
  • Handle the case running on spark GPU cluster but spark.task.resource.gpu.amount config is not set.

@WeichenXu123 WeichenXu123 changed the title [pyspark] Add param validation for "objective" and "eval_metric" param [pyspark] Add param validation for "objective" and "eval_metric" param, and remove invalid booster params Aug 14, 2022
@WeichenXu123
Copy link
Contributor Author

CC @trivialfis @wbo4958

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@trivialfis
Copy link
Member

Please fix the formatting error: https://github.com/dmlc/xgboost/runs/7824147126?check_suite_focus=true .

@trivialfis
Copy link
Member

Any update?

@@ -267,6 +267,18 @@ def _validate_params(self):
"If features_cols param set, then features_col param is ignored."
)

if self.getOrDefault(self.objective) is not None:
Copy link
Contributor

@wbo4958 wbo4958 Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not make sense (to me) to check the type of a parameter that already has the typeConverter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "objective" param is automatically generated, currently it does not have type converter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still hope we can find a common solution for the type converter in the following PR

WeichenXu123 and others added 3 commits August 23, 2022 16:29
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Ubuntu <weichen.xu@ip-10-110-30-183.us-west-2.compute.internal>
Comment on lines 356 to 361
if not gpu_per_task:
# For spark cluster with GPU config, the "spark.task.resource.gpu.amount"
# config might not be set. In this case, the default gpu per task is 1.
# If running on no GPU available, in spark task calling `_get_gpu_id` will
# raise gpu not found error.
gpu_per_task = "1"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wbo4958 A notable fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fault, reverted this.

@WeichenXu123
Copy link
Contributor Author

CC @trivialfis @wbo4958 Done.

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@WeichenXu123
Copy link
Contributor Author

@trivialfis @wbo4958 We can merge this. The test failures are not related to my PR.

@trivialfis trivialfis merged commit d03794c into dmlc:master Aug 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants