Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Global configuration could manage global number of threads #10027

Closed
david-cortes opened this issue Feb 1, 2024 · 4 comments
Closed

Global configuration could manage global number of threads #10027

david-cortes opened this issue Feb 1, 2024 · 4 comments

Comments

@david-cortes
Copy link
Contributor

XGBoost has a global configuration state:
https://xgboost.readthedocs.io/en/stable/parameter.html#global-configuration

Some places in the code try to retrieve a global number of threads, such as here:

std::int32_t n_threads = ctx->Threads();

The default number of threads is determined from OpenMP, which is a reasonable thing to do:

n_threads = std::min(omp_get_num_procs(), omp_get_max_threads());

But being a global configuration, would be more logical to allow users to modify it through the functions that are named as such.

It would make things better for the R interface, since currently there are workarounds like these which result in degraded performance for users:

n_threads = std::min(2, n_threads);

If it were a global option, should be possible to adopt a mechanism like LightGBM's which changes the number of configured threads before tests and examples, but doesn't have the lines that change the threads in the user-visible part of the examples, so that it executes with reduced threads on CRAN but doesn't suggest doing so to the user.

Example:
https://github.com/microsoft/LightGBM/blob/252828fd86627d7405021c3377534d6a8239dd69/R-package/R/lgb.Dataset.R#L783

@trivialfis
Copy link
Member

After working on distributed systems, I'm a bit afraid of this global variable, it's global and it's thread local. This results in the fact that it needs special handling for distributed systems, and it fails without possible fix when user launches a new thread without setting the value again.

@david-cortes
Copy link
Contributor Author

After working on distributed systems, I'm a bit afraid of this global variable, it's global and it's thread local. This results in the fact that it needs special handling for distributed systems, and it fails without possible fix when user launches a new thread without setting the value again.

But a failure here would be no issue, since then it could just default back to OMP's config.

The idea would be to use this as a means to override the OMP config, so that it would check for this first, and then if it's not set, would look at the OMP config.

@trivialfis
Copy link
Member

trivialfis commented Feb 1, 2024

since then it could just default back to OMP's config.

Depending on your perspective, this change of behavior can be considered a bug. Let's not make things that can surprise people. In general, global variables are not very desirable.

The idea would be to use this as a means to override the OMP config

I think booster local variable is fine. It's a bit inconvenient when writing tests and examples, but it's really a CRAN issue instead of a real world problem. We can workaround it by setting global OMP environment variable before running any test, XGBoost respects those variables.

def test_with_omp_thread_limit(self):

@trivialfis
Copy link
Member

I think we can close this now that #10028 is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants