Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-package] Promote number of threads to top-level argument in lightgbm() and change default to number of cores #4972

Merged
merged 16 commits into from
Apr 1, 2022

Conversation

david-cortes
Copy link
Contributor

@david-cortes david-cortes commented Jan 23, 2022

ref #4968

The Python interface for lightgbm allows passing the number of threads to use as a top-level parameter, while the R interface does not. The Python interface also defaults to using all available threads, while the R interface doesn't.

This PR promotes the number of threads towards top-level parameter in lightgbm(), changes the default to match Python's, and explains that a number of "zero" means using the configured OMP threads.

IMHO the number of threads is a very common thing to want to control (probably more so than other top-level parameters such as callbacks), and its default value should be explicit to the user.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the proposal! I'm ok with adding num_threads as a top-level argument to lightgbm::lightgbm(), and with using the built-in {parallel} package to set a default value for it.

Please see my first round of review comments.

In addition, please add unit tests checking this behavior. At least the following:

  • "lightgbm() does not require parameter num_threads"
  • "if num_threads in params and keyword argument to lightgbm() have different values, the value in params is used"
  • "if num_threads is not found in params, lightgbm() uses the value passed to keyword argument num_threads"

Whenever you propose user-facing changes to this project, please add tests on those changes to improve maintainers' confidence and to ensure that future developments in the project don't break the proposed behavior inadvertently.

R-package/R/lightgbm.R Outdated Show resolved Hide resolved
R-package/NAMESPACE Show resolved Hide resolved
R-package/R/lightgbm.R Outdated Show resolved Hide resolved
R-package/R/lightgbm.R Outdated Show resolved Hide resolved
R-package/R/lightgbm.R Outdated Show resolved Hide resolved
R-package/tests/testthat/test_Predictor.R Outdated Show resolved Hide resolved
@david-cortes
Copy link
Contributor Author

Updated.

@david-cortes
Copy link
Contributor Author

Updated, with a fallback to parallel::detectCores when appropriate.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for the changes, and the idea to fall back to parallel::detectCores() if RhcpBLASctl::get_num_cores() is not available.

@jameslamb
Copy link
Collaborator

regarding this linter warning (build link)

/home/runner/work/LightGBM/LightGBM/R-package/R/utils.R:223:7: warning: Function "require" is undesirable. As an alternative, library() is preferred to require() because it will raise an error immediately if a package is missing..

Please suppress it with a # nolint comment on the line using require().

@david-cortes
Copy link
Contributor Author

Thanks very much for the changes, and the idea to fall back to parallel::detectCores() if RhcpBLASctl::get_num_cores() is not available.

  • please remove all the added num_threads = 1 in docs / examples /tests not otherwise touched by this PR, per the request in #4972 (comment)

  • please preserve the alphabetical ordering of imports in DESCRIPTION

    • {RhpcBLASctl} should be between {processx} and {rmarkdown}
    • {parallel} should be between {methods} and {utils}

Aren't capitalized letters supposed to come before small-case letters?

@jameslamb
Copy link
Collaborator

Aren't capitalized letters supposed to come before small-case letters?

maybe in some types of sorting, but alphabetizing that list is mainly intended to benefit developers on this project visually inspecting that file, and I don't always remember if, for example, {rmarkdown} begins with a capital or lowercase "r".

@david-cortes
Copy link
Contributor Author

Updated.

@david-cortes
Copy link
Contributor Author

Looks like RhpcBLASctl needs to be added somewhere in the CI configuration.

@jameslamb
Copy link
Collaborator

Looks like RhpcBLASctl needs to be added somewhere in the CI configuration.

Yes please. I think these are all the places:

$packages = "c('data.table', 'jsonlite', 'knitr', 'Matrix', 'processx', 'R6', 'rmarkdown', 'testthat'), dependencies = c('Imports', 'Depends', 'LinkingTo')"

RDscript${{ matrix.r_customization }} -e "install.packages(c('R6', 'data.table', 'jsonlite', 'knitr', 'Matrix', 'rmarkdown', 'testthat'), repos = 'https://cran.r-project.org', Ncpus = parallel::detectCores())"

Rscript -e "install.packages(c('R6', 'data.table', 'jsonlite', 'knitr', 'Matrix', 'rmarkdown', 'roxygen2', 'testthat'), repos = 'https://cran.r-project.org', Ncpus = parallel::detectCores())"

Rscript -e "install.packages(c('R6', 'data.table', 'jsonlite', 'knitr', 'Matrix', 'rmarkdown', 'rhub', 'testthat'), dependencies = c('Depends', 'Imports', 'LinkingTo'), repos = 'https://cran.r-project.org', Ncpus = parallel::detectCores())" || exit -1

packages="c('data.table', 'jsonlite', 'knitr', 'Matrix', 'R6', 'rmarkdown', 'testthat')"

RDscriptvalgrind -e "install.packages(c('R6', 'data.table', 'jsonlite', 'knitr', 'Matrix', 'rmarkdown', 'testthat'), repos = 'https://cran.r-project.org', Ncpus = parallel::detectCores())" || exit -1

RDscript -e "install.packages(c('R6', 'data.table', 'jsonlite', 'knitr', 'Matrix', 'rmarkdown'), lib = '${R_LIB_PATH}', dependencies = c('Depends', 'Imports', 'LinkingTo'), repos = 'https://cran.r-project.org', Ncpus = parallel::detectCores())" || exit -1

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some suggestions for how to get the R linter to pass, and a suggestion to please remove unrelated changes from this PR.

R-package/R/utils.R Outdated Show resolved Hide resolved
R-package/R/utils.R Outdated Show resolved Hide resolved
R-package/tests/testthat/test_basic.R Outdated Show resolved Hide resolved
@david-cortes
Copy link
Contributor Author

Updated.

@david-cortes david-cortes changed the title [R-package] Promote number of threads to top-level argument in lightgbm() and make default match with Python's [R-package] Promote number of threads to top-level argument in lightgbm() and change default to number of cores Mar 31, 2022
@jameslamb jameslamb self-requested a review April 1, 2022 00:21
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! Thanks very much for your help with this, and with the corresponding changes to the Python package in #5105.

@jameslamb jameslamb merged commit 33eb037 into microsoft:master Apr 1, 2022
@jameslamb jameslamb mentioned this pull request Oct 7, 2022
40 tasks
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants