[R-package] User-friendly redesign for `lightgbm()` #4968

david-cortes · 2022-01-23T00:48:30Z

This PR redesigns the R interface for lightgbm() so as to make it similar to those of other packages and base R's, while keeping the old lgb.train interface as-is. This redesign includes changes such as:

Allowing a formula interface as well as an X/y interface with non-standard evaluation for arguments.
Allowing data.frame inputs and handling the encoding of categorical variables in the X/y interface.
Auto-detecting binary/multiclass classification according to the type of the target variable.
Renaming function arguments to make them more in line with what is used in core R packages.
Minor details such as ensuring that the prediction outputs have appropriate row and colum names.
Using base-1 indexing in the outputs.
Changing default arguments to match those of Python (e.g. number of threads).
Expanding a bit the documentation.

jameslamb

Thanks very much for your continued interest in LightGBM!

However, I'm not willing to review this pull request in its current form.

As we have politely mentioned to you in the past, such large pull requests with multiple unrelated changes are very difficult to review. Some examples: #4207 (review), #4685 (review), .

As we've also politely mentioned to you several times in the past, making comments like "similar to those of other packages" or "more in line with what is used in core R packages" without providing reference links or specific evidence unnecessarily increases the effort required to review such proposals. Some examples: #4692 (comment), #4692 (review), #4207 (comment), #4812 (review), #4597 (comment), #4681 (comment).

This pull request is proposing several very large breaking changes to the R package's interface, each of which requires careful consideration and conversation about what the benefits of those changes are and whether they are worth the cost of breaking users' existing code. We have an obligation to LightGBM's existing users to understand and carefully consider changes to this project, especially changes that break existing code, and can't do that effectively for so many changes at one time.

We really appreciate all of your help with LightGBM, but need to be protective of the limited time that this project's small group of mostly-volunteer maintainers can devote to maintaining the project.

Could you please make each of the related sets of changes a separate pull request or proposal in an issue, which could be reviewed and discussed independently, and in each please provide specific evidence describing the benefit of the proposed change?

Looking through these changes tonight, I see the following sets of changes which should each be their own pull request / issue:

Moving parameters objective, threads, and metric out of params list in examples and tests and passing it through keyword arguments.
Other changes in tests that move parameters out of params and into keyword arguments of lightgbm::lightgbm().
Changes to argument names in predict.lgb.Booster().
Changing lightgbm::lightgbm() from a regular function to a set of S3 methods that dispatch on the type of data (WITHOUT introducing the new formula support or changing any argument names).
Adding support for formulas.
Automatically setting categorical_features based on data frame column types (as part of [R-package] Accept data frames as inputs #4323).
Handling factor columns in data.frame inputs (as part of [R-package] Accept data frames as inputs #4323)
Changing argument names data and label to X and y in lightgbm::lightgbm().
non-standard evaluation of arguments using eval.parent(), deparse1(), substitute(), etc.
adding column names to predictions
addition of a "class" type of prediction, where predict() returns the predicted class instead of probabilities for classification tasks

StrikerRUS · 2022-01-23T02:39:02Z

@david-cortes Thank you so much for your huge work!

@jameslamb I absolutely agree with you that this PR should be spit into smaller ones. I just want to add that open one big PR with all changes first is a good idea because maintainers can assess overall idea and understand a final state that the proposing changes move the project into. Big PR can be treated as a playground for initial and general discussions (demonstrative alternative to original feature request). Also, it's good to test with CI how all proposed connected changes fit together.

See for example dmlc/xgboost#7545 (comment) or dmlc/xgboost#7309 (comment).

improve R interface

5822655

david-cortes requested review from jameslamb and Laurae2 as code owners January 23, 2022 00:48

david-cortes added 4 commits January 22, 2022 21:59

linter

64998b7

linter

4f37862

avoid using functions introduced in R4.0.0

091324c

linter complaints

7eefc53

jameslamb requested changes Jan 23, 2022

View reviewed changes

jameslamb added the breaking label Jan 23, 2022

david-cortes mentioned this pull request Jun 30, 2022

[R-package] Accept factor labels and use their levels #5341

Merged

trivialfis mentioned this pull request Nov 3, 2022

Address all lintr warnings. dmlc/xgboost#8012

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[R-package] User-friendly redesign for `lightgbm()` #4968

[R-package] User-friendly redesign for `lightgbm()` #4968

david-cortes commented Jan 23, 2022

jameslamb left a comment

StrikerRUS commented Jan 23, 2022 •

edited

[R-package] User-friendly redesign for lightgbm() #4968

Are you sure you want to change the base?

[R-package] User-friendly redesign for lightgbm() #4968

Conversation

david-cortes commented Jan 23, 2022

jameslamb left a comment

Choose a reason for hiding this comment

StrikerRUS commented Jan 23, 2022 • edited

[R-package] User-friendly redesign for `lightgbm()` #4968

[R-package] User-friendly redesign for `lightgbm()` #4968

StrikerRUS commented Jan 23, 2022 •

edited