[R] Fix global feature importance and predict with 1 sample. #7394

Merged: 3 commits into dmlc:master, Nov 5, 2021

trivialfis
Member

  • Add an implementation for the tree index. The parameter is intentionally left
    undocumented in the C API, since we should work on porting model slicing to R
    instead of supporting more uses of the tree index.

  • Fix the difference between "gain" and "total_gain".

Related: #7260 (comment).
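
For context on the second bullet: among XGBoost's importance types, "total_gain" is the sum of the split gains over every split that uses a feature, while "gain" is that sum averaged over the number of such splits. A minimal Python sketch of the distinction follows. It uses made-up split records rather than a real model, and both the `splits` data and the `importance` helper are hypothetical, not part of the XGBoost API.

```python
from collections import defaultdict

# Hypothetical (feature, split gain) records, one per split across all trees.
splits = [("f0", 4.0), ("f0", 2.0), ("f1", 3.0)]


def importance(splits, kind):
    """Aggregate per-feature split gains as "total_gain" (sum) or "gain" (mean)."""
    total = defaultdict(float)
    count = defaultdict(int)
    for feature, gain in splits:
        total[feature] += gain
        count[feature] += 1
    if kind == "total_gain":
        return dict(total)
    if kind == "gain":
        # Average gain: total gain divided by the number of splits on the feature.
        return {f: total[f] / count[f] for f in total}
    raise ValueError(f"unknown importance type: {kind}")


total_gain = importance(splits, "total_gain")
gain = importance(splits, "gain")
```

The two measures agree only for features that are split on exactly once, which is why mixing them up changes the ranking for frequently used features.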

@trivialfis
Member Author

@hcho3 @hetong007 Please take a look when you are available.

@hetong007
Member

Just to confirm, with this patch, xgboost won't break EIX/radiant.model, right?

@trivialfis
Member Author

I ran devtools::check() on radiant.model and it passed. For EIX, the vignettes failed to build, and its repository (https://github.com/ModelOriented/EIX) doesn't have any tests.

  installing the package to build vignettes
E  creating vignettes (26.5s)
   --- re-building ‘EIX.Rmd’ using rmarkdown
      [[ suppressing 19 column names 'satisfaction_level', 'last_evaluation', 'number_project' ... ]]
   Warning: ggrepel: 2 unlabeled data points (too many overlaps). Consider increasing max.overlaps
   Quitting from lines 157-165 (EIX.Rmd) 
   Error: processing vignette 'EIX.Rmd' failed with diagnostics:
   non-numeric matrix extent
   --- failed re-building ‘EIX.Rmd’
   
   --- re-building ‘titanic_data.Rmd’ using rmarkdown
   Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider increasing max.overlaps
   Quitting from lines 81-86 (titanic_data.Rmd) 
   Error: processing vignette 'titanic_data.Rmd' failed with diagnostics:
   non-numeric matrix extent
   --- failed re-building ‘titanic_data.Rmd’
   
   SUMMARY: processing the following files failed:
     ‘EIX.Rmd’ ‘titanic_data.Rmd’
   
   Error: Vignette re-building failed.
   Execution halted
Error in (function (command = NULL, args = character(), error_on_status = TRUE,  : 
  System command 'R' failed, exit status: 1, stdout + stderr (last 10 lines):

@trivialfis
Member Author

This is the same error you shared. Rerunning the tests with 1.4.

@trivialfis
Member Author

trivialfis commented Nov 4, 2021

@hcho3 Sorry, I pushed a new commit that fixes leaf prediction when the input contains only one sample: in that case we need to return a matrix instead of a vector. I rewrote the prediction conditions to mimic the old code exactly and ran tests with those reverse dependencies.
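
The one-sample case can be sketched as follows. This is a language-neutral illustration in Python, not the R package's code: `as_matrix` is a hypothetical helper that reshapes a flat buffer of leaf indices into one row per sample and keeps the two-dimensional shape even when there is a single row.

```python
def as_matrix(flat, n_samples):
    """Reshape a flat, row-major list into n_samples rows (hypothetical helper).

    A single sample still yields a 1 x n matrix (a list holding one row),
    never a bare vector.
    """
    if n_samples <= 0 or len(flat) % n_samples != 0:
        raise ValueError("len(flat) must be a multiple of a positive n_samples")
    n_cols = len(flat) // n_samples
    return [flat[i * n_cols:(i + 1) * n_cols] for i in range(n_samples)]


# Made-up leaf indices for a model with 3 trees.
one_sample = as_matrix([5, 2, 7], n_samples=1)
two_samples = as_matrix([5, 2, 7, 4, 2, 6], n_samples=2)
```

Callers that index the result as `result[row][tree]` then work unchanged whether they passed one sample or many.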

trivialfis changed the title from "[R] Fix global feature importance." to "[R] Fix global feature importance and predict with 1 sample." on Nov 4, 2021
@trivialfis
Member Author

@hetong007 I have tested both packages using devtools.

hetong007 merged commit c968217 into dmlc:master on Nov 5, 2021
trivialfis deleted the fix-R-gfi branch on November 5, 2021 07:16
@trivialfis
Member Author

I will backport this.

trivialfis added a commit to trivialfis/xgboost that referenced this pull request Nov 5, 2021
* [R] Fix global feature importance.

* Add implementation for tree index.  The parameter is not documented in C API since we
should work on porting the model slicing to R instead of supporting more use of tree
index.

* Fix the difference between "gain" and "total_gain".

* debug.

* Fix prediction.
trivialfis added a commit that referenced this pull request Nov 5, 2021
…e. (#7394) (#7397)

@trivialfis trivialfis added this to 1.5.1 Done in 2.0 Roadmap Nov 5, 2021