
create a gpu metrics (internal) registry #5387

Merged (4 commits) Mar 7, 2020

Conversation

@sriramch (Contributor) commented Mar 3, 2020

  • if this approach is palatable, it can be rolled out to other areas as well
  • the objective is to separate the cpu and gpu implementations (where they diverge significantly) so that they can evolve independently. to that end, this approach will:
    • preserve the same metrics configuration (from the end user's perspective)
    • internally delegate the responsibility to the gpu metrics builder when a
      valid device is present
    • decouple the gpu metric builders from the cpu ones to prevent misuse (thus not impacting other areas of the code). the locality of gpu references is strictly within the metric builders registered with the metric registry
    • move away from including the cuda file from within the cc file, and segregate the code
      via ifdef's

next, i'll create a pr for the ranking metric implementation on the gpu

@RAMitchell @trivialfis - please review
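To make the delegation idea concrete, here is a minimal, self-contained sketch of the registry pattern the PR describes: a GPU metric builder registers under the same name as its CPU counterpart, and the entry point delegates to it only when a valid device is configured. All names (`GpuMetricRegistry`, `EvalMetric`, the `device_id >= 0` convention) are illustrative stand-ins, not xgboost's actual API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical metric signature: predictions in, score out.
using MetricFn = std::function<double(const std::vector<double>&)>;

// The internal GPU registry: name -> GPU metric builder.
std::map<std::string, MetricFn>& GpuMetricRegistry() {
  static std::map<std::string, MetricFn> registry;
  return registry;
}

// Dispatch: delegate to the GPU builder registered under the same name
// when a valid device is present; otherwise fall back to the CPU code.
double EvalMetric(const std::string& name, const std::vector<double>& preds,
                  int device_id, const MetricFn& cpu_impl) {
  if (device_id >= 0) {  // a valid device is present
    auto it = GpuMetricRegistry().find(name);
    if (it != GpuMetricRegistry().end()) {
      return it->second(preds);  // GPU path
    }
  }
  return cpu_impl(preds);  // CPU path (also used when no GPU version exists)
}
```

Note that a metric with no GPU registration silently keeps its CPU implementation, which matches the point made later in the thread that not every metric is accelerated.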

@hcho3 (Collaborator) commented Mar 3, 2020

> the objective is to separate the cpu and gpu implementations (where they diverge significantly) so that they can evolve independently

Do you have a concrete use case in mind?

@mli (Member) commented Mar 3, 2020

Codecov Report

Merging #5387 into master will increase coverage by 0.12%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5387      +/-   ##
==========================================
+ Coverage   84.07%   84.19%   +0.12%     
==========================================
  Files          11       11              
  Lines        2411     2411              
==========================================
+ Hits         2027     2030       +3     
+ Misses        384      381       -3     
Impacted Files Coverage Δ
python-package/xgboost/tracker.py 95.18% <0.00%> (+1.20%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5dc8e89...f522124. Read the comment docs.

@RAMitchell (Member) left a comment

In other situations like this we have been deferring configurations of GPU/CPU algorithms to the learner and I'm wondering if we should do the same here. The learner holds most information about the learning task and configuration and might be better suited to dispatching the correct algorithm.

Another disadvantage of this approach is having to define duplicated code in each CPU metric for dispatching GPU algorithms.

So as an alternative I would propose separating the objectives into distinct GPU/CPU implementations as you have done, but selecting them inside the learner. @trivialfis WDYT?

@sriramch (Contributor, Author) commented Mar 3, 2020

> the objective is to separate the cpu and gpu implementations (where they diverge significantly) so that they can evolve independently
>
> Do you have a concrete use case in mind?

@hcho3 the metrics and the objective ranking implementations are the ones i had in mind for this.

@sriramch (Contributor, Author) commented Mar 3, 2020

> In other situations like this we have been deferring configurations of GPU/CPU algorithms to the learner and I'm wondering if we should do the same here. The learner holds most information about the learning task and configuration and might be better suited to dispatching the correct algorithm.

[sc] the reason i did it this way is so that each specific component - metric, objective, etc. - can choose independently whether it needs to be accelerated on the gpu, as opposed to having the learner choose for it. not all metrics are accelerated on the gpu, and that decision is best left to each metric.

> Another disadvantage of this approach is having to define duplicated code in each CPU metric for dispatching GPU algorithms.

[sc] further, even within a given metric, we move the computation to the gpu only when a certain condition is met. this also has to be decided independently on a metric-by-metric (or component) basis by that component. punting all of this to the learner could increase coupling significantly.

@RAMitchell (Member) commented

Possibly another way to do this without the registry would be to create metrics.h, metrics.cc, and metrics.cu, put the `struct EvalAuc : public Metric` declaration in the header, and define methods for GPU/CPU computation with the implementations in the respective cc/cu files. I guess this only works when the class has no GPU-only member variables.
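A rough sketch of the header-split alternative described above, condensed into one listing (in practice the pieces would live in metrics.h / metrics.cc / metrics.cu, and the guard would be the build system's CUDA flag; `HYPOTHETICAL_USE_CUDA` and the placeholder computation are stand-ins, not xgboost code):

```cpp
#include <vector>

// --- metrics.h: shared declaration visible to both .cc and .cu files ---
struct EvalAucSketch {
  double EvalCpu(const std::vector<double>& preds) const;
#ifdef HYPOTHETICAL_USE_CUDA
  double EvalGpu(const std::vector<double>& preds) const;  // defined in metrics.cu
#endif
  // Single entry point; the preprocessor decides which body is compiled in.
  double Eval(const std::vector<double>& preds) const {
#ifdef HYPOTHETICAL_USE_CUDA
    return EvalGpu(preds);
#else
    return EvalCpu(preds);
#endif
  }
};

// --- metrics.cc: CPU implementation (placeholder computation, not real AUC) ---
double EvalAucSketch::EvalCpu(const std::vector<double>& preds) const {
  double sum = 0.0;
  for (double p : preds) sum += p;
  return preds.empty() ? 0.0 : sum / static_cast<double>(preds.size());
}
```

As the comment notes, this layout assumes the class holds no GPU-only members, since the single declaration in the header must compile in both translation units.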

@sriramch (Contributor, Author) commented Mar 4, 2020

> Possibly another way to do this without the registry would be to create metrics.h metrics.cc metrics.cu, put the struct EvalAuc : public Metric declaration in the header and define methods for GPU/CPU computation, with implementations in respective cc/cu files. I guess this only works when the class has no GPU only member variables.

[sc] some of the metrics use the template method pattern, which drives the logic in one place and defers part of the functionality to subclasses or other meta types (to avoid virtual calls on the device). this can get hairy with that approach.

the gpu registry approach keeps things fairly isolated and can easily delegate responsibility to the gpu classes as long as they are registered under the same name as their cpu counterparts (which is its only requirement). what is the main objection to this approach (other than that it is non-traditional)?
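For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of the template-method style the reply refers to: one driver owns the evaluation loop and defers the per-element step to a policy type resolved at compile time, so no virtual dispatch is needed (which matters in device code). `MetricDriver` and `MeanSquared` are illustrative names, not xgboost classes.

```cpp
#include <cstddef>
#include <vector>

// Driver: the shared evaluation logic lives in exactly one place.
template <typename Policy>
struct MetricDriver {
  double Eval(const std::vector<double>& preds) const {
    double acc = 0.0;
    for (double p : preds) acc += Policy::Step(p);  // deferred per-element step
    return Policy::Finalize(acc, preds.size());    // deferred reduction
  }
};

// One concrete policy: mean of squared predictions (placeholder math).
struct MeanSquared {
  static double Step(double p) { return p * p; }
  static double Finalize(double acc, std::size_t n) {
    return n == 0 ? 0.0 : acc / static_cast<double>(n);
  }
};
```

The hairiness described above comes from splitting such a driver across cc/cu files: the shared logic and its policies must all be visible wherever the template is instantiated.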

@trivialfis (Member) commented

> In other situations like this we have been deferring configurations of GPU/CPU algorithms to the learner

Please don't. The Learner is already very complicated as it is.

Just out of curiosity, why do we need a class to carry out computing a metric? Looking through the code there's no class state. Looking at #5326, the whole class is just a function.

@sriramch (Contributor, Author) commented Mar 4, 2020

> Just out of curiosity, why do we need a class to carry out computing a metric? Looking through the code there's no class state. Looking at #5326, the whole class is just a function.

[sc] the metric class does have state: there is the ranking config (parsed out of the metric name). these can evolve, and if we made the metrics free functions, we would have to pass all of that to every metric function supported on the gpu.

further, upon looking closely, i find that this pattern isn't without precedent. it looks like the page formats, tree io, etc. all leverage the registry mechanics to cleanly delegate responsibility.

@RAMitchell (Member) left a comment

Seems ok to me. We will want to make sure binary classification is supported for AUC in the next PR.

@trivialfis (Member) left a comment

Approving the approach. Now we need to be sure there's no problem when switching between CPU and GPU. Please be aware that this can happen after serialization (like pickling) and deserialization on another worker.
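One way to read the concern above is that the CPU/GPU choice should be re-made at evaluation time rather than baked in when the object is constructed, so that a model unpickled on a CPU-only worker falls back cleanly. A small illustrative sketch, assuming a runtime device probe (`GpuAvailable` here is a stand-in that always reports no GPU; none of these names are xgboost's):

```cpp
#include <string>
#include <vector>

struct SketchMetric {
  std::string name;  // only serializable config is stored; no device handles

  // Decide CPU vs GPU at call time, not at construction/deserialization,
  // so the same object works wherever it is revived.
  double Eval(const std::vector<double>& preds, int device_id) const {
    if (device_id >= 0 && GpuAvailable()) {
      return EvalGpu(preds);
    }
    return EvalCpu(preds);  // clean fallback on a CPU-only worker
  }

  static bool GpuAvailable() { return false; }  // stand-in for a runtime probe

  double EvalCpu(const std::vector<double>& preds) const {
    double s = 0.0;
    for (double p : preds) s += p;
    return s;  // placeholder computation
  }
  double EvalGpu(const std::vector<double>&) const { return -1.0; }  // unused here
};
```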

sriramch added a commit to sriramch/xgboost that referenced this pull request Mar 6, 2020
  - this is the last part of dmlc#5326 that has been split
  - this also includes dmlc#5387
@RAMitchell RAMitchell merged commit 1ba6706 into dmlc:master Mar 7, 2020
@lock lock bot locked as resolved and limited conversation to collaborators Jun 24, 2020
5 participants