
Model reproducibility with histogram tree method #5023

Closed
3 of 4 tasks
trivialfis opened this issue Nov 7, 2019 · 5 comments

Comments

@trivialfis
Member

trivialfis commented Nov 7, 2019

A link to the original issue: dask/dask-xgboost#37

TODOs:

  • CPU/GPU deterministic scattered add for building histograms.
  • Verify allreduce is deterministic.
  • Verify cub BlockSum is deterministic.
  • Distributed environment.
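The first item exists because floating-point addition is not associative: a parallel scattered add that accumulates histogram bins in a run-dependent order can round differently from run to run. A minimal pure-Python sketch (standalone illustration, not xgboost code) showing that summation order alone changes the result:

```python
# Floating-point addition is not associative, so the order in which
# partial sums are accumulated changes the rounded result.
vals = [0.1] * 10 + [1e16, -1e16]

# Small values first: their sum (~1.0) is absorbed when 1e16 is added,
# and the final result collapses to 0.0.
small_first = sum(vals)

# Large values first: 1e16 - 1e16 cancels exactly, leaving the small sum.
large_first = sum(vals[-2:] + vals[:-2])

print(small_first)  # 0.0
print(large_first)  # 0.9999999999999999
assert small_first != large_first
```

This is why a GPU histogram built with unordered atomic adds is not bit-reproducible, and why a deterministic accumulation order is on the list above.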

Related:
#4204
#3921
#3707

@trivialfis
Member Author

It would be nice to test hist and gpu_hist too, as these two are the most used in production environments.
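One way such a test could be structured (a hedged sketch, not the project's actual test suite): train twice on identical inputs with the same seed and compare fingerprints of the serialized models. `train` below is a deterministic stand-in; with xgboost the hashed payload would instead come from something like `xgb.train({"tree_method": "hist", ...}, dtrain)` followed by hashing the booster's `get_dump()` output.

```python
import hashlib
import json

def train(data, seed):
    # Deterministic stand-in for a real training call; a real test would
    # train an xgboost booster here and serialize its trees instead.
    return {"seed": seed, "leaf_sum": sum(data)}

def fingerprint(model):
    # Hash a canonical serialization so any bit-level difference shows up.
    payload = json.dumps(model, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

data = [0.5, 1.25, -3.0, 2.75]
first = fingerprint(train(data, seed=7))
second = fingerprint(train(data, seed=7))
assert first == second  # reproducible: identical inputs give identical models
```

Hashing a canonical dump rather than comparing predictions catches differences in tree structure that might not show up in rounded outputs.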

@mmccarty

Thank you @trivialfis! Very interested to find out what's going on here.

@trivialfis
Member Author

Single node GPU hist for regression and classification is now deterministic.

@trivialfis
Member Author

Remaining issues are the dask partitioning functions and GPU ranking. Ranking is tracked in #5561. Dask partitioning still needs further investigation.

@trivialfis
Member Author

The histogram method inside xgboost is bit-for-bit reproducible now. The remaining question is dask data partitioning.

2 participants