gpu_hist integer overflows and OOM issues #6228

Closed
RAMitchell opened this issue Oct 13, 2020 · 4 comments

Comments

@RAMitchell
Member

This issue tracks gpu_hist bugs relating to large workloads, uncovered in recent experiments on the mortgage dataset.

  • Thrust copy_if has an integer overflow when n_rows*n_cols > 2^31. #6201 ("Loop over copy_if") implements a workaround by iterating over batches.

  • Memory usage may have increased since versions 1.0-1.1, leading to OOM on 32 GB devices even with the above fix. We should analyze peak memory usage across versions on a large synthetic workload to check for regressions.

  • DaskDeviceQuantileDMatrix has integer overflow bugs related to thrust::inclusive_scan, occurring when dask chunk sizes exceed 2^31.
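
To illustrate the first item, here is a minimal Python sketch of the failure mode and of the batching idea. It is not the code from #6201: `to_int32` and `batched_copy_if` are hypothetical names, and the sketch assumes the overflow comes from an element count being held in a signed 32-bit integer.

```python
INT32_MAX = 2**31 - 1


def to_int32(value):
    """Reinterpret an integer as a signed 32-bit value (two's complement wraparound)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def batched_copy_if(items, predicate, max_batch=INT32_MAX):
    """Filter `items` in slices of at most `max_batch` elements.

    Each slice stands in for one copy_if call whose element count
    fits in a signed 32-bit integer (assumed limit, see lead-in).
    """
    out = []
    for begin in range(0, len(items), max_batch):
        end = min(begin + max_batch, len(items))
        out.extend(x for x in items[begin:end] if predicate(x))
    return out


# A count of n_rows * n_cols that exceeds 2^31 - 1 turns negative in 32 bits.
n_rows, n_cols = 50_000, 50_000
print(n_rows * n_cols > INT32_MAX, to_int32(n_rows * n_cols) < 0)

# Small batch size used here only so the example runs instantly.
print(batched_copy_if(list(range(10)), lambda x: x % 2 == 0, max_batch=3))
```

The batch size is a parameter so the same loop can be exercised with tiny inputs; the output order matches a single unbatched pass.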

To prevent this from recurring, we can add unit tests at large sizes that check for overflow or memory issues. These tests need to be carefully designed so that they are not flaky (e.g. they run only on a machine with sufficient memory) and run quickly (under 1-2 seconds).
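
One way such a test could stay fast is to check the index arithmetic on a few large counts rather than allocating large arrays. The sketch below is only an illustration under that assumption: it simulates an inclusive scan with a 32-bit accumulator in plain Python, and the helper names (`to_int32`, `inclusive_scan`, `is_monotonic`) are hypothetical, not part of XGBoost or Thrust.

```python
from itertools import accumulate


def to_int32(value):
    """Reinterpret an integer as a signed 32-bit value (two's complement wraparound)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def inclusive_scan(counts, wrap=None):
    """Inclusive prefix sum; `wrap` optionally simulates a narrow accumulator."""
    if wrap is None:
        return list(accumulate(counts))
    return list(accumulate(counts, lambda a, b: wrap(a + b)))


def is_monotonic(prefix):
    """Prefix sums of non-negative counts must never decrease."""
    return all(a <= b for a, b in zip(prefix, prefix[1:]))


def test_scan_offsets_do_not_overflow():
    # Three counts of 2^30 elements each: the total exceeds 2^31 - 1,
    # but no memory of that size is ever allocated.
    counts = [2**30, 2**30, 2**30]
    assert is_monotonic(inclusive_scan(counts))                     # wide accumulator
    assert not is_monotonic(inclusive_scan(counts, wrap=to_int32))  # 32-bit wraps


test_scan_offsets_do_not_overflow()
```

A test of the real GPU code path would still need large inputs, so it would have to be skipped on machines without enough memory; this sketch only covers the offset arithmetic.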

@hcho3
Collaborator

hcho3 commented Nov 3, 2020

@RAMitchell Can we close this now?

@jamescolless

Sorry, just to confirm, does the 2^31-1 limitation also occur in DeviceQuantileDMatrix or only in DaskDeviceQuantileDMatrix?

@trivialfis
Member

Both. For the Dask version, the limitation applies to each partition.

@trivialfis
Member

It's fixed in the latest master; the fix should go into the next release.


4 participants