Memory usage jumps to 50G when trying to predict #6659
Comments
Just confirming that the data loading is correct: your data has 3781180 columns?
The original training dataset has fewer than 100 columns, but there are some high-cardinality categoricals which, due to 1-hot encoding, lead to this many columns in the XGBoost training set.
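For context, a minimal sketch of how a handful of high-cardinality categoricals can balloon into millions of features (the sizes and the use of scikit-learn below are illustrative assumptions, not taken from the attached data): one-hot encoding a column with N distinct levels produces N columns.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One categorical column with 1M distinct levels -> 1M one-hot columns.
n_rows = 1_000_000
user_id = np.arange(n_rows).astype(str).reshape(-1, 1)

encoder = OneHotEncoder()           # emits a scipy.sparse matrix by default
X = encoder.fit_transform(user_id)
print(X.shape)                      # (1000000, 1000000)
```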
@ShvetsKS Would you like to help take a look? I think the thread optimization spikes the memory usage; a better way to handle this might be to put some thought into extremely sparse datasets. Right now you can try setting a lower nthread as a workaround. As a side note, #6503 should help remove the 1-hot encoding.
Setting nthread to 1 helped to work around the issue.
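For anyone hitting the same problem, a minimal sketch of that workaround (the model file name and the matrix shape are hypothetical stand-ins, not the attached files):

```python
import scipy.sparse as sp
import xgboost as xgb

# Tiny sparse stand-in for the real ~3.78M-column prediction matrix.
X = sp.random(100, 1_000, density=0.001, format="csr")

bst = xgb.Booster(model_file="model.bin")  # "model.bin" is a placeholder
bst.set_param("nthread", 1)                # force single-threaded prediction
preds = bst.predict(xgb.DMatrix(X, nthread=1))
```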
@trivialfis memory usage was increased as currently we process …
I have a fairly small booster and dataset; when I try to run prediction on this dataset, the memory usage jumps to 50 GB.
Here is the code to reproduce:
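(The snippet itself did not survive in this copy of the issue; below is a hedged sketch of the kind of call described, loading a saved booster and the attached data and running predict. All file names are assumptions.)

```python
import xgboost as xgb

bst = xgb.Booster(model_file="model.json")  # placeholder file names; the
dtest = xgb.DMatrix("data.libsvm")          # real ones ship in data.zip
preds = bst.predict(dtest)                  # memory reportedly jumps to ~50 GB here
```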
The data used to reproduce is attached:
data.zip