Segmentation fault with Python package for the approximate method only #1133
Comments
Interesting. If you can run it under gdb and get a backtrace of where the segfault happens, we can take a closer look.
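For readers unfamiliar with gdb, one way to get the native backtrace is to run `gdb --args python your_script.py`, type `run`, and type `bt` after the crash. A Python-level alternative is the standard-library faulthandler module; a generic sketch (not from this thread):

```python
import faulthandler

# Install handlers for SIGSEGV, SIGFPE, etc. so that a crash inside
# native code (such as a C++ extension) dumps the Python traceback to
# stderr before the process dies, instead of printing only
# "Segmentation fault".
faulthandler.enable()

print(faulthandler.is_enabled())  # → True
```

Enabling this at the top of the training script shows which Python frame was active when the native code crashed, which narrows down where to point gdb.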
Hi, I don't know much about C++ so I'm not sure I understand this message. It seems related to some library internals, but I don't really see how I could fix it.
Make sure you update to the most recent version; specifically, pull the latest version of rabit. This should have been fixed in the most recent version.
I installed the library last Monday from a clean clone and used the setup.py file from the package. Did you change anything since then?
In the latest version of rabit (https://github.com/dmlc/rabit/blob/849b20b7c822d194a515cc5587c37764cdf39385/src/allreduce_robust.cc#L82), if you are not in distributed mode, TryAllreduceScatterRing should not be executed from the allreduce function. I fixed that some time ago, but somehow in your case the code still gets into this function.
Hi, |
I would add that I am also experiencing this behavior with the approximate algorithm, although it seems pretty inconsistent in how many observations it can handle: I fed in 40 million points a couple of days ago, and now it is choking on 7 million from the same dataset. It seems to work fine when set to exact, although that would certainly defeat the purpose of approximation for larger datasets! A clean clone/install didn't help me, either.
I can confirm this error.
I tried to enforce robust allreduce, without any luck.
Everything works fine if linked against
Please check whether the latest change (#1186) fixed this problem.
Seems to work just fine now.
It seems to work fine now, so I'm closing the issue.
Hi everyone,
I'm trying to use xgboost for a classification task with fairly big data (25M rows in the training set; the libsvm file is 2.2 GB on disk), using the Python package.
It works fine when I set tree_method to 'exact', but I get a segmentation fault with the 'approx' tree_method.
Initially I thought it was related to high RAM usage (I'm using a machine with 32 GB of RAM), so I'm using the external-memory version (https://github.com/dmlc/xgboost/blob/master/doc/external_memory.md), which creates the cache files correctly. But I still get a segfault.
I've tried to run my model with xgboost directly (without the Python package) and it works for both the exact and approx methods, although it's quite slow.
Here is the code I'm using in Python:
import xgboost as xgb

dtrain = xgb.DMatrix('/path/to/data/data_train_libsvm#dtrain.cache')
dval = xgb.DMatrix('/path/to/data/data_val_libsvm#dval.cache')
param = {'booster': 'gbtree', 'silent': 0, 'nthread': 8,
         'eta': 0.1, 'max_depth': 6, 'subsample': 0.8,
         'colsample_bytree': 0.8, 'scale_pos_weight': 12000,
         'objective': 'binary:logistic', 'eval_metric': 'auc'}
watchlist = [(dtrain, 'train'), (dval, 'eval')]
num_round = 300
bst = xgb.train(param, dtrain, num_round, watchlist, early_stopping_rounds=30)
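Note that the param dict above never sets tree_method, so the method is chosen by xgboost's defaults; to compare the two code paths explicitly, it can be set in the same dict. A minimal sketch (the smaller base dict here is only a stand-in for the full params above; 'exact' and 'approx' are the values discussed in this issue):

```python
# Stand-in for the full param dict from the script above.
base = {'objective': 'binary:logistic', 'eval_metric': 'auc', 'eta': 0.1}

param_exact = dict(base, tree_method='exact')    # reported to work
param_approx = dict(base, tree_method='approx')  # reported to segfault

print(param_approx['tree_method'])  # → approx
```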
I've tried to track down the error, and it occurs during the first booster update: line 750 of core.py, in the call to _LIB.XGBoosterUpdateOneIter.
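For context on why this kills the interpreter outright: the Python package calls into the compiled xgboost shared library via ctypes, and a segfault inside such a call happens in native code, where Python cannot raise an exception. A minimal illustration of that calling pattern, using libc as a stand-in for the real library:

```python
import ctypes
import ctypes.util

# Load a shared library the same general way xgboost's core.py loads
# its native library; libc here is purely a stand-in for illustration.
libc = ctypes.CDLL(ctypes.util.find_library('c'))

# A call like this crosses into native code. If the native function
# hits a bad pointer (as presumably happens in XGBoosterUpdateOneIter
# here), the whole process segfaults rather than raising an exception.
print(libc.abs(-7))  # → 7
```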
If anyone has an idea of what could be going on, I would be super grateful!
Thanks,
Cheers,
Imen