Segmentation fault in CLI binary classification demo #2638

Closed
eroicaleo opened this issue Aug 25, 2017 · 8 comments
@eroicaleo

For bugs or installation issues, please provide the following information.
The more information you provide, the more easily we will be able to offer
help and advice.

Environment info

Operating System: OS X El Capitan Version 10.11.6

Compiler: clang-800.0.42.1

Package used (python/R/jvm/C++): C++

xgboost version used: 0.6

If installing from source, please provide

  1. The commit hash (git rev-parse HEAD): 70071fc

Steps to reproduce

  1. git clone --recursive https://github.com/dmlc/xgboost xgboost
  2. cd xgboost; cp make/minimum.mk ./config.mk; make -j4
  3. cd demo/binary_classification/
  4. ./runexp.sh

logs:

[00:07:11] 6541x127 matrix with 143902 entries loaded from agaricus.txt.train
[00:07:11] 1583x127 matrix with 34826 entries loaded from agaricus.txt.test
[00:07:11] src/cli_main.cc:198: Loading data: 0.021951 sec
[00:07:11] boosting round 0, 4.49982e-08 sec elapsed
[00:07:11] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[00:07:11] [0]	test-error:0.015793	train-error:0.014524
[00:07:11] boosting round 1, 0.00579598 sec elapsed
[00:07:11] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 0 pruned nodes, max_depth=3
[00:07:11] [1]	test-error:0.000000	train-error:0.001223
[00:07:11] update end, 0.00959829 sec in all
[00:07:11] 1583x127 matrix with 34826 entries loaded from agaricus.txt.test
[00:07:11] start prediction...
./runexp.sh: line 9:  8938 Segmentation fault: 11  ../../xgboost mushroom.conf task=pred model_in=0002.model
@eroicaleo
Author

Adding the following line at the end of the Load method in learner.cc
seems to get rid of the segmentation fault.

diff --git a/src/learner.cc b/src/learner.cc
index 689e1f9..c35e285 100644
--- a/src/learner.cc
+++ b/src/learner.cc
@@ -314,6 +314,7 @@ class LearnerImpl : public Learner {
     cfg_["num_class"] = common::ToString(mparam.num_class);
     cfg_["num_feature"] = common::ToString(mparam.num_feature);
     obj_->Configure(cfg_.begin(), cfg_.end());
+    gbm_->Configure(cfg_.begin(), cfg_.end());
   }
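To make the effect of this one-line patch concrete, here is a minimal, self-contained C++ model of the suspected failure mode. It assumes (this thread does not confirm it) that the tree booster creates an internal predictor object inside Configure, so a learner restored through Load without calling gbm_->Configure is left with a null predictor that PredictBatch later dereferences. All names (ToyPredictor, ToyGBTree, ToyLearner, predictor_, configure_gbm_on_load) are hypothetical and are not xgboost's real types.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an object a booster builds during Configure().
struct ToyPredictor {
  std::vector<float> Predict(std::size_t n_rows) const {
    return std::vector<float>(n_rows, 0.5f);  // dummy scores, one per row
  }
};

// Hypothetical booster: predictor_ only exists after Configure() has run.
class ToyGBTree {
 public:
  void Configure(const std::map<std::string, std::string>& cfg) {
    (void)cfg;  // a real booster would read its parameters here
    predictor_.reset(new ToyPredictor());
  }
  bool IsConfigured() const { return predictor_ != nullptr; }
  std::vector<float> PredictBatch(std::size_t n_rows) const {
    // Dereferencing a null predictor_ here would crash with SIGSEGV;
    // this toy throws instead so the example stays runnable.
    if (!predictor_) throw std::runtime_error("booster was never configured");
    return predictor_->Predict(n_rows);
  }

 private:
  std::unique_ptr<ToyPredictor> predictor_;
};

// Hypothetical learner whose Load() mimics the tail of LearnerImpl::Load.
class ToyLearner {
 public:
  explicit ToyLearner(bool configure_gbm_on_load)
      : configure_gbm_on_load_(configure_gbm_on_load) {}

  void Load() {
    cfg_["num_class"] = "0";      // illustrative values only
    cfg_["num_feature"] = "127";
    // The real code calls obj_->Configure(...) at this point.
    if (configure_gbm_on_load_) {
      gbm_.Configure(cfg_);  // models the added gbm_->Configure(...) line
    }
  }
  const ToyGBTree& gbm() const { return gbm_; }

 private:
  bool configure_gbm_on_load_;
  std::map<std::string, std::string> cfg_;
  ToyGBTree gbm_;
};
```

With configure_gbm_on_load set to false, ToyLearner::Load leaves the booster unconfigured and the first PredictBatch call fails; with it set to true, prediction works. Training never hits this path because, under the same assumption, the booster is configured before the first boosting round.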

@demobin8

Core dump when predicting:

[19:33:44] 287x79 matrix with 21321 entries loaded from train.feat.test
[19:33:44] start prediction...
Segmentation fault (core dumped)

Environment info

ubuntu16.04
gcc-5.4.0
install from source git rev-parse HEAD

Steps to reproduce

~/work/github/xgboost/xgboost houseprice.conf task=pred model_in=0002.model

After applying @eroicaleo's fix, prediction succeeds.

@japinder007

I am getting this error as well on running the binary_classification demo.

Stack trace

Thread 1 "xgboost" received signal SIGSEGV, Segmentation fault.
0x0000000000594ae1 in xgboost::gbm::GBTree::PredictBatch(xgboost::DMatrix*, std::vector<float, std::allocator<float> >*, unsigned int) ()
(gdb) bt
#0 0x0000000000594ae1 in xgboost::gbm::GBTree::PredictBatch(xgboost::DMatrix*, std::vector<float, std::allocator<float> >*, unsigned int) ()
#1 0x0000000000423ab6 in xgboost::LearnerImpl::PredictRaw(xgboost::DMatrix*, std::vector<float, std::allocator<float> >*, unsigned int) const ()
#2 0x0000000000423b87 in xgboost::LearnerImpl::Predict(xgboost::DMatrix*, bool, std::vector<float, std::allocator<float> >*, unsigned int, bool, bool) const ()
#3 0x000000000040d086 in xgboost::CLIPredict(xgboost::CLIParam const&) ()
#4 0x0000000000410d36 in xgboost::CLIRunTask(int, char**) ()
#5 0x00007ffff6d4d830 in __libc_start_main (main=0x408400, argc=4, argv=0x7fffffffe508, init=, fini=, rtld_fini=, stack_end=0x7fffffffe4f8)
    at ../csu/libc-start.c:291
#6 0x000000000040c729 in _start ()
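Frame #0 sits inside GBTree::PredictBatch, one call below LearnerImpl::PredictRaw, which is consistent with (but does not prove) a booster member that was never initialized on the model-loading path. One way to turn that kind of crash into a readable error is to check the invariant before dereferencing. The sketch below uses a hypothetical CheckedPtr helper and a hypothetical ToyPredictor in plain C++; xgboost code generally uses dmlc-core's CHECK macros for invariant checks, and this sketch uses a standard exception only so that it has no dependencies.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical owning pointer that refuses to dereference null and instead
// reports which component was used before it was initialized.
template <typename T>
class CheckedPtr {
 public:
  explicit CheckedPtr(std::string what) : what_(std::move(what)) {}
  void reset(T* p) { ptr_.reset(p); }
  T& operator*() const { return *Get(); }
  T* operator->() const { return Get(); }

 private:
  T* Get() const {
    if (!ptr_) {
      throw std::logic_error(what_ + " used before Configure() was called");
    }
    return ptr_.get();
  }
  std::string what_;
  std::unique_ptr<T> ptr_;
};

// Hypothetical component standing in for a booster's predictor.
struct ToyPredictor {
  int Predict() const { return 42; }
};
```

A member declared as CheckedPtr<ToyPredictor> predictor_{"predictor"} would then fail with "predictor used before Configure() was called" instead of a bare SIGSEGV, which points directly at the missing Configure call.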

@rydevera3

rydevera3 commented Sep 14, 2017

@eroicaleo - I can confirm that this fix also worked for me, although I have no idea what it is actually doing. I was running this on my MacBook with macOS Sierra. The only thing I want to add is that I had to re-run make -j4 for the fix to take effect.

UPDATE
One more note: I wasn't actually running the binary classification tutorial but the learning-to-rank tutorial (objective=rank:pairwise).

@zhouquan03

zhouquan03 commented Sep 26, 2017

I also have this error.

Here is my system information:
$ cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC)

Then I ran the demo:
$ cd demo/regression

$ ./runexp.sh
[17:22:39] 169x36 matrix with 1183 entries loaded from machine.txt.train
[17:22:39] 40x36 matrix with 280 entries loaded from machine.txt.test
[17:22:39] src/cli_main.cc:198: Loading data: 0.058177 sec
[17:22:39] boosting round 0, 4.76837e-07 sec elapsed
[17:22:39] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[17:22:39] [0] test-rmse:91.562714
[17:22:39] boosting round 1, 0.198539 sec elapsed
[17:22:39] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[17:22:39] [1] test-rmse:71.693672
[17:22:39] update end, 0.371482 sec in all
[17:22:40] 40x36 matrix with 280 entries loaded from machine.txt.test
[17:22:40] start prediction...
./runexp.sh: line 9: 6346 Segmentation fault (core dumped) ../../xgboost machine.conf task=pred model_in=0002.model
booster[0]:
0:[MMAX<22486] yes=1,no=2,missing=1
1:[CACH<28] yes=3,no=4,missing=3
3:[MMAX<10001] yes=7,no=8,missing=7
7:leaf=29.2258
8:leaf=66.88
4:[CACH<97] yes=9,no=10,missing=9
9:leaf=106.348
10:leaf=194.917
2:[MMAX<48001] yes=5,no=6,missing=5
5:[CACH<57] yes=11,no=12,missing=11
11:leaf=166.273
12:leaf=349.179
6:leaf=673.375
booster[1]:
0:[CACH<81] yes=1,no=2,missing=1
1:[MMIN<4621] yes=3,no=4,missing=3
3:[CHMIN<11] yes=7,no=8,missing=7
7:leaf=-0.967454
8:leaf=-33.8347
4:[vendor=ibm] yes=10,no=9
9:leaf=14.3699
10:leaf=87.0657
2:[MMAX<48001] yes=5,no=6,missing=5
5:[MYCT<81] yes=11,no=12,missing=11
11:leaf=86.3795
12:leaf=-24.2778
6:leaf=237.083

Here is gdb output:
$ gdb ../../xgboost core.5501
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
...
...
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `../../xgboost machine.conf task=pred model_in=0002.model'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000055203e in xgboost::gbm::GBTree::PredictBatch(xgboost::DMatrix*, std::vector<float, std::allocator >*, unsigned int) ()
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libgomp-4.8.5-11.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64

@suntzu86
Contributor

suntzu86 commented Feb 5, 2018

Ping? We're seeing the same issue. Just changing 'objective' from 'reg:linear' to 'reg:logistic' or 'binary:logistic' causes a segfault in one of our use cases.

@eroicaleo's comment above seems reasonable, especially given this comment above the block he's pointing to:
https://github.com/dmlc/xgboost/blob/master/src/learner.cc#L304
which indicates that this logic is supposed to mirror LazyInitModel. However, LazyInitModel calls Configure on both gbm_ and obj_:
https://github.com/dmlc/xgboost/blob/master/src/learner.cc#L537
So does the LearnerImpl::Configure method:
https://github.com/dmlc/xgboost/blob/master/src/learner.cc#L258

@tqchen this looks like some pretty long-standing code... is the fix just @eroicaleo's suggestion? It seems troubling for classification/ranking to be broken in this way.
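The argument here is that three entry points (Configure, LazyInitModel and Load) are meant to leave the learner in the same state, yet only two of them configure gbm_. One way to keep such paths from drifting apart is to route all of them through a single helper. The sketch below is hypothetical: UnifiedLearner, ApplyConfig, ToyObjective, ToyBooster and the configured counters are illustrative names, not xgboost code, and the sketch only shows the structure.

```cpp
#include <cassert>
#include <map>
#include <string>

using Config = std::map<std::string, std::string>;

// Hypothetical components that count how many times they were configured.
struct ToyObjective {
  int configured = 0;
  void Configure(const Config&) { ++configured; }
};
struct ToyBooster {
  int configured = 0;
  void Configure(const Config&) { ++configured; }
};

class UnifiedLearner {
 public:
  // Every entry point funnels through ApplyConfig(), so none of them can
  // configure the objective while forgetting the booster (or vice versa).
  void Configure(const Config& cfg) {
    cfg_ = cfg;
    ApplyConfig();
  }
  void LazyInitModel() { ApplyConfig(); }
  void Load(int num_class, int num_feature) {
    cfg_["num_class"] = std::to_string(num_class);
    cfg_["num_feature"] = std::to_string(num_feature);
    ApplyConfig();
  }
  const ToyObjective& obj() const { return obj_; }
  const ToyBooster& gbm() const { return gbm_; }

 private:
  void ApplyConfig() {
    obj_.Configure(cfg_);
    gbm_.Configure(cfg_);
  }
  Config cfg_;
  ToyObjective obj_;
  ToyBooster gbm_;
};
```

The design choice is simply that there is one place that lists which components get configured, so a future code path cannot configure obj_ and silently skip gbm_.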

@khotilov
Member

khotilov commented Feb 7, 2018

But does anyone care enough to submit a PR? :)

@hcho3
Collaborator

hcho3 commented Feb 9, 2018

@suntzu86 I cannot seem to reproduce the issue with the latest master. Can you post a script that reproduces the bug? Thanks!

@tqchen tqchen closed this as completed Jul 4, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Oct 24, 2018
9 participants