[Bug] [Python] Categorical CUDA fails on a data validation check if there's a float column containing only NaNs #10089

Open
OneForward opened this issue Mar 5, 2024 · 1 comment

Comments

@OneForward

Hi, I found a bug in the Python package that is exactly the same as this R-version bug.

Minimal code to reproduce it is shown below:

import pandas as pd 
import numpy as np 
import xgboost as xgb 
print(xgb.__version__)
X = pd.DataFrame({'category_column': [0, 0, 0, -1]}, dtype='category')
X['na_column'] = np.nan 
X = X[['na_column', 'category_column']]
y = pd.DataFrame({'label': [0, 0, 0, 0]})

dtrain = xgb.DMatrix(X, y, enable_categorical=True)
booster = xgb.train({'tree_method': 'hist', 'device': 'cuda'}, dtrain)

Running this script with xgboost package version 2.0.1 on a GPU machine should produce the following error: Check failed: max_cat + 1 >= n_categories (1 vs. 2) : Maximum cateogry should not be lesser than the total number of categories.

2.0.1
---------------------------------------------------------------------------
XGBoostError                              Traceback (most recent call last)
Cell In[1], line 12
      9 y = pd.DataFrame({'label': [0, 0, 0, 0]})
     11 dtrain = xgb.DMatrix(X, y, enable_categorical=True)
---> 12 booster = xgb.train({'tree_method': 'hist', 'device': 'cuda'}, dtrain)

File ~/python3.11/site-packages/xgboost/core.py:729, in require_keyword_args..throw_if..inner_f(*args, **kwargs)
    727 for k, arg in zip(sig.parameters, args):
    728     kwargs[k] = arg
--> 729 return func(**kwargs)

File ~/python3.11/site-packages/xgboost/training.py:181, in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, xgb_model, callbacks, custom_metric)
    179 if cb_container.before_iteration(bst, i, dtrain, evals):
    180     break
--> 181 bst.update(dtrain, i, obj)
    182 if cb_container.after_iteration(bst, i, dtrain, evals):
    183     break

File ~/python3.11/site-packages/xgboost/core.py:2049, in Booster.update(self, dtrain, iteration, fobj)
   2046 self._assign_dmatrix_features(dtrain)
   2048 if fobj is None:
-> 2049     _check_call(
   2050         _LIB.XGBoosterUpdateOneIter(
   2051             self.handle, ctypes.c_int(iteration), dtrain.handle
   2052         )
   2053     )
   2054 else:
   2055     pred = self.predict(dtrain, output_margin=True, training=True)

File ~/python3.11/site-packages/xgboost/core.py:281, in _check_call(ret)
    270 """Check the return value of C API call
    271 
    272 This function will raise exception when error occurs.
   (...)
    278     return value from API calls
    279 """
    280 if ret != 0:
--> 281     raise XGBoostError(py_str(_LIB.XGBGetLastError()))

XGBoostError: [11:32:42] /workspace/src/tree/updater_gpu_hist.cu:781: Exception in gpu_hist: [11:32:42] /workspace/src/common/categorical.h:82: Check failed: max_cat + 1 >= n_categories (1 vs. 2) : Maximum cateogry should not be lesser than the total number of categories.
Stack trace:
  [bt] (0) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x7f0c9a) [0x7fe652671c9a]
  [bt] (1) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x7f41a2) [0x7fe6526751a2]
  [bt] (2) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x792c67) [0x7fe652613c67]
  [bt] (3) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x83f932) [0x7fe6526c0932]
  [bt] (4) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x83fef2) [0x7fe6526c0ef2]
  [bt] (5) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x41589e) [0x7fe65229689e]
  [bt] (6) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0xb08679) [0x7fe652989679]
  [bt] (7) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0xb085c3) [0x7fe6529895c3]
  [bt] (8) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0xb40297) [0x7fe6529c1297]



Stack trace:
  [bt] (0) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0xb27f2a) [0x7fe6529a8f2a]
  [bt] (1) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0xb485c9) [0x7fe6529c95c9]
  [bt] (2) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x460c79) [0x7fe6522e1c79]
  [bt] (3) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x46176c) [0x7fe6522e276c]
  [bt] (4) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x4c54f7) [0x7fe6523464f7]
  [bt] (5) ~/python3.11/site-packages/xgboost/lib/libxgboost.so(XGBoosterUpdateOneIter+0x70) [0x7fe651fe2ef0]
  [bt] (6) ~/python3.11/lib-dynload/../../libffi.so.8(+0xa052) [0x7fe6d35be052]
  [bt] (7) ~/python3.11/lib-dynload/../../libffi.so.8(+0x8925) [0x7fe6d35bc925]
  [bt] (8) ~/python3.11/lib-dynload/../../libffi.so.8(ffi_call+0xde) [0x7fe6d35bd06e]
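
A possible stopgap, sketched here under assumptions and not confirmed by the maintainers: since the failure seems tied to the float column that contains only NaNs, dropping such columns before building the DMatrix may avoid the check. The helper name drop_all_nan_columns is hypothetical, and this has not been verified against the CUDA code path.

```python
import numpy as np
import pandas as pd


def drop_all_nan_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` without columns whose values are all missing.

    Hypothetical helper for illustration; not part of xgboost or pandas.
    """
    all_nan = frame.isna().all(axis=0)  # boolean Series indexed by column name
    return frame.loc[:, ~all_nan].copy()


# Same input as in the reproduction script above.
X = pd.DataFrame({'category_column': [0, 0, 0, -1]}, dtype='category')
X['na_column'] = np.nan
X = X[['na_column', 'category_column']]

X_clean = drop_all_nan_columns(X)
print(list(X_clean.columns))             # only the categorical column remains
print(X_clean['category_column'].dtype)  # the category dtype is preserved
```

X_clean could then be passed to xgb.DMatrix(X_clean, y, enable_categorical=True) in place of X. Whether this actually sidesteps the error on 2.0.1 is an assumption; upgrading xgboost may be the better route if a newer release no longer triggers it.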
@trivialfis
Member

Thank you for raising the issue. I will look into it; I can reproduce it in 2.0 but not with the latest version.
