CPU predictor should throw an error when categorical splits are present #6488
Comments
The example script is salvaged by saving a memory snapshot with pickle:

import pandas as pd
import numpy as np
import xgboost as xgb
import pickle
rng = np.random.default_rng(seed=0)
x0 = rng.integers(low=0, high=3, size=20)
x1 = rng.integers(low=0, high=5, size=20)
noise = rng.normal(loc=0, scale=0.1, size=20)
df = pd.DataFrame({'x0': x0, 'x1': x1}).astype('category')
X = np.column_stack((x0, x1))
y = (x0 * 10 - 20) + (x1 - 2) + noise
params = {'tree_method': 'gpu_hist',
          'predictor': 'gpu_predictor',
          'enable_experimental_json_serialization': True,
          'max_depth': 6,
          'learning_rate': 1.0}
dtrain = xgb.DMatrix(df, label=y, enable_categorical=True)
bst = xgb.train(params, dtrain, num_boost_round=5, evals=[(dtrain, 'train')])
pred = bst.predict(dtrain)
with open('serialized.pkl', 'wb') as f:
    pickle.dump(bst, f)
with open('serialized.pkl', 'rb') as f:
    bst2 = pickle.load(f)
pred2 = bst2.predict(dtrain)
np.testing.assert_almost_equal(pred, pred2)  # this passes

In addition, manually saving the configuration with save_config() and restoring it with load_config() also salvages the example:

import pandas as pd
import numpy as np
import xgboost as xgb
rng = np.random.default_rng(seed=0)
x0 = rng.integers(low=0, high=3, size=20)
x1 = rng.integers(low=0, high=5, size=20)
noise = rng.normal(loc=0, scale=0.1, size=20)
df = pd.DataFrame({'x0': x0, 'x1': x1}).astype('category')
X = np.column_stack((x0, x1))
y = (x0 * 10 - 20) + (x1 - 2) + noise
params = {'tree_method': 'gpu_hist',
          'predictor': 'gpu_predictor',
          'enable_experimental_json_serialization': True,
          'max_depth': 6,
          'learning_rate': 1.0}
dtrain = xgb.DMatrix(df, label=y, enable_categorical=True)
bst = xgb.train(params, dtrain, num_boost_round=5, evals=[(dtrain, 'train')])
pred = bst.predict(dtrain)
bst.save_model('serialized.json')
with open('config.json', 'w') as f:
    f.write(bst.save_config())
bst2 = xgb.Booster(model_file='./serialized.json')
with open('config.json', 'r') as f:
    config = f.read()
bst2.load_config(config)
pred2 = bst2.predict(dtrain)
np.testing.assert_almost_equal(pred, pred2)  # this passes
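To see what restoring the configuration brings back, one can look for the predictor entry in the JSON string that save_config() returns. The helper below is only a sketch: find_param is a hypothetical name, and sample_config is a made-up, cut-down stand-in rather than real XGBoost output, because the exact nesting of the configuration may differ between versions.

```python
import json


def find_param(config, name):
    """Return every value stored under key `name` in a nested JSON config string."""
    found = []

    def walk(node):
        # Visit dicts and lists recursively; scalars are ignored.
        if isinstance(node, dict):
            for key, value in node.items():
                if key == name:
                    found.append(value)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(config))
    return found


# Hypothetical, heavily simplified config used only for illustration.
sample_config = json.dumps({
    "learner": {
        "gradient_booster": {
            "name": "gbtree",
            "gbtree_train_param": {"predictor": "gpu_predictor"},
        }
    }
})

print(find_param(sample_config, "predictor"))
```

With a real booster, the input would be the string returned by bst.save_config(). The recursive search avoids hard-coding a path into the configuration.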
It suffices to set the predictor parameter to gpu_predictor after loading the model:

import pandas as pd
import numpy as np
import xgboost as xgb
rng = np.random.default_rng(seed=0)
x0 = rng.integers(low=0, high=3, size=20)
x1 = rng.integers(low=0, high=5, size=20)
noise = rng.normal(loc=0, scale=0.1, size=20)
df = pd.DataFrame({'x0': x0, 'x1': x1}).astype('category')
X = np.column_stack((x0, x1))
y = (x0 * 10 - 20) + (x1 - 2) + noise
params = {'tree_method': 'gpu_hist',
          'predictor': 'gpu_predictor',
          'enable_experimental_json_serialization': True,
          'max_depth': 6,
          'learning_rate': 1.0}
dtrain = xgb.DMatrix(df, label=y, enable_categorical=True)
bst = xgb.train(params, dtrain, num_boost_round=5, evals=[(dtrain, 'train')])
pred = bst.predict(dtrain)
bst.save_model('serialized.json')
bst2 = xgb.Booster(model_file='./serialized.json')
bst2.set_param({'predictor': 'gpu_predictor'})
pred2 = bst2.predict(dtrain)
np.testing.assert_almost_equal(pred, pred2)

Given a model with categorical splits, we should throw an error when the predictor is not gpu_predictor.
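The kind of guard proposed here can be sketched in plain Python on top of a model that was saved as JSON and loaded back with json.load. This is an illustration, not XGBoost's actual implementation. The function names are hypothetical, and the layout is an assumption: each tree is taken to carry a split_type list in which 1 marks a categorical split, found under learner / gradient_booster / model / trees.

```python
def has_categorical_splits(model):
    """Return True if any tree in the parsed JSON model has a categorical split.

    Assumption: each tree stores a "split_type" list where 1 means categorical.
    """
    trees = model["learner"]["gradient_booster"]["model"]["trees"]
    return any(1 in tree.get("split_type", []) for tree in trees)


def check_predictor(model, predictor):
    """Raise instead of silently predicting with an unsupported predictor."""
    if has_categorical_splits(model) and predictor != "gpu_predictor":
        raise ValueError(
            "Model contains categorical splits, which predictor "
            f"'{predictor}' does not support."
        )


def make_model(split_types):
    """Build a toy model dict with one tree per split_type list (illustration only)."""
    trees = [{"split_type": st} for st in split_types]
    return {"learner": {"gradient_booster": {"model": {"trees": trees}}}}


numerical_only = make_model([[0, 0, 0]])
with_categorical = make_model([[0, 0, 0], [0, 1, 0]])

check_predictor(numerical_only, "cpu_predictor")    # passes silently
check_predictor(with_categorical, "gpu_predictor")  # passes silently
try:
    check_predictor(with_categorical, "cpu_predictor")
except ValueError as err:
    print("rejected:", err)
```

Failing loudly in this way would turn the silent prediction mismatch from the scripts above into an explicit error.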
hcho3 changed the title from "Model with categorical splits fail to preserve after round-trip serialization" to "CPU predictor should throw an error when categorical splits are present" on Dec 10, 2020
I will support CPU predictor.
Closing in favor of #6503.
Reproducer:
Log:
Note to others: The categorical split feature is currently in experimental status.