Categorical support in multi GPU training with dask_cudf #8417

Closed
alexHeu opened this issue Nov 3, 2022 · 3 comments

Comments

@alexHeu

alexHeu commented Nov 3, 2022

Hi,

I am currently working on a multi-GPU workflow using dask_cudf together with XGBoost, and I am getting the following error:

File /databricks/python/lib/python3.8/site-packages/xgboost/data.py:603, in _cudf_array_interfaces()
    601 for i, col in enumerate(data):
    602     if is_categorical_dtype(data[col].dtype):
--> 603         codes = cat_codes[i]
    604         interface = codes.__cuda_array_interface__
    605     else:

IndexError: list index out of range

Library Versions:
dask_cudf 22.06.00
xgboost 1.6.1

My training code basically looks as follows:

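import dask_cudf
import xgboost as xgb
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from xgboost import dask as dxgb

# categorical_columns, features, label, validation_start_date,
# early_stopping_rounds and assym_loss are defined elsewhere (not shown).

# Start a local cluster with four GPU workers.
cluster = LocalCUDACluster(n_workers=4)
client = Client(cluster)

df = dask_cudf.read_parquet("path_to_parquet")

# Cast the categorical features to the category dtype.
for cat_feature in categorical_columns:
    df[cat_feature] = df[cat_feature].astype("category")

# Compute the categories so they are known across all partitions.
df = df.categorize()

# Time-based split on the CalDay column.
X_train = df[df["CalDay"] < validation_start_date][features]
y_train = df[df["CalDay"] < validation_start_date][label]

X_val = df[df["CalDay"] >= validation_start_date][features]
y_val = df[df["CalDay"] >= validation_start_date][label]

params = {
    "tree_method": "gpu_hist"
}

es = xgb.callback.EarlyStopping(rounds=early_stopping_rounds, save_best=False)

# enable_categorical=True asks XGBoost to handle category columns natively.
Xy = dxgb.DaskDeviceQuantileDMatrix(client, X_train, y_train, enable_categorical=True)
Xy_valid = dxgb.DaskDMatrix(client, X_val, y_val, enable_categorical=True)

booster = dxgb.train(
    client,
    params,
    Xy,
    evals=[(Xy_valid, "Valid")],
    num_boost_round=1000,
    callbacks=[es],
    obj=assym_loss,  # pass the custom objective
    verbose_eval=True,
)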

Are dask_cudf categoricals currently not supported?

Thanks!

@trivialfis
Member

Could you please try 1.7? I believe this is fixed by #8280.

@alexHeu
Author

alexHeu commented Nov 4, 2022

@trivialfis You are right, 1.7 solved this. Thanks!

@trivialfis
Member

Great news!
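
For anyone landing here with the same IndexError: since upgrading from 1.6.1 to 1.7 resolved it in this thread, a small version guard before building the DMatrix may save some debugging. The following is only a sketch. The helper names (version_tuple, has_categorical_fix, installed_xgboost_version) are made up for illustration and are not part of XGBoost, and the (1, 7) threshold is taken from the comments above rather than from a changelog.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Return the leading numeric parts of a version string, e.g. '1.7.0rc1' -> (1, 7, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def has_categorical_fix(version: str, minimum: tuple = (1, 7)) -> bool:
    """True if `version` is at least `minimum` (major, minor).

    The default threshold is an assumption based on this thread.
    """
    return version_tuple(version)[:2] >= minimum


def installed_xgboost_version():
    """Return the installed xgboost version string, or None if it is not installed."""
    try:
        return metadata.version("xgboost")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_xgboost_version()
    if found is None:
        print("xgboost is not installed")
    elif has_categorical_fix(found):
        print(f"xgboost {found}: at or above 1.7")
    else:
        print(f"xgboost {found}: older than 1.7, consider upgrading")
```

The comparison is done on integer tuples rather than strings so that a version such as 1.10 sorts after 1.7.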
