[BUG] Support auto converting integer/other dtypes to supported dtypes during training of estimators #4477
Comments
This may be an edge case in the dtype conversion utilities across estimators, as I also see this with RandomForestClassifier:

import cuml
import cudf

df = cudf.DataFrame({
    "x1": [0, 1, 2],
    "x2": [-3, 2, 5],
    "y": [0, 1, 2]
})

clf2 = cuml.ensemble.RandomForestClassifier()
print(clf2.fit(df[["x1", "x2"]], df["y"]))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_8028/2005697754.py in <module>
9
10 clf2 = cuml.ensemble.RandomForestClassifier()
---> 11 print(clf2.fit(df[["x1", "x2"]], df["y"]))
~/conda/envs/rapids-22.02-snow/lib/python3.8/contextlib.py in inner(*args, **kwds)
73 def inner(*args, **kwds):
74 with self._recreate_cm():
---> 75 return func(*args, **kwds)
76 return inner
77
~/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_with_setters(*args, **kwargs)
407 target_val=target_val)
408
--> 409 return func(*args, **kwargs)
410
411 @wraps(func)
cuml/ensemble/randomforestclassifier.pyx in cuml.ensemble.randomforestclassifier.RandomForestClassifier.fit()
~/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_set(*args, **kwargs)
565
566 # Call the function
--> 567 ret_val = func(*args, **kwargs)
568
569 return cm.process_return(ret_val)
cuml/ensemble/randomforest_common.pyx in cuml.ensemble.randomforest_common.BaseRandomForestModel._dataset_setup_for_fit()
~/conda/envs/rapids-22.02-snow/lib/python3.8/contextlib.py in inner(*args, **kwds)
73 def inner(*args, **kwds):
74 with self._recreate_cm():
---> 75 return func(*args, **kwds)
76 return inner
77
~/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner(*args, **kwargs)
358 def inner(*args, **kwargs):
359 with self._recreate_cm(func, args):
--> 360 return func(*args, **kwargs)
361
362 return inner
~/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/common/input_utils.py in input_to_cuml_array(X, order, deepcopy, check_dtype, convert_to_dtype, safe_dtype_conversion, check_cols, check_rows, fail_on_order, force_contiguous)
388 type_str = X_m.dtype
389 del X_m
--> 390 raise TypeError("Expected input to be of type in " +
391 str(check_dtype) + " but got " + str(type_str))
392
TypeError: Expected input to be of type in [dtype('float32'), dtype('float64')] but got int64
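Until automatic conversion is in place, one workaround is to cast the integer features to a float dtype before calling fit. The sketch below uses NumPy arrays as a stand-in for the reproducer's data so that it runs without a GPU. The cuML and cuDF calls are left as comments, and whether the cast resolves the error on a given cuML version is an assumption, not something verified here.

```python
import numpy as np

# Same values as the reproducer above, held as int64 host arrays.
X = np.array([[0, -3], [1, 2], [2, 5]], dtype=np.int64)
y = np.array([0, 1, 2], dtype=np.int64)

# Cast the features to one of the dtypes named in the error message.
X32 = X.astype(np.float32)
print(X32.dtype)

# With cuDF, the equivalent cast would be (requires a GPU, not run here):
#   X_gpu = df[["x1", "x2"]].astype("float32")
#
# Assumed usage with cuML (requires a GPU, not run here):
#   import cuml
#   clf = cuml.ensemble.RandomForestClassifier()
#   clf.fit(X32, y)
```

The cast is explicit, so any precision loss for very large integers is the caller's decision rather than a silent side effect.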
Currently, the behavior of automatic dtype conversion is to convert |
This came up again in the context of using various estimators/transformers in Pipelines.
I think I agree. I could imagine throwing a warning about dtype conversion and then letting the user configure away from the default as needed. Some of this must be happening already with dataframe inputs for
The value of formalizing this and having things "just work" out-of-the-box with is pretty high. Feels like some tech-debt that would also improve the UX.

import cudf
from cuml.common.input_utils import input_to_cuml_array
import numpy as np
df = cudf.DataFrame({f"a{x}": range(50) for x in range(5)}) # int64 dtypes
df["a1"] = df["a1"].astype("float32")
X_m, n_rows, n_cols, dtype = input_to_cuml_array(
df, check_dtype=[np.float32, np.float64]
)
dtype
dtype('float64')
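The "warn, then convert" idea from this comment could look roughly like the sketch below. It is written against plain NumPy so it runs anywhere, and it is not cuML's implementation: `coerce_to_supported` and `SUPPORTED` are hypothetical names, and the policy of picking the first supported dtype that NumPy considers a safe cast is an assumption made for illustration.

```python
import warnings

import numpy as np

# Hypothetical default: the dtypes listed in the TypeError above.
SUPPORTED = (np.dtype("float32"), np.dtype("float64"))


def coerce_to_supported(arr, supported=SUPPORTED, warn=True):
    """Return arr unchanged if its dtype is supported, otherwise cast it.

    The target is the first entry of `supported` that NumPy treats as a
    "safe" cast from the input dtype. A TypeError is raised if none fits.
    """
    arr = np.asarray(arr)
    if arr.dtype in supported:
        return arr

    target = next(
        (dt for dt in supported if np.can_cast(arr.dtype, dt, casting="safe")),
        None,
    )
    if target is None:
        raise TypeError(
            f"Cannot safely convert {arr.dtype} to any of {list(supported)}"
        )

    if warn:
        warnings.warn(
            f"Converting input from {arr.dtype} to {target}",
            UserWarning,
            stacklevel=2,
        )
    return arr.astype(target)


# int64 has no safe cast to float32 under NumPy's rules, so float64 is chosen.
converted = coerce_to_supported(np.arange(5, dtype=np.int64), warn=False)
print(converted.dtype)
```

Passing `warn=False` or a different `supported` tuple is how a user would configure away from the default in this sketch.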
cuml.decomposition.PCA and cuml.decomposition.PCA currently fail if all columns are integers and succeed if at least one column is a float. We should be robust to all-integer input dataframes.