[BUG] Support auto converting integer/other dtypes to supported dtypes during training of estimators #4477

beckernick · 2022-01-11T20:00:35Z

cuml.decomposition.PCA currently fails if all columns are integers but succeeds if at least one column is a float. We should be robust to all-integer input dataframes.

import cuml
from sklearn.decomposition import PCA as sk_PCA
import cudf

df = cudf.DataFrame({
    "a":[0,1,2],
    "b":[0,1,2],
    "c":[0,10,12]
})

clf = sk_PCA()
print(clf.fit(df.to_pandas()))

clf = cuml.decomposition.PCA()
clf.fit(df)
PCA()
[W] [20:00:12.668792] Warning(`fit`): As of v0.16, PCA invoked without an n_components argument defauts to using min(n_samples, n_features) rather than 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_54110/1317928499.py in <module>
     13 
     14 clf = cuml.decomposition.PCA()
---> 15 clf.fit(df)

~/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_with_setters(*args, **kwargs)
    407                                 target_val=target_val)
    408 
--> 409                 return func(*args, **kwargs)
    410 
    411         @wraps(func)

cuml/decomposition/pca.pyx in cuml.decomposition.pca.PCA.fit()

~/conda/envs/rapids-22.02-snow/lib/python3.8/contextlib.py in inner(*args, **kwds)
     73         def inner(*args, **kwds):
     74             with self._recreate_cm():
---> 75                 return func(*args, **kwds)
     76         return inner
     77 

~/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner(*args, **kwargs)
    358         def inner(*args, **kwargs):
    359             with self._recreate_cm(func, args):
--> 360                 return func(*args, **kwargs)
    361 
    362         return inner

~/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/common/input_utils.py in input_to_cuml_array(X, order, deepcopy, check_dtype, convert_to_dtype, safe_dtype_conversion, check_cols, check_rows, fail_on_order, force_contiguous)
    388             type_str = X_m.dtype
    389             del X_m
--> 390             raise TypeError("Expected input to be of type in " +
    391                             str(check_dtype) + " but got " + str(type_str))
    392 

TypeError: Expected input to be of type in [dtype('float32'), dtype('float64')] but got int64

conda list # packages in environment at /home/nicholasb/conda/envs/rapids-22.02-snow: # # Name Version Build Channel _libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 1_gnu conda-forge abseil-cpp 20210324.2 h9c3ff4c_0 conda-forge aiohttp 3.8.1 py38h497a2fe_0 conda-forge aiosignal 1.2.0 pyhd8ed1ab_0 conda-forge anyio 3.5.0 py38h578d9bd_0 conda-forge appdirs 1.4.4 pyh9f0ad1d_0 conda-forge argon2-cffi 21.3.0 pyhd8ed1ab_0 conda-forge argon2-cffi-bindings 21.2.0 py38h497a2fe_1 conda-forge arrow-cpp 5.0.0 py38h579a05f_22_cuda conda-forge arrow-cpp-proc 3.0.0 cuda conda-forge asn1crypto 1.4.0 pyh9f0ad1d_0 conda-forge async-timeout 4.0.2 pyhd8ed1ab_0 conda-forge async_generator 1.10 py_0 conda-forge attrs 21.4.0 pyhd8ed1ab_0 conda-forge aws-c-cal 0.5.11 h95a6274_0 conda-forge aws-c-common 0.6.2 h7f98852_0 conda-forge aws-c-event-stream 0.2.7 h3541f99_13 conda-forge aws-c-io 0.10.5 hfb6a706_0 conda-forge aws-checksums 0.1.11 ha31a3da_7 conda-forge aws-sdk-cpp 1.8.186 hb4091e7_3 conda-forge babel 2.9.1 pyh44b312d_0 conda-forge backcall 0.2.0 pyh9f0ad1d_0 conda-forge backports 1.0 py_2 conda-forge backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge bleach 4.1.0 pyhd8ed1ab_0 conda-forge blosc 1.21.0 h9c3ff4c_0 conda-forge bokeh 2.4.0 py38h578d9bd_0 conda-forge boost 1.74.0 py38h2b96118_4 conda-forge boost-cpp 1.74.0 h312852a_4 conda-forge brotli 1.0.9 h7f98852_6 conda-forge brotli-bin 1.0.9 h7f98852_6 conda-forge brotlipy 0.7.0 py38h497a2fe_1003 conda-forge brunsli 0.1 h9c3ff4c_0 conda-forge bzip2 1.0.8 h7f98852_4 conda-forge c-ares 1.18.1 h7f98852_0 conda-forge c-blosc2 2.0.4 h5f21a17_1 conda-forge ca-certificates 2021.10.8 ha878542_0 conda-forge cachetools 5.0.0 pyhd8ed1ab_0 conda-forge cairo 1.16.0 h6cf1ce9_1008 conda-forge certifi 2021.10.8 py38h578d9bd_1 conda-forge cffi 1.15.0 py38h3931269_0 conda-forge cfitsio 3.470 hb418390_7 conda-forge charls 2.2.0 h9c3ff4c_0 conda-forge charset-normalizer 2.0.10 pyhd8ed1ab_0 conda-forge click 8.0.3 py38h578d9bd_1 
conda-forge click-plugins 1.1.1 py_0 conda-forge cligj 0.7.2 pyhd8ed1ab_1 conda-forge cloudpickle 2.0.0 pyhd8ed1ab_0 conda-forge colorama 0.4.4 pyh9f0ad1d_0 conda-forge colorcet 3.0.0 pyhd8ed1ab_0 conda-forge cryptography 35.0.0 py38h3e25421_2 conda-forge cucim 22.02.00a220111 cuda_11_py38_gab8e6a4_31 rapidsai-nightly cuda-python 11.5.0 py38h3fd9d12_0 nvidia cudatoolkit 11.2.72 h2bc3f7f_0 nvidia cudf 22.02.00a220111 cuda_11_py38_g951f630dfe_250 rapidsai-nightly cudf_kafka 22.02.00a220111 py38_g951f630dfe_250 rapidsai-nightly cugraph 22.02.00a220111 cuda11_py38_g6883cc19_66 rapidsai-nightly cuml 22.02.00a220111 cuda11_py38_g416ce61a4_84 rapidsai-nightly cupy 9.6.0 py38h177b0fd_0 conda-forge curl 7.81.0 h2574ce0_0 conda-forge cusignal 22.02.00a220111 py38_g6a02566_9 rapidsai-nightly cuspatial 22.02.00a220110 py38_g55280e3_14 rapidsai-nightly custreamz 22.02.00a220111 py38_g951f630dfe_250 rapidsai-nightly cuxfilter 22.02.00a220111 py38_g7c4dc24_7 rapidsai-nightly cycler 0.11.0 pyhd8ed1ab_0 conda-forge cyrus-sasl 2.1.27 h230043b_5 conda-forge cytoolz 0.11.2 py38h497a2fe_1 conda-forge dask 2021.11.2 pyhd8ed1ab_0 conda-forge dask-core 2021.11.2 pyhd8ed1ab_0 conda-forge dask-cuda 22.02.00a220111 py38_45 rapidsai-nightly dask-cudf 22.02.00a220111 cuda_11_py38_g951f630dfe_250 rapidsai-nightly dask-snowflake 0.0.2 pyhd8ed1ab_0 conda-forge datashader 0.11.1 pyh9f0ad1d_0 conda-forge datashape 0.5.4 py_1 conda-forge debugpy 1.5.1 py38h709712a_0 conda-forge decorator 5.1.1 pyhd8ed1ab_0 conda-forge defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge distributed 2021.11.2 py38h578d9bd_0 conda-forge dlpack 0.5 h9c3ff4c_0 conda-forge entrypoints 0.3 py38h32f6830_1002 conda-forge expat 2.4.2 h9c3ff4c_0 conda-forge faiss-proc 1.0.0 cuda conda-forge fastavro 1.4.9 py38h497a2fe_0 conda-forge fastrlock 0.8 py38h709712a_1 conda-forge fiona 1.8.20 py38hbb147eb_2 conda-forge flit-core 3.6.0 pyhd8ed1ab_0 conda-forge font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge font-ttf-inconsolata 3.000 
h77eed37_0 conda-forge font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge font-ttf-ubuntu 0.83 hab24e00_0 conda-forge fontconfig 2.13.1 hba837de_1005 conda-forge fonts-conda-ecosystem 1 0 conda-forge fonts-conda-forge 1 0 conda-forge fonttools 4.28.5 py38h497a2fe_0 conda-forge freetype 2.10.4 h0708190_1 conda-forge freexl 1.0.6 h7f98852_0 conda-forge frozenlist 1.2.0 py38h497a2fe_1 conda-forge fsspec 2021.11.1 pyhd8ed1ab_0 conda-forge gdal 3.3.2 py38h81a01a0_3 conda-forge geopandas 0.9.0 pyhd8ed1ab_1 conda-forge geopandas-base 0.9.0 pyhd8ed1ab_1 conda-forge geos 3.9.1 h9c3ff4c_2 conda-forge geotiff 1.7.0 h08e826d_2 conda-forge gettext 0.19.8.1 h73d1719_1008 conda-forge gflags 2.2.2 he1b5a44_1004 conda-forge giflib 5.2.1 h36c2ea0_2 conda-forge glog 0.5.0 h48cff8f_0 conda-forge gmp 6.2.1 h58526e2_0 conda-forge greenlet 1.1.2 py38h709712a_1 conda-forge grpc-cpp 1.42.0 ha1441d3_1 conda-forge hdf4 4.2.15 h10796ff_3 conda-forge hdf5 1.12.1 nompi_h2750804_103 conda-forge heapdict 1.0.1 py_0 conda-forge icu 68.2 h9c3ff4c_0 conda-forge idna 3.1 pyhd3deb0d_0 conda-forge imagecodecs 2021.8.26 py38hb5ce8f7_1 conda-forge imageio 2.13.5 pyh239f2a4_0 conda-forge importlib-metadata 4.10.0 py38h578d9bd_0 conda-forge importlib_metadata 4.10.0 hd8ed1ab_0 conda-forge importlib_resources 5.4.0 pyhd8ed1ab_0 conda-forge ipykernel 6.6.1 py38he5a9106_0 conda-forge ipython 7.31.0 py38h578d9bd_0 conda-forge ipython_genutils 0.2.0 py_1 conda-forge ipywidgets 7.6.5 pyhd8ed1ab_0 conda-forge jbig 2.1 h7f98852_2003 conda-forge jedi 0.18.1 py38h578d9bd_0 conda-forge jinja2 3.0.3 pyhd8ed1ab_0 conda-forge joblib 1.1.0 pyhd8ed1ab_0 conda-forge jpeg 9d h36c2ea0_0 conda-forge json-c 0.15 h98cffda_0 conda-forge json5 0.9.5 pyh9f0ad1d_0 conda-forge jsonschema 4.3.3 pyhd8ed1ab_0 conda-forge jupyter-server-proxy 3.2.0 pyhd8ed1ab_0 conda-forge jupyter_client 7.1.0 pyhd8ed1ab_0 conda-forge jupyter_core 4.9.1 py38h578d9bd_1 conda-forge jupyter_server 1.13.1 pyhd8ed1ab_0 conda-forge jupyterlab 3.2.6 
pyhd8ed1ab_0 conda-forge jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge jupyterlab_server 2.10.3 pyhd8ed1ab_0 conda-forge jupyterlab_widgets 1.0.2 pyhd8ed1ab_0 conda-forge jxrlib 1.1 h7f98852_2 conda-forge kealib 1.4.14 h87e4c3c_3 conda-forge kiwisolver 1.3.2 py38h1fd1430_1 conda-forge krb5 1.19.2 hcc1bbae_3 conda-forge lcms2 2.12 hddcbb42_0 conda-forge ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge lerc 3.0 h9c3ff4c_0 conda-forge libaec 1.0.6 h9c3ff4c_0 conda-forge libblas 3.9.0 12_linux64_openblas conda-forge libbrotlicommon 1.0.9 h7f98852_6 conda-forge libbrotlidec 1.0.9 h7f98852_6 conda-forge libbrotlienc 1.0.9 h7f98852_6 conda-forge libcblas 3.9.0 12_linux64_openblas conda-forge libcucim 22.02.00a220111 cuda11_gab8e6a4_31 rapidsai-nightly libcudf 22.02.00a220111 cuda11_g951f630dfe_250 rapidsai-nightly libcudf_kafka 22.02.00a220111 g951f630dfe_250 rapidsai-nightly libcugraph 22.02.00a220111 cuda11_g6883cc19_66 rapidsai-nightly libcugraph_etl 22.02.00a220111 cuda11_g6883cc19_66 rapidsai-nightly libcuml 22.02.00a220111 cuda11_g416ce61a4_84 rapidsai-nightly libcumlprims 22.02.00a220106 cuda11_g06a42b1_14 rapidsai-nightly libcurl 7.81.0 h2574ce0_0 conda-forge libcusolver 11.3.2.107 hc875929_0 nvidia libcuspatial 22.02.00a220110 cuda11_g55280e3_14 rapidsai-nightly libdap4 3.20.6 hd7c4107_2 conda-forge libdeflate 1.8 h7f98852_0 conda-forge libedit 3.1.20191231 he28a2e2_2 conda-forge libev 4.33 h516909a_1 conda-forge libevent 2.1.10 h9b69904_4 conda-forge libfaiss 1.7.0 cuda112h5bea7ad_8_cuda conda-forge libffi 3.4.2 h7f98852_5 conda-forge libgcc-ng 11.2.0 h1d223b6_11 conda-forge libgcrypt 1.9.4 h7f98852_0 conda-forge libgdal 3.3.2 h6acdded_3 conda-forge libgfortran-ng 11.2.0 h69a702a_11 conda-forge libgfortran5 11.2.0 h5c6108e_11 conda-forge libglib 2.70.2 h174f98d_1 conda-forge libgomp 11.2.0 h1d223b6_11 conda-forge libgpg-error 1.42 h9c3ff4c_0 conda-forge libgsasl 1.10.0 h5b4c23d_0 conda-forge libhwloc 2.3.0 h5e5b7d1_1 conda-forge libiconv 1.16 h516909a_0 
conda-forge libkml 1.3.0 h238a007_1014 conda-forge liblapack 3.9.0 12_linux64_openblas conda-forge libllvm11 11.1.0 hf817b99_2 conda-forge libnetcdf 4.8.1 nompi_hb3fd0d9_101 conda-forge libnghttp2 1.43.0 h812cca2_1 conda-forge libnsl 2.0.0 h7f98852_0 conda-forge libntlm 1.4 h7f98852_1002 conda-forge libopenblas 0.3.18 pthreads_h8fe5266_0 conda-forge libpng 1.6.37 h21135ba_2 conda-forge libpq 13.5 hd57d9b9_1 conda-forge libprotobuf 3.19.2 h780b84a_0 conda-forge librdkafka 1.7.0 hc49e61c_1 conda-forge librmm 22.02.00a220111 cuda11_g5a239d2_25 rapidsai-nightly librttopo 1.1.0 h1185371_6 conda-forge libsodium 1.0.18 h36c2ea0_1 conda-forge libspatialindex 1.9.3 h9c3ff4c_4 conda-forge libspatialite 5.0.1 h5cf074c_8 conda-forge libssh2 1.10.0 ha56f1ee_2 conda-forge libstdcxx-ng 11.2.0 he4da1e4_11 conda-forge libthrift 0.15.0 he6d91bd_1 conda-forge libtiff 4.3.0 h6f004c6_2 conda-forge libutf8proc 2.7.0 h7f98852_0 conda-forge libuuid 2.32.1 h7f98852_1000 conda-forge libuv 1.42.0 h7f98852_0 conda-forge libwebp 1.2.1 h3452ae3_0 conda-forge libwebp-base 1.2.1 h7f98852_0 conda-forge libxcb 1.13 h7f98852_1004 conda-forge libxgboost 1.5.0dev.rapidsai22.02 cuda11.2_0 rapidsai-nightly libxml2 2.9.12 h72842e0_0 conda-forge libzip 1.8.0 h4de3113_1 conda-forge libzlib 1.2.11 h36c2ea0_1013 conda-forge libzopfli 1.0.3 h9c3ff4c_0 conda-forge llvmlite 0.37.0 py38h4630a5e_1 conda-forge locket 0.2.0 py_2 conda-forge lz4-c 1.9.3 h9c3ff4c_1 conda-forge mapclassify 2.4.3 pyhd8ed1ab_0 conda-forge markdown 3.3.6 pyhd8ed1ab_0 conda-forge markupsafe 2.0.1 py38h497a2fe_1 conda-forge matplotlib-base 3.5.1 py38hf4fb855_0 conda-forge matplotlib-inline 0.1.3 pyhd8ed1ab_0 conda-forge mistune 0.8.4 py38h497a2fe_1005 conda-forge msgpack-python 1.0.3 py38h1fd1430_0 conda-forge multidict 5.2.0 py38h497a2fe_1 conda-forge multipledispatch 0.6.0 py_0 conda-forge munch 2.5.0 py_0 conda-forge munkres 1.1.4 pyh9f0ad1d_0 conda-forge nbclassic 0.3.4 pyhd8ed1ab_0 conda-forge nbclient 0.5.9 pyhd8ed1ab_0 conda-forge 
nbconvert 6.4.0 py38h578d9bd_0 conda-forge nbformat 5.1.3 pyhd8ed1ab_0 conda-forge nccl 2.11.4.1 hdc17891_0 conda-forge ncurses 6.2 h58526e2_4 conda-forge nest-asyncio 1.5.4 pyhd8ed1ab_0 conda-forge networkx 2.6.3 pyhd8ed1ab_1 conda-forge nodejs 14.17.4 h92b4a50_0 conda-forge notebook 6.4.6 pyha770c72_0 conda-forge nspr 4.32 h9c3ff4c_1 conda-forge nss 3.74 hb5efdd6_0 conda-forge numba 0.54.1 py38h4bf6c61_0 conda-forge numpy 1.20.3 py38h9894fe3_1 conda-forge nvtx 0.2.3 py38h497a2fe_1 conda-forge olefile 0.46 pyh9f0ad1d_1 conda-forge openjpeg 2.4.0 hb52868f_1 conda-forge openssl 1.1.1l h7f98852_0 conda-forge orc 1.7.2 h1be678f_0 conda-forge oscrypto 1.2.1 pyhd3deb0d_0 conda-forge packaging 21.3 pyhd8ed1ab_0 conda-forge pandas 1.3.5 py38h43a58ef_0 conda-forge pandoc 2.16.2 h7f98852_0 conda-forge pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge panel 0.12.4 pyhd8ed1ab_0 conda-forge param 1.12.0 pyh6c4a22f_0 conda-forge parquet-cpp 1.5.1 1 conda-forge parso 0.8.3 pyhd8ed1ab_0 conda-forge partd 1.2.0 pyhd8ed1ab_0 conda-forge pcre 8.45 h9c3ff4c_0 conda-forge pexpect 4.8.0 py38h32f6830_1 conda-forge pickleshare 0.7.5 py38h32f6830_1002 conda-forge pillow 8.4.0 py38h8e6f84c_0 conda-forge pip 21.3.1 pyhd8ed1ab_0 conda-forge pixman 0.40.0 h36c2ea0_0 conda-forge pooch 1.5.2 pyhd8ed1ab_0 conda-forge poppler 21.09.0 ha39eefc_3 conda-forge poppler-data 0.4.11 hd8ed1ab_0 conda-forge postgresql 13.5 h2510834_1 conda-forge proj 8.1.0 h277dcde_1 conda-forge prometheus_client 0.12.0 pyhd8ed1ab_0 conda-forge prompt-toolkit 3.0.24 pyha770c72_0 conda-forge protobuf 3.19.2 py38h709712a_0 conda-forge psutil 5.9.0 py38h497a2fe_0 conda-forge pthread-stubs 0.4 h36c2ea0_1001 conda-forge ptxcompiler 0.2.0 py38hb739d79_0 rapidsai-nightly ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge py-xgboost 1.5.0dev.rapidsai22.02 cuda11.2py38_0 rapidsai-nightly pyarrow 5.0.0 py38ha746e9d_22_cuda conda-forge pycparser 2.21 pyhd8ed1ab_0 conda-forge pycryptodomex 3.12.0 py38h497a2fe_0 conda-forge pyct 0.4.6 py_0 
conda-forge pyct-core 0.4.6 py_0 conda-forge pydeck 0.5.0 pyh9f0ad1d_0 conda-forge pyee 8.1.0 pyh9f0ad1d_0 conda-forge pygments 2.11.2 pyhd8ed1ab_0 conda-forge pyjwt 2.3.0 pyhd8ed1ab_1 conda-forge pylibcugraph 22.02.00a220111 cuda11_py38_g6883cc19_66 rapidsai-nightly pynvml 11.4.1 pyhd8ed1ab_0 conda-forge pyopenssl 21.0.0 pyhd8ed1ab_0 conda-forge pyparsing 3.0.6 pyhd8ed1ab_0 conda-forge pyppeteer 0.2.6 pyhd8ed1ab_0 conda-forge pyproj 3.1.0 py38h3701b11_4 conda-forge pyrsistent 0.18.0 py38h497a2fe_0 conda-forge pysocks 1.7.1 py38h578d9bd_4 conda-forge python 3.8.12 hb7a2778_2_cpython conda-forge python-confluent-kafka 1.7.0 py38h497a2fe_2 conda-forge python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge python_abi 3.8 2_cp38 conda-forge pytz 2021.3 pyhd8ed1ab_0 conda-forge pyviz_comms 2.1.0 pyhd8ed1ab_0 conda-forge pywavelets 1.2.0 py38h6c62de6_1 conda-forge pyyaml 6.0 py38h497a2fe_3 conda-forge pyzmq 22.3.0 py38h2035c66_1 conda-forge rapids 22.02.00a220111 cuda11.2_py38_g365c37f_104 rapidsai-nightly rapids-xgboost 22.02.00a220111 cuda11.2_py38_g365c37f_104 rapidsai-nightly re2 2021.11.01 h9c3ff4c_0 conda-forge readline 8.1 h46c0cb4_0 conda-forge requests 2.27.1 pyhd8ed1ab_0 conda-forge rmm 22.02.00a220111 cuda11_py38_g5a239d2_25_has_cma rapidsai-nightly rtree 0.9.7 py38h02d302b_3 conda-forge s2n 1.0.10 h9b69904_0 conda-forge scikit-image 0.18.1 py38h51da96c_0 conda-forge scikit-learn 1.0.2 py38h1561384_0 conda-forge scipy 1.7.3 py38h56a6a73_0 conda-forge send2trash 1.8.0 pyhd8ed1ab_0 conda-forge setuptools 60.5.0 py38h578d9bd_0 conda-forge shapely 1.8.0 py38hb7fe4a8_0 conda-forge simpervisor 0.4 pyhd8ed1ab_0 conda-forge six 1.16.0 pyh6c4a22f_0 conda-forge snappy 1.1.8 he1b5a44_3 conda-forge sniffio 1.2.0 py38h578d9bd_2 conda-forge snowflake-connector-python 2.7.2 py38h8914348_0 conda-forge snowflake-sqlalchemy 1.3.3 pyhd8ed1ab_0 conda-forge sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge spdlog 1.8.5 h4bd325d_0 conda-forge sqlalchemy 1.4.29 py38h497a2fe_0 conda-forge 
sqlite 3.37.0 h9cd32fc_0 conda-forge streamz 0.6.3 pyh6c4a22f_0 conda-forge tblib 1.7.0 pyhd8ed1ab_0 conda-forge terminado 0.12.1 py38h578d9bd_1 conda-forge testpath 0.5.0 pyhd8ed1ab_0 conda-forge threadpoolctl 3.0.0 pyh8a188c0_0 conda-forge tifffile 2021.11.2 pyhd8ed1ab_0 conda-forge tiledb 2.3.4 he87e0bf_0 conda-forge tk 8.6.11 h27826a3_1 conda-forge toolz 0.11.2 pyhd8ed1ab_0 conda-forge tornado 6.1 py38h497a2fe_2 conda-forge tqdm 4.62.3 pyhd8ed1ab_0 conda-forge traitlets 5.1.1 pyhd8ed1ab_0 conda-forge treelite 2.1.0 py38hdd725b4_0 conda-forge treelite-runtime 2.1.0 pypi_0 pypi typing-extensions 4.0.1 hd8ed1ab_0 conda-forge typing_extensions 4.0.1 pyha770c72_0 conda-forge tzcode 2021e h7f98852_0 conda-forge tzdata 2021e he74cb21_0 conda-forge ucx 1.11.2+gef2bbcf cuda11.2_0 rapidsai-nightly ucx-proc 1.0.0 gpu rapidsai-nightly ucx-py 0.24.0a220111 py38_gef2bbcf_24 rapidsai-nightly unicodedata2 14.0.0 py38h497a2fe_0 conda-forge urllib3 1.26.8 pyhd8ed1ab_1 conda-forge wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge webencodings 0.5.1 py_1 conda-forge websocket-client 1.2.3 pyhd8ed1ab_0 conda-forge websockets 9.1 py38h497a2fe_0 conda-forge wheel 0.37.1 pyhd8ed1ab_0 conda-forge widgetsnbextension 3.5.2 py38h578d9bd_1 conda-forge xarray 0.20.2 pyhd8ed1ab_0 conda-forge xerces-c 3.2.3 h9d8b166_3 conda-forge xgboost 1.5.0dev.rapidsai22.02 cuda11.2py38_0 rapidsai-nightly xorg-kbproto 1.0.7 h7f98852_1002 conda-forge xorg-libice 1.0.10 h7f98852_0 conda-forge xorg-libsm 1.2.3 hd9c2040_1000 conda-forge xorg-libx11 1.7.2 h7f98852_0 conda-forge xorg-libxau 1.0.9 h7f98852_0 conda-forge xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge xorg-libxext 1.3.4 h7f98852_1 conda-forge xorg-libxrender 0.9.10 h7f98852_1003 conda-forge xorg-renderproto 0.11.1 h7f98852_1002 conda-forge xorg-xextproto 7.3.0 h7f98852_1002 conda-forge xorg-xproto 7.0.31 h7f98852_1007 conda-forge xz 5.2.5 h516909a_1 conda-forge yaml 0.2.5 h7f98852_2 conda-forge yarl 1.7.2 py38h497a2fe_1 conda-forge zeromq 4.3.4 h9c3ff4c_1 
conda-forge zfp 0.5.5 h9c3ff4c_8 conda-forge zict 2.0.0 py_0 conda-forge zipp 3.7.0 pyhd8ed1ab_0 conda-forge zlib 1.2.11 h36c2ea0_1013 conda-forge zstd 1.5.1 ha95c52a_0 conda-forge

beckernick · 2022-01-12T14:11:05Z

This may be an edge case in the dtype conversion utilities across estimators, as I also see this with RandomForestClassifier:

import cuml
import cudf

df = cudf.DataFrame({
    "x1": [0,1,2],
    "x2": [-3,2,5],
    "y": [0, 1, 2]
})

clf2 = cuml.ensemble.RandomForestClassifier()
print(clf2.fit(df[["x1", "x2"]], df["y"]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_8028/2005697754.py in <module>
      9 
     10 clf2 = cuml.ensemble.RandomForestClassifier()
---> 11 print(clf2.fit(df[["x1", "x2"]], df["y"]))

~/conda/envs/rapids-22.02-snow/lib/python3.8/contextlib.py in inner(*args, **kwds)
     73         def inner(*args, **kwds):
     74             with self._recreate_cm():
---> 75                 return func(*args, **kwds)
     76         return inner
     77 

~/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_with_setters(*args, **kwargs)
    407                                 target_val=target_val)
    408 
--> 409                 return func(*args, **kwargs)
    410 
    411         @wraps(func)

cuml/ensemble/randomforestclassifier.pyx in cuml.ensemble.randomforestclassifier.RandomForestClassifier.fit()

~/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_set(*args, **kwargs)
    565 
    566                 # Call the function
--> 567                 ret_val = func(*args, **kwargs)
    568 
    569             return cm.process_return(ret_val)

cuml/ensemble/randomforest_common.pyx in cuml.ensemble.randomforest_common.BaseRandomForestModel._dataset_setup_for_fit()

~/conda/envs/rapids-22.02-snow/lib/python3.8/contextlib.py in inner(*args, **kwds)
     73         def inner(*args, **kwds):
     74             with self._recreate_cm():
---> 75                 return func(*args, **kwds)
     76         return inner
     77 

~/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner(*args, **kwargs)
    358         def inner(*args, **kwargs):
    359             with self._recreate_cm(func, args):
--> 360                 return func(*args, **kwargs)
    361 
    362         return inner

~/conda/envs/rapids-22.02-snow/lib/python3.8/site-packages/cuml/common/input_utils.py in input_to_cuml_array(X, order, deepcopy, check_dtype, convert_to_dtype, safe_dtype_conversion, check_cols, check_rows, fail_on_order, force_contiguous)
    388             type_str = X_m.dtype
    389             del X_m
--> 390             raise TypeError("Expected input to be of type in " +
    391                             str(check_dtype) + " but got " + str(type_str))
    392 

TypeError: Expected input to be of type in [dtype('float32'), dtype('float64')] but got int64
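Until integer inputs are converted automatically, casting explicitly before calling fit avoids the error in both examples above. For a cuDF dataframe that is `df.astype("float32")`. The sketch below shows the same idea on a NumPy array so that it runs without a GPU; `ensure_float` is a hypothetical helper written for illustration, not a cuML function, and the supported dtypes are taken from the error message.

```python
import numpy as np

# float32/float64 are the dtypes named in the TypeError above.
SUPPORTED_DTYPES = (np.dtype(np.float32), np.dtype(np.float64))


def ensure_float(X, dtype=np.float32):
    """Return X unchanged if it is float32/float64, otherwise cast it.

    Hypothetical helper for illustration only; not part of cuML.
    """
    X = np.asarray(X)
    if X.dtype in SUPPORTED_DTYPES:
        return X
    return X.astype(dtype)


# Same values as the all-integer PCA example (columns a, b, c).
X_int = np.array([[0, 0, 0], [1, 1, 10], [2, 2, 12]], dtype=np.int64)
X_ok = ensure_float(X_int)
print(X_ok.dtype)
```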

dantegd · 2022-01-12T16:17:25Z

Currently, automatic dtype conversion converts y to the dtype of X if they differ, but we do not auto-convert integer inputs to floating point. It is pretty easy to add, but there are a few considerations. For example, should we convert both int64 and int32 to float32, or to the float type with the corresponding number of bits? This could also be user-configurable, though my initial intuition would be to autoconvert to float32 if the inputs are all int. What are your thoughts?

github-actions · 2022-02-11T17:02:40Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions · 2022-05-12T17:13:50Z

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

beckernick · 2022-11-23T18:55:31Z

This came up again in the context of using various estimators/transformers in Pipelines.

> Currently, automatic dtype conversion converts y to the dtype of X if they differ, but we do not auto-convert integer inputs to floating point. It is pretty easy to add, but there are a few considerations. For example, should we convert both int64 and int32 to float32, or to the float type with the corresponding number of bits? This could also be user-configurable, though my initial intuition would be to autoconvert to float32 if the inputs are all int. What are your thoughts?

I think I agree. I could imagine throwing a warning about dtype conversion and then letting the user configure away from the default as needed.
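To make the options concrete, here is a rough sketch of a float32 default with a warning and a configurable alternative. Everything in it is an assumption for illustration: the names `target_float_dtype` and `convert_with_warning` and the policy strings `"float32"` and `"match_bits"` are hypothetical and not existing cuML API. It uses NumPy arrays so it can run without a GPU.

```python
import warnings

import numpy as np

_SUPPORTED = (np.dtype(np.float32), np.dtype(np.float64))


def target_float_dtype(dtype, policy="float32"):
    """Pick the float dtype an input dtype would be converted to (hypothetical)."""
    dtype = np.dtype(dtype)
    if dtype in _SUPPORTED:
        return dtype  # already supported, nothing to do
    if dtype.kind not in "iub":  # signed int, unsigned int, bool
        raise TypeError(f"cannot auto-convert dtype {dtype}")
    if policy == "float32":
        return np.dtype(np.float32)
    if policy == "match_bits":
        # 64-bit integers go to float64, narrower ones to float32.
        return np.dtype(np.float64 if dtype.itemsize >= 8 else np.float32)
    raise ValueError(f"unknown policy: {policy!r}")


def convert_with_warning(X, policy="float32"):
    """Cast X to a supported float dtype, warning when a cast happens."""
    X = np.asarray(X)
    target = target_float_dtype(X.dtype, policy)
    if target != X.dtype:
        warnings.warn(f"Converting input from {X.dtype} to {target}", UserWarning)
        X = X.astype(target)
    return X


X = np.array([[0, 0, 0], [1, 1, 10], [2, 2, 12]], dtype=np.int64)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    X32 = convert_with_warning(X)
print(X32.dtype, len(caught))

# float32 has a 24-bit significand, so large int64 values can lose precision.
big = np.array([2**24 + 1], dtype=np.int64)
print(int(big.astype(np.float32)[0]))
```

The last two lines show one reason to keep the default configurable: integers above 2**24 are not all exactly representable in float32, so a user with large integer features may prefer the float64 path.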

Some of this must be happening already with dataframe inputs for X containing mixed dtypes, as we need to create a contiguous buffer. It looks like we're coercing to float64 implicitly in some cases.

The value of formalizing this and having things "just work" out of the box is pretty high. It feels like tech debt whose cleanup would also improve the UX.

import cudf
from cuml.common.input_utils import input_to_cuml_array
import numpy as np

df = cudf.DataFrame({f"a{x}": range(50) for x in range(5)}) # int64 dtypes
df["a1"] = df["a1"].astype("float32")

X_m, n_rows, n_cols, dtype = input_to_cuml_array(
    df, check_dtype=[np.float32, np.float64]
)
dtype
dtype('float64')
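The float64 result is what NumPy's type promotion rules give for a mix of int64 and float32 columns. Whether cuDF/cuML reach it through the same rules internally is an assumption on my part; the snippet below only shows the NumPy behavior.

```python
import numpy as np

# Four int64 columns and one float32 column, as in the dataframe above.
column_dtypes = [np.int64, np.float32, np.int64, np.int64, np.int64]
common = np.result_type(*column_dtypes)
print(common)

# float32 cannot hold every int64 (or int32) value, so NumPy promotes to
# float64; a narrow integer type such as int16 stays at float32.
print(np.result_type(np.int32, np.float32))
print(np.result_type(np.int16, np.float32))
```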

beckernick added the "bug" and "? - Needs Triage" labels Jan 11, 2022

github-actions bot added this to Needs prioritizing in Bug Squashing Jan 11, 2022

beckernick changed the title ~~[BUG] cuml PCA should be robust to all integer dtype inputs~~ [BUG] cuml estimators should be robust to all integer dtype inputs Jan 12, 2022

dantegd changed the title ~~[BUG] cuml estimators should be robust to all integer dtype inputs~~ [BUG] Support auto converting integer/other dtypes to supported dtypes during training of estimators Jan 12, 2022

github-actions bot added the inactive-30d label Feb 11, 2022

github-actions bot added the inactive-90d label May 12, 2022

dantegd linked a pull request May 14, 2024 that will close this issue: Allow estimators to accept any dtype #5888 (draft)
