Implement basic COALESCE functionality #823

charlesbluca · 2022-11-23T16:14:54Z

Could we throw in a comment here giving the reasoning for this override (IIUC differences in nullable "object" columns between cuDF and pandas)?
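For context, a minimal pandas-only sketch of what the `min_count=1` override in the `_collect_aggregations` hunk changes for a column that contains only nulls. The variable names are illustrative, and how cuDF behaves in the same situation is neither shown nor verified here.

```python
import numpy as np
import pandas as pd

# A column whose values are all null, like column "b" in the PR's test.
b = pd.Series([np.nan], name="b")

# Default pandas behaviour: nulls are skipped and the empty sum is 0.0.
default_sum = b.sum()

# With min_count=1, fewer than one valid value yields NaN instead.
strict_sum = b.sum(min_count=1)

print(default_sum, strict_sum)
```

With the default, `COALESCE(SUM(b), 'why', 2.2)` would presumably see `0.0` rather than a null for an all-null `b`, and so would never fall through to `'why'`.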

ChrisJar · 2022-11-30T19:09:59Z

@ayushdg @charlesbluca This line is now failing on GPU with the newest dask-sql environment. Specifically,

SELECT COALESCE(SUM(b), 'why', 2.2) FROM df

throws:

ValueError: could not convert string to float: 'why'

This doesn't fail on CPU, nor does it fail on the roughly equivalent query

SELECT COALESCE(NULL, 'why', 2.2) FROM df

Any idea what might be happening? Could this be due to a change in cuDF?

Taking a look right now

Could you list the cuDF conda packages you're using (assuming this is using conda packages and not source)? I pulled in the latest 22.12 nightlies and wasn't able to reproduce:

# packages in environment at /raid/charlesb/mambaforge/envs/basic-coalesce:
#
# Name Version Build Channel
cudf 22.12.00a221130 cuda_11_py39_geb271044c2_307 rapidsai-nightly
dask-cudf 22.12.00a221130 cuda_11_py39_geb271044c2_307 rapidsai-nightly
libcudf 22.12.00a221130 cuda11_geb271044c2_307 rapidsai-nightly

Yep, here they are:

cudf 22.12.00a221130 cuda_11_py39_geb271044c2_307 rapidsai-nightly
dask-cudf 22.12.00a221130 cuda_11_py39_geb271044c2_307 rapidsai-nightly
libcudf 22.12.00a221130 cuda11_geb271044c2_307 rapidsai-nightly

And here's my full environment

# packages in environment at /raid/cjarrett/miniconda3/envs/dask-sql-11-30:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
_py-xgboost-mutex 2.0 cpu_0 conda-forge
adagio 0.2.4 pyhd8ed1ab_0 conda-forge
alabaster 0.7.12 py_0 conda-forge
alembic 1.8.1 pyhd8ed1ab_0 conda-forge
antlr-python-runtime 4.11.1 pyhd8ed1ab_0 conda-forge
antlr4-python3-runtime 4.11.1 pyh1a96a4e_0 conda-forge
anyio 3.6.2 pyhd8ed1ab_0 conda-forge
appdirs 1.4.4 pyh9f0ad1d_0 conda-forge
arrow-cpp 9.0.0 py39hd3ccb9b_2_cpu conda-forge
attrs 22.1.0 pyh71513ae_1 conda-forge
aws-c-cal 0.5.11 h95a6274_0 conda-forge
aws-c-common 0.6.2 h7f98852_0 conda-forge
aws-c-event-stream 0.2.7 h3541f99_13 conda-forge
aws-c-io 0.10.5 hfb6a706_0 conda-forge
aws-checksums 0.1.11 ha31a3da_7 conda-forge
aws-sdk-cpp 1.8.186 hecaee15_4 conda-forge
babel 2.11.0 pyhd8ed1ab_0 conda-forge
backports 1.0 pyhd8ed1ab_3 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
bcrypt 3.2.2 py39hb9d737c_1 conda-forge
blinker 1.5 pyhd8ed1ab_0 conda-forge
bokeh 2.4.3 pyhd8ed1ab_3 conda-forge
brotli 1.0.9 h166bdaf_8 conda-forge
brotli-bin 1.0.9 h166bdaf_8 conda-forge
brotlipy 0.7.0 py39hb9d737c_1005 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.18.1 h7f98852_0 conda-forge
ca-certificates 2022.9.24 ha878542_0 conda-forge
cachetools 5.2.0 pyhd8ed1ab_0 conda-forge
certifi 2022.9.24 pyhd8ed1ab_0 conda-forge
cffi 1.15.1 py39he91dace_2 conda-forge
cfgv 3.3.1 pyhd8ed1ab_0 conda-forge
charset-normalizer 2.1.1 pyhd8ed1ab_0 conda-forge
ciso8601 2.2.0 py39hb9d737c_4 conda-forge
click 8.1.3 unix_pyhd8ed1ab_2 conda-forge
cloudpickle 2.2.0 pyhd8ed1ab_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
configparser 5.3.0 pyhd8ed1ab_0 conda-forge
contourpy 1.0.6 py39hf939315_0 conda-forge
coverage 6.5.0 py39hb9d737c_1 conda-forge
cryptography 38.0.4 py39hd97740a_0 conda-forge
cubinlinker 0.2.0 py39h11215e4_1 rapidsai
cuda-cccl 11.8.89 0 nvidia
cuda-command-line-tools 11.8.0 0 nvidia
cuda-compiler 11.8.0 0 nvidia
cuda-cudart 11.8.89 0 nvidia
cuda-cudart-dev 11.8.89 0 nvidia
cuda-cuobjdump 11.8.86 0 nvidia
cuda-cupti 11.8.87 0 nvidia
cuda-cuxxfilt 11.8.86 0 nvidia
cuda-documentation 11.8.86 0 nvidia
cuda-driver-dev 11.8.89 0 nvidia
cuda-gdb 11.8.86 0 nvidia
cuda-libraries 11.8.0 0 nvidia
cuda-libraries-dev 11.8.0 0 nvidia
cuda-memcheck 11.8.86 0 nvidia
cuda-nsight 11.8.86 0 nvidia
cuda-nsight-compute 11.8.0 0 nvidia
cuda-nvcc 11.8.89 0 nvidia
cuda-nvdisasm 11.8.86 0 nvidia
cuda-nvml-dev 11.8.86 0 nvidia
cuda-nvprof 11.8.87 0 nvidia
cuda-nvprune 11.8.86 0 nvidia
cuda-nvrtc 11.8.89 0 nvidia
cuda-nvrtc-dev 11.8.89 0 nvidia
cuda-nvtx 11.8.86 0 nvidia
cuda-nvvp 11.8.87 0 nvidia
cuda-profiler-api 11.8.86 0 nvidia
cuda-python 11.8.0 py39h3fd9d12_0 nvidia
cuda-sanitizer-api 11.8.86 0 nvidia
cuda-toolkit 11.8.0 0 nvidia
cuda-tools 11.8.0 0 nvidia
cuda-visual-tools 11.8.0 0 nvidia
cudatoolkit 11.5.1 hcf5317a_9 nvidia
cudf 22.12.00a221130 cuda_11_py39_geb271044c2_307 rapidsai-nightly
cuml 22.12.00a221130 cuda11_py39_gb962396dc_51 rapidsai-nightly
cupy 11.3.0 py39hc3c280e_1 conda-forge
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
cytoolz 0.12.0 py39hb9d737c_1 conda-forge
dask 2022.11.1 pyhd8ed1ab_0 conda-forge
dask-core 2022.11.1 pyhd8ed1ab_0 conda-forge
dask-cuda 22.12.00a221130 py39_g55375b8_33 rapidsai-nightly
dask-cudf 22.12.00a221130 cuda_11_py39_geb271044c2_307 rapidsai-nightly
dask-sql 2022.8.0+99.g73366d6.dirty pypi_0 pypi
databricks-cli 0.17.3 pyhd8ed1ab_0 conda-forge
deap 1.3.3 py39h4661b88_1 conda-forge
distlib 0.3.6 pyhd8ed1ab_0 conda-forge
distributed 2022.11.1 pyhd8ed1ab_0 conda-forge
dlpack 0.5 h9c3ff4c_0 conda-forge
docker-py 6.0.0 pyhd8ed1ab_0 conda-forge
docutils 0.19 py39hf3d152e_1 conda-forge
entrypoints 0.4 pyhd8ed1ab_0 conda-forge
exceptiongroup 1.0.4 pyhd8ed1ab_0 conda-forge
execnet 1.9.0 pyhd8ed1ab_0 conda-forge
faiss-proc 1.0.0 cuda rapidsai
fastapi 0.88.0 pyhd8ed1ab_0 conda-forge
fastavro 1.7.0 py39hb9d737c_0 conda-forge
fastrlock 0.8 py39h5a03fae_3 conda-forge
filelock 3.8.0 pyhd8ed1ab_0 conda-forge
flask 2.2.2 pyhd8ed1ab_0 conda-forge
fonttools 4.38.0 py39hb9d737c_1 conda-forge
freetype 2.12.1 hca18f0e_1 conda-forge
fs 2.4.15 pyhd8ed1ab_0 conda-forge
fsspec 2022.11.0 pyhd8ed1ab_0 conda-forge
fugue 0.7.3 pyhd8ed1ab_0 conda-forge
fugue-sql-antlr 0.1.1 pyhd8ed1ab_0 conda-forge
future 0.18.2 pyhd8ed1ab_6 conda-forge
gds-tools 1.4.0.31 0 nvidia
gflags 2.2.2 he1b5a44_1004 conda-forge
gitdb 4.0.10 pyhd8ed1ab_0 conda-forge
gitpython 3.1.29 pyhd8ed1ab_0 conda-forge
glog 0.6.0 h6f12383_0 conda-forge
greenlet 2.0.1 py39h5a03fae_0 conda-forge
grpc-cpp 1.47.1 hbad87ad_6 conda-forge
gunicorn 20.1.0 py39hf3d152e_3 conda-forge
h11 0.14.0 pyhd8ed1ab_0 conda-forge
heapdict 1.0.1 py_0 conda-forge
identify 2.5.9 pyhd8ed1ab_0 conda-forge
idna 3.4 pyhd8ed1ab_0 conda-forge
imagesize 1.4.1 pyhd8ed1ab_0 conda-forge
importlib-metadata 5.1.0 pyha770c72_0 conda-forge
importlib_resources 5.10.0 pyhd8ed1ab_0 conda-forge
iniconfig 1.1.1 pyh9f0ad1d_0 conda-forge
intake 0.6.6 pyhd8ed1ab_0 conda-forge
itsdangerous 2.1.2 pyhd8ed1ab_0 conda-forge
jinja2 3.1.2 pyhd8ed1ab_1 conda-forge
joblib 1.2.0 pyhd8ed1ab_0 conda-forge
jpeg 9e h166bdaf_2 conda-forge
jsonschema 4.17.3 pyhd8ed1ab_0 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.4.4 py39hf939315_1 conda-forge
krb5 1.19.3 h3790be6_0 conda-forge
lcms2 2.14 h6ed2654_0 conda-forge
ld_impl_linux-64 2.39 hcc3a1bd_1 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libabseil 20220623.0 cxx17_h48a1fff_5 conda-forge
libblas 3.9.0 16_linux64_openblas conda-forge
libbrotlicommon 1.0.9 h166bdaf_8 conda-forge
libbrotlidec 1.0.9 h166bdaf_8 conda-forge
libbrotlienc 1.0.9 h166bdaf_8 conda-forge
libcblas 3.9.0 16_linux64_openblas conda-forge
libcrc32c 1.1.2 h9c3ff4c_0 conda-forge
libcublas 11.11.3.6 0 nvidia
libcublas-dev 11.11.3.6 0 nvidia
libcudf 22.12.00a221130 cuda11_geb271044c2_307 rapidsai-nightly
libcufft 10.9.0.58 0 nvidia
libcufft-dev 10.9.0.58 0 nvidia
libcufile 1.4.0.31 0 nvidia
libcufile-dev 1.4.0.31 0 nvidia
libcuml 22.12.00a221130 cuda11_gb962396dc_51 rapidsai-nightly
libcumlprims 22.12.00a221010 cuda11_geaadb5e_2 rapidsai-nightly
libcurand 10.3.0.86 0 nvidia
libcurand-dev 10.3.0.86 0 nvidia
libcurl 7.86.0 h7bff187_1 conda-forge
libcusolver 11.4.1.48 0 nvidia
libcusolver-dev 11.4.1.48 0 nvidia
libcusparse 11.7.5.86 0 nvidia
libcusparse-dev 11.7.5.86 0 nvidia
libdeflate 1.14 h166bdaf_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libevent 2.1.10 h9b69904_4 conda-forge
libfaiss 1.7.0 cuda112h5bea7ad_8_cuda conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 12.2.0 h65d4601_19 conda-forge
libgfortran-ng 12.2.0 h69a702a_19 conda-forge
libgfortran5 12.2.0 h337968e_19 conda-forge
libgomp 12.2.0 h65d4601_19 conda-forge
libgoogle-cloud 2.1.0 h9ebe8e8_2 conda-forge
liblapack 3.9.0 16_linux64_openblas conda-forge
libllvm11 11.1.0 he0ac6c6_5 conda-forge
libnghttp2 1.47.0 hdcd2b5c_1 conda-forge
libnpp 11.8.0.86 0 nvidia
libnpp-dev 11.8.0.86 0 nvidia
libnsl 2.0.0 h7f98852_0 conda-forge
libnvjpeg 11.9.0.86 0 nvidia
libnvjpeg-dev 11.9.0.86 0 nvidia
libopenblas 0.3.21 pthreads_h78a6416_3 conda-forge
libpng 1.6.39 h753d276_0 conda-forge
libpq 14.5 hd77ab85_1 conda-forge
libprotobuf 3.20.2 h6239696_0 conda-forge
libraft-distance 22.12.00a221130 cuda11_g11c5105_136 rapidsai-nightly
libraft-headers 22.12.00a221130 cuda11_g11c5105_136 rapidsai-nightly
libraft-nn 22.12.00a221130 cuda11_g11c5105_136 rapidsai-nightly
librmm 22.12.00a221130 cuda11_gda7036aa_57 rapidsai-nightly
libsodium 1.0.18 h36c2ea0_1 conda-forge
libsqlite 3.40.0 h753d276_0 conda-forge
libssh2 1.10.0 haa6b8db_3 conda-forge
libstdcxx-ng 12.2.0 h46fd767_19 conda-forge
libthrift 0.16.0 h491838f_2 conda-forge
libtiff 4.4.0 h55922b4_4 conda-forge
libutf8proc 2.8.0 h166bdaf_0 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libwebp-base 1.2.4 h166bdaf_0 conda-forge
libxcb 1.13 h7f98852_1004 conda-forge
libxgboost 1.6.2dev.rapidsai22.12 cuda_11_0 rapidsai-nightly
libzlib 1.2.13 h166bdaf_4 conda-forge
lightgbm 3.3.3 py39h5a03fae_1 conda-forge
llvmlite 0.39.1 py39h7d9a04d_1 conda-forge
locket 1.0.0 pyhd8ed1ab_0 conda-forge
lz4 4.0.2 py39h029007f_0 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
mako 1.2.4 pyhd8ed1ab_0 conda-forge
markdown 3.4.1 pyhd8ed1ab_0 conda-forge
markupsafe 2.1.1 py39hb9d737c_2 conda-forge
matplotlib-base 3.6.2 py39hf9fd14e_0 conda-forge
maturin 0.14.2 py39h4ef89ea_0 conda-forge
mlflow 2.0.1 py39ha39b057_1 conda-forge
mock 4.0.3 pyhd8ed1ab_4 conda-forge
msgpack-python 1.0.4 py39hf939315_1 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
nccl 2.14.3.1 h0800d71_0 conda-forge
ncurses 6.3 h27087fc_1 conda-forge
nest-asyncio 1.5.6 pyhd8ed1ab_0 conda-forge
nodeenv 1.7.0 pyhd8ed1ab_0 conda-forge
nsight-compute 2022.3.0.22 0 nvidia
numba 0.56.4 py39h61ddf18_0 conda-forge
numpy 1.23.5 py39h3d75532_0 conda-forge
nvtx 0.2.3 py39hb9d737c_2 conda-forge
oauthlib 3.2.2 pyhd8ed1ab_0 conda-forge
openjpeg 2.5.0 h7d73246_1 conda-forge
openssl 1.1.1s h166bdaf_0 conda-forge
orc 1.7.6 h6c59b99_0 conda-forge
packaging 21.3 pyhd8ed1ab_0 conda-forge
pandas 1.5.2 py39h4661b88_0 conda-forge
paramiko 2.12.0 pyhd8ed1ab_0 conda-forge
parquet-cpp 1.5.1 2 conda-forge
partd 1.3.0 pyhd8ed1ab_0 conda-forge
pillow 9.2.0 py39hf3a2cdf_3 conda-forge
pip 22.3.1 pyhd8ed1ab_0 conda-forge
pkgutil-resolve-name 1.3.10 pyhd8ed1ab_0 conda-forge
platformdirs 2.5.2 pyhd8ed1ab_1 conda-forge
pluggy 1.0.0 pyhd8ed1ab_5 conda-forge
pre-commit 2.20.0 py39hf3d152e_1 conda-forge
prometheus_client 0.15.0 pyhd8ed1ab_0 conda-forge
prometheus_flask_exporter 0.21.0 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.33 pyha770c72_0 conda-forge
prompt_toolkit 3.0.33 hd8ed1ab_0 conda-forge
protobuf 3.20.2 py39h5a03fae_1 conda-forge
psutil 5.9.4 py39hb9d737c_0 conda-forge
psycopg2 2.9.3 py39hb9d737c_1 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
ptxcompiler 0.7.0 py39h1eff087_2 conda-forge
pure-sasl 0.6.2 pyhd8ed1ab_0 conda-forge
py 1.11.0 pyh6c4a22f_0 conda-forge
py-xgboost 1.6.2dev.rapidsai22.12 cuda_11_py39_0 rapidsai-nightly
pyarrow 9.0.0 py39hc0775d8_2_cpu conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pydantic 1.10.2 py39hb9d737c_1 conda-forge
pygments 2.13.0 pyhd8ed1ab_0 conda-forge
pyhive 0.6.5 pyhd8ed1ab_0 conda-forge
pyjwt 2.6.0 pyhd8ed1ab_0 conda-forge
pylibraft 22.12.00a221130 cuda11_py39_g11c5105_136 rapidsai-nightly
pynacl 1.5.0 py39hb9d737c_2 conda-forge
pynvml 11.4.1 pyhd8ed1ab_0 conda-forge
pyopenssl 22.1.0 pyhd8ed1ab_0 conda-forge
pyparsing 3.0.9 pyhd8ed1ab_0 conda-forge
pyrsistent 0.19.2 py39hb9d737c_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
pytest 7.2.0 pyhd8ed1ab_2 conda-forge
pytest-cov 4.0.0 pyhd8ed1ab_0 conda-forge
pytest-xdist 3.0.2 pyhd8ed1ab_0 conda-forge
python 3.9.15 h47a2c10_0_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-tzdata 2022.6 pyhd8ed1ab_0 conda-forge
python_abi 3.9 3_cp39 conda-forge
pytz 2022.6 pyhd8ed1ab_0 conda-forge
pytz-deprecation-shim 0.1.0.post0 py39hf3d152e_3 conda-forge
pywin32-on-windows 0.1.0 pyh1179c8e_3 conda-forge
pyyaml 6.0 py39hb9d737c_5 conda-forge
qpd 0.3.3 pyhd8ed1ab_0 conda-forge
querystring_parser 1.2.4 py_0 conda-forge
raft-dask 22.12.00a221130 cuda11_py39_g11c5105_136 rapidsai-nightly
re2 2022.06.01 h27087fc_1 conda-forge
readline 8.1.2 h0f457ee_0 conda-forge
requests 2.28.1 pyhd8ed1ab_1 conda-forge
rmm 22.12.00a221130 cuda11_py39_gda7036aa_57 rapidsai-nightly
s2n 1.0.10 h9b69904_0 conda-forge
scikit-learn 1.1.3 py39hd5c8da3_1 conda-forge
scipy 1.9.3 py39hddc5342_2 conda-forge
semantic_version 2.10.0 pyhd8ed1ab_0 conda-forge
setuptools 65.5.1 pyhd8ed1ab_0 conda-forge
setuptools-rust 1.5.2 pyhd8ed1ab_0 conda-forge
shap 0.41.0 py39h1832856_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
slicer 0.0.7 pyhd8ed1ab_0 conda-forge
smmap 3.0.5 pyh44b312d_0 conda-forge
snappy 1.1.9 hbd366e4_2 conda-forge
sniffio 1.3.0 pyhd8ed1ab_0 conda-forge
snowballstemmer 2.2.0 pyhd8ed1ab_0 conda-forge
sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge
spdlog 1.8.5 h4bd325d_1 conda-forge
sphinx 5.3.0 pyhd8ed1ab_0 conda-forge
sphinxcontrib-applehelp 1.0.2 py_0 conda-forge
sphinxcontrib-devhelp 1.0.2 py_0 conda-forge
sphinxcontrib-htmlhelp 2.0.0 pyhd8ed1ab_0 conda-forge
sphinxcontrib-jsmath 1.0.1 py_0 conda-forge
sphinxcontrib-qthelp 1.0.3 py_0 conda-forge
sphinxcontrib-serializinghtml 1.1.5 pyhd8ed1ab_2 conda-forge
sqlalchemy 1.4.44 py39hb9d737c_0 conda-forge
sqlparse 0.4.3 pyhd8ed1ab_0 conda-forge
starlette 0.22.0 pyhd8ed1ab_0 conda-forge
stopit 1.1.2 py_0 conda-forge
tabulate 0.9.0 pyhd8ed1ab_1 conda-forge
tblib 1.7.0 pyhd8ed1ab_0 conda-forge
threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge
thrift 0.17.0 py39h5a03fae_0 conda-forge
thrift_sasl 0.4.3 pyhd8ed1ab_2 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
toml 0.10.2 pyhd8ed1ab_0 conda-forge
tomli 2.0.1 pyhd8ed1ab_0 conda-forge
toolz 0.12.0 pyhd8ed1ab_0 conda-forge
tornado 6.1 py39hb9d737c_3 conda-forge
tpot 0.11.7 pyhd8ed1ab_1 conda-forge
tqdm 4.64.1 pyhd8ed1ab_0 conda-forge
treelite 3.0.0 py39hc7ff369_1 conda-forge
treelite-runtime 3.0.0 pypi_0 pypi
triad 0.7.0 pyhd8ed1ab_0 conda-forge
typing-extensions 4.4.0 hd8ed1ab_0 conda-forge
typing_extensions 4.4.0 pyha770c72_0 conda-forge
tzdata 2022g h191b570_0 conda-forge
tzlocal 4.2 py39hf3d152e_2 conda-forge
ucx 1.13.1 h538f049_0 conda-forge
ucx-proc 1.0.0 gpu rapidsai
ucx-py 0.29.00a221129 py39_g707b335_22 rapidsai-nightly
ukkonen 1.0.1 py39hf939315_3 conda-forge
unicodedata2 15.0.0 py39hb9d737c_0 conda-forge
update_checker 0.18.0 pyh9f0ad1d_0 conda-forge
urllib3 1.26.13 pyhd8ed1ab_0 conda-forge
uvicorn 0.20.0 py39hf3d152e_1 conda-forge
virtualenv 20.17.0 py39hf3d152e_0 conda-forge
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
websocket-client 1.4.2 pyhd8ed1ab_0 conda-forge
werkzeug 2.2.2 pyhd8ed1ab_0 conda-forge
wheel 0.38.4 pyhd8ed1ab_0 conda-forge
xgboost 1.6.2dev.rapidsai22.12 cuda_11_py39_0 rapidsai-nightly
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
zict 2.2.0 pyhd8ed1ab_0 conda-forge
zipp 3.11.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 h166bdaf_4 conda-forge
zstd 1.5.2 h6239696_4 conda-forge

Hmm, I still don't seem to be able to reproduce; the only difference in my environment is that it's missing _py-xgboost-mutex, but even installing it manually doesn't seem to impact anything.

Why don't we pull in the latest changes to see if these failures crop up in gpuCI?

charlesbluca · 2022-11-23T16:47:52Z

Following up on the previous comment, it might be good to add tests using COALESCE on columns rather than only scalars, so that we have coverage for that specific case.
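As a reference point for such a test, here is a sketch of the result a column-level `COALESCE(a, b)` would be expected to produce under standard SQL row-wise semantics. It uses plain pandas (`Series.fillna`) rather than dask-sql, and the sample frame and the output name `c` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative input: each column is null in different rows.
df = pd.DataFrame({"a": [np.nan, 1, np.nan], "b": [np.nan, np.nan, 2]})

# Row-wise coalesce: take "a" where it is present, otherwise fall back to "b".
expected = df["a"].fillna(df["b"]).rename("c")

print(expected.tolist())
```

The first row stays null because both inputs are null there; the other two rows pick up `1.0` from `a` and `2.0` from `b` respectively.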

With 2000932 in, I am now trying to add something like COALESCE(a, b) to this query, but am running into parsing issues; a minimal example of the error I'm getting:

import numpy as np
import pandas as pd
from dask_sql import Context

c = Context()
c.create_table("df", pd.DataFrame({
    "a": [np.nan, 1, np.nan],
    "b": [np.nan, np.nan, 2]
}))

c.sql("""
    select coalesce(a, b) as c, coalesce(sum(b), 'why') as d from df
""")
# ParsingException: Plan("Projection references non-aggregate values: Expression df.a could not be resolved from available columns: SUM(df.b)")

cc @andygrove

@@ -457,7 +457,16 @@ def _collect_aggregations(
                     filter_backend_col = None
                 try:
-                    aggregation_function = self.AGGREGATION_MAPPING[aggregation_name]
+                    if aggregation_name == "sum" and isinstance(df._meta, pd.DataFrame):
+                        aggregation_function = AggregationSpecification(
+                            dd.Aggregation(
+                                name="custom_sum",
+                                chunk=lambda s: s.sum(min_count=1),
+                                agg=lambda s0: s0.sum(min_count=1),
+                            )
+                        )
+                    else:
+                        aggregation_function = self.AGGREGATION_MAPPING[aggregation_name]
                 except KeyError:
                     try:
                         aggregation_function = context.schema[schema_name].functions[

@@ -572,6 +572,23 @@ def overlay(self, s, replace, start, length=None):
             return s
+    class CoalesceOperation(Operation):
+        def __init__(self):
+            super().__init__(self.coalesce)
+        def coalesce(self, *operands):
+            for operand in operands:
+                if is_frame(operand):
+                    # Check if frame evaluates to nan or NA
+                    if not operand.isnull().all().compute():
+                        return operand
+                    else:
+                        continue
+                if not pd.isna(operand):
+                    return operand
     class ExtractOperation(Operation):
         def __init__(self):
             super().__init__(self.extract)
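To make the control flow of `coalesce` in this hunk easier to follow, here is a pandas-only sketch of the same loop. `coalesce_sketch` is a hypothetical stand-in, not the PR's code: it swaps `is_frame` for an `isinstance` check on `pd.Series` and drops the dask `.compute()` call so that it runs without dask-sql.

```python
import numpy as np
import pandas as pd


def coalesce_sketch(*operands):
    """Hypothetical stand-in for CoalesceOperation.coalesce, pandas only."""
    for operand in operands:
        if isinstance(operand, pd.Series):
            # A Series is accepted unless every value in it is null.
            if not operand.isnull().all():
                return operand
            continue
        # Scalars are accepted unless they are NaN, None, or NA.
        if not pd.isna(operand):
            return operand
    return None  # every operand was null


all_null = pd.Series([np.nan])
partly_null = pd.Series([np.nan, 1.0])

first = coalesce_sketch(all_null, "why", 2.2)            # skips the all-null Series
second = coalesce_sketch(None, partly_null, "fallback")  # returns the Series as is
third = coalesce_sketch(None, np.nan)                    # nothing usable
```

Note that `second` comes back with its NaN intact: the check is made once per operand, not once per row, which is why column inputs behave differently from a row-wise SQL COALESCE.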
@@ -978,6 +995,7 @@ class RexCallPlugin(BaseRexPlugin):
             "substr": SubStringOperation(),
             "substring": SubStringOperation(),
             "initcap": TensorScalarOperation(lambda x: x.str.title(), lambda x: x.title()),
+            "coalesce": CoalesceOperation(),
             "replace": ReplaceOperation(),
             # date/time operations
             "extract": ExtractOperation(),

@@ -355,6 +355,43 @@ def test_null(c):
         assert_eq(df, expected_df)
+    @pytest.mark.parametrize("gpu", [False, pytest.param(True, marks=pytest.mark.gpu)])
+    def test_coalesce(c, gpu):
+        df = dd.from_pandas(pd.DataFrame({"a": [1], "b": [np.nan]}), npartitions=1)
+        c.create_table("df", df, gpu=gpu)
+        df = c.sql(
+            """
+            SELECT
+                COALESCE(3, 5) as c1,
+                COALESCE(NULL, NULL) as c2,
+                COALESCE(NULL, 'hi') as c3,
+                COALESCE(NULL, NULL, 'bye', 5/0) as c4,
+                COALESCE(NULL, 3/2, NULL, 'fly') as c5,
+                COALESCE(SUM(b), 'why', 2.2) as c6,
+                COALESCE(NULL, MEAN(b), MEAN(a), 4/0) as c7
+            FROM df
+            """
+        )
+        expected_df = pd.DataFrame(
+            {
+                "c1": [3],
+                "c2": [np.nan],
+                "c3": ["hi"],
+                "c4": ["bye"],
+                "c5": ["1"],
+                "c6": ["why"],
+                "c7": [1.0],
+            }
+        )
+        df["c2"] = df["c2"].astype("float64")
+        df["c5"] = df["c5"].astype("O")
+        assert_eq(df, expected_df)
+        c.drop_table("df")
     def test_boolean_operations(c):
         df = dd.from_pandas(pd.DataFrame({"b": [1, 0, -1]}), npartitions=1)
         df["b"] = df["b"].apply(



charlesbluca Nov 23, 2022

ChrisJar Nov 30, 2022 • edited

ayushdg Nov 30, 2022

charlesbluca Nov 30, 2022

ChrisJar Nov 30, 2022

ChrisJar Nov 30, 2022

charlesbluca Nov 30, 2022

charlesbluca Nov 23, 2022

charlesbluca Nov 23, 2022
