[BUG] [GPU Error Bug] "SELECT -2613 FROM <table> HAVING (<TIMESTAMP> NOT BETWEEN <TIMESTAMP> AND MAX(<TIMESTAMP>))" brings Error #1278

Open
qwebug opened this issue Nov 28, 2023 · 1 comment
Labels: bug (Something isn't working), needs triage (Awaiting triage by a dask-sql maintainer)

qwebug commented Nov 28, 2023

What happened:

The query "SELECT -2613 FROM <table> HAVING (<TIMESTAMP> NOT BETWEEN <TIMESTAMP> AND MAX(<TIMESTAMP>))" raises an error when the table is created with gpu=True.

However, the same query returns a result when the table is created with gpu=False (CPU).

What you expected to happen:

The query should not raise an error on GPU; it should return the same result as on CPU.

Minimal Complete Verifiable Example:

import pandas as pd
import dask.dataframe as dd
from dask_sql import Context

c = Context()

df0 = pd.DataFrame({
    'c0': ['CAST((12998) AS SMALLINT)'],
})
t0 = dd.from_pandas(df0, npartitions=1)
c.create_table('t0', t0, gpu=False)
c.create_table('t0_gpu', t0, gpu=True)

print('CPU Result::')
result1= c.sql("SELECT -2613 FROM t0 HAVING (TIMESTAMP '1991-02-28 13:42:12' NOT BETWEEN TIMESTAMP '1985-12-14 23:59:41' AND MAX(TIMESTAMP '2006-08-05 07:29:26'))").compute()
print(result1)

print('GPU Result::')
result2= c.sql("SELECT -2613 FROM t0_gpu HAVING (TIMESTAMP '1991-02-28 13:42:12' NOT BETWEEN TIMESTAMP '1985-12-14 23:59:41' AND MAX(TIMESTAMP '2006-08-05 07:29:26'))").compute()
print(result2)

Result:

INFO:numba.cuda.cudadrv.driver:init
CPU Result::
Empty DataFrame
Columns: [Int64(-2613)]
Index: []
GPU Result::
Traceback (most recent call last):
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/dataframe/utils.py", line 193, in raise_on_meta_error
    yield
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/dataframe/core.py", line 6470, in elemwise
    meta = partial_by_order(*parts, function=op, other=other)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/utils.py", line 1327, in partial_by_order
    return function(*args2, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/indexed_frame.py", line 3375, in __array_ufunc__
    ret = super().__array_ufunc__(ufunc, method, *inputs, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/frame.py", line 1761, in __array_ufunc__
    return _array_ufunc(self, ufunc, method, inputs, kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/utils/utils.py", line 93, in _array_ufunc
    return getattr(obj, op)(other)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/mixins/mixin_factory.py", line 11, in wrapper
    return method(self, *args1, *args2, **kwargs1, **kwargs2)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/indexed_frame.py", line 3350, in _binaryop
    ColumnAccessor(type(self)._colwise_binop(operands, op)),
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/nvtx/nvtx.py", line 101, in inner
    result = func(*args, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/frame.py", line 1750, in _colwise_binop
    else getattr(operator, fn)(left_column, right_column)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/mixins/mixin_factory.py", line 11, in wrapper
    return method(self, *args1, *args2, **kwargs1, **kwargs2)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/column/datetime.py", line 405, in _binaryop
    other = self._wrap_binop_normalization(other)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/column/column.py", line 606, in _wrap_binop_normalization
    other = other.dtype.type(other.item())
ValueError: Converting an integer to a NumPy datetime requires a specified unit

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/bug19/bug19.py", line 19, in <module>
    result2= c.sql("SELECT -2613 FROM t0_gpu HAVING (TIMESTAMP '1991-02-28 13:42:12' NOT BETWEEN TIMESTAMP '1985-12-14 23:59:41' AND MAX(TIMESTAMP '2006-08-05 07:29:26'))").compute()
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/context.py", line 513, in sql
    return self._compute_table_from_rel(rel, return_futures)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/context.py", line 839, in _compute_table_from_rel
    dc = RelConverter.convert(rel, context=self)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rel/convert.py", line 61, in convert
    df = plugin_instance.convert(rel, context=context)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rel/logical/project.py", line 28, in convert
    (dc,) = self.assert_inputs(rel, 1, context)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rel/base.py", line 84, in assert_inputs
    return [RelConverter.convert(input_rel, context) for input_rel in input_rels]
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rel/base.py", line 84, in <listcomp>
    return [RelConverter.convert(input_rel, context) for input_rel in input_rels]
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rel/convert.py", line 61, in convert
    df = plugin_instance.convert(rel, context=context)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rel/logical/filter.py", line 65, in convert
    df_condition = RexConverter.convert(rel, condition, dc, context=context)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rex/convert.py", line 74, in convert
    df = plugin_instance.convert(rel, rex, dc, context=context)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rex/core/call.py", line 1129, in convert
    return operation(*operands, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rex/core/call.py", line 77, in __call__
    return self.f(*operands, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rex/core/call.py", line 140, in reduce
    return reduce(partial(self.operation, **kwargs), operands)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/dataframe/core.py", line 617, in __array_ufunc__
    return elemwise(numpy_ufunc, *inputs, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/dataframe/core.py", line 6469, in elemwise
    with raise_on_meta_error(funcname(op)):
  File "/opt/conda/envs/rapids/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/dataframe/utils.py", line 214, in raise_on_meta_error
    raise ValueError(msg) from e
ValueError: Metadata inference failed in `greater`.

Original error is below:
------------------------
ValueError('Converting an integer to a NumPy datetime requires a specified unit')

Traceback:
---------
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/dataframe/utils.py", line 193, in raise_on_meta_error
    yield
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/dataframe/core.py", line 6470, in elemwise
    meta = partial_by_order(*parts, function=op, other=other)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/utils.py", line 1327, in partial_by_order
    return function(*args2, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/indexed_frame.py", line 3375, in __array_ufunc__
    ret = super().__array_ufunc__(ufunc, method, *inputs, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/frame.py", line 1761, in __array_ufunc__
    return _array_ufunc(self, ufunc, method, inputs, kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/utils/utils.py", line 93, in _array_ufunc
    return getattr(obj, op)(other)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/mixins/mixin_factory.py", line 11, in wrapper
    return method(self, *args1, *args2, **kwargs1, **kwargs2)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/indexed_frame.py", line 3350, in _binaryop
    ColumnAccessor(type(self)._colwise_binop(operands, op)),
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/nvtx/nvtx.py", line 101, in inner
    result = func(*args, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/frame.py", line 1750, in _colwise_binop
    else getattr(operator, fn)(left_column, right_column)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/mixins/mixin_factory.py", line 11, in wrapper
    return method(self, *args1, *args2, **kwargs1, **kwargs2)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/column/datetime.py", line 405, in _binaryop
    other = self._wrap_binop_normalization(other)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/column/column.py", line 606, in _wrap_binop_normalization
    other = other.dtype.type(other.item())

Anything else we need to know?:

Environment:

qwebug added the bug (Something isn't working) and needs triage (Awaiting triage by a dask-sql maintainer) labels on Nov 28, 2023
@charlesbluca
Collaborator

Thanks for filing, @qwebug! In the past few months I haven't had much capacity to be active on the issue tracker here, so I apologize in advance if many of the issues you've filed don't get addressed right away, though we always welcome external contributors if you have any interest in digging into this 😉 From your example, it's a little difficult to tell what in particular is causing the bug, but it does look like we're passing an object that isn't supported into cuDF's datetime column mechanics.
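
For what it's worth, the innermost frame in the traceback, `other = other.dtype.type(other.item())`, can be mimicked with plain NumPy. The sketch below is only a guess at the mechanism and has not been checked against cuDF: it assumes `other` is a zero-dimensional, nanosecond-resolution datetime value, and the variable names are illustrative.

```python
import numpy as np

# Assumption: a zero-dimensional, nanosecond-resolution datetime array stands
# in for whatever operand reaches cuDF's _wrap_binop_normalization.
other = np.array("2006-08-05T07:29:26", dtype="datetime64[ns]")

# At nanosecond resolution, .item() returns a plain Python int (nanoseconds
# since the epoch), because datetime.datetime cannot represent nanoseconds.
raw = other.item()
print(type(raw))

# other.dtype.type is np.datetime64; calling it with a bare int and no unit
# raises the same ValueError message that appears in the traceback.
try:
    other.dtype.type(raw)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Supplying the unit explicitly round-trips the value without error.
unit = np.datetime_data(other.dtype)[0]
restored = np.datetime64(raw, unit)
print(restored == other[()])
```

If this is what happens, the unit is being dropped when the value is converted back to a scalar, which would explain why the failure surfaces during metadata inference for `greater` rather than in the SQL planner.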

I'd recommend trying to trim your example down a bit so the root cause is more immediately obvious. For example, I notice that the table in your example contains a SQL expression as its only value: is this relevant to the failure you encountered? If not, it might make sense to use more trivial data here, e.g. ['a', 'b', 'c'], to quickly convey "this thing doesn't work on string data in general." It's also difficult to tell what part of the query causes things to break: do things work if we select a column instead of a scalar integer? Or if we choose a different type of scalar? Do things work if we include the MAX operation on one of the timestamps? I think if I were to rewrite your example, it'd probably look something like this (I haven't tested any of this locally; it's purely an illustrative example):

import pandas as pd
from dask_sql import Context

c = Context()

df = pd.DataFrame({
    "a": list("abcde"),
})
c.create_table('df', df, gpu=True)

# this works!
res = c.sql("SELECT -2613 FROM df HAVING (TIMESTAMP '1991-02-28 13:42:12' NOT BETWEEN TIMESTAMP '1985-12-14 23:59:41' AND TIMESTAMP '2006-08-05 07:29:26')").compute()
# this doesn't work!
res = c.sql("SELECT -2613 FROM df HAVING (TIMESTAMP '1991-02-28 13:42:12' NOT BETWEEN TIMESTAMP '1985-12-14 23:59:41' AND MAX(TIMESTAMP '2006-08-05 07:29:26'))").compute()
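
The questions above could also be checked mechanically by running one variant of the query per change and recording which ones fail. A rough sketch follows; `build_variants` and `run_variants` are hypothetical helper names (not part of dask-sql), and some variants may be rejected by the SQL planner for unrelated reasons, which the recorded exception text should make visible.

```python
from typing import Callable, Dict

# Timestamp literals taken from the original report.
TS = "TIMESTAMP '1991-02-28 13:42:12'"
LOW = "TIMESTAMP '1985-12-14 23:59:41'"
HIGH = "TIMESTAMP '2006-08-05 07:29:26'"


def build_variants(table: str) -> Dict[str, str]:
    """Return named variants of the failing query, each changing one element."""
    return {
        # The query exactly as reported.
        "original": f"SELECT -2613 FROM {table} HAVING ({TS} NOT BETWEEN {LOW} AND MAX({HIGH}))",
        # Same query without the MAX aggregation.
        "no_max": f"SELECT -2613 FROM {table} HAVING ({TS} NOT BETWEEN {LOW} AND {HIGH})",
        # BETWEEN instead of NOT BETWEEN.
        "between": f"SELECT -2613 FROM {table} HAVING ({TS} BETWEEN {LOW} AND MAX({HIGH}))",
        # A string scalar instead of an integer in the SELECT list.
        "string_scalar": f"SELECT 'x' FROM {table} HAVING ({TS} NOT BETWEEN {LOW} AND MAX({HIGH}))",
        # MAX applied to the lower bound instead of the upper bound.
        "max_on_lower": f"SELECT -2613 FROM {table} HAVING ({TS} NOT BETWEEN MAX({LOW}) AND {HIGH})",
    }


def run_variants(run_sql: Callable[[str], object], variants: Dict[str, str]) -> Dict[str, str]:
    """Run each query through run_sql and record 'ok' or the exception text."""
    outcomes = {}
    for name, query in variants.items():
        try:
            run_sql(query)
            outcomes[name] = "ok"
        except Exception as exc:  # broad on purpose: any failure is informative
            outcomes[name] = f"{type(exc).__name__}: {exc}"
    return outcomes
```

With the `Context` from the example above, this would be called as `run_variants(lambda q: c.sql(q).compute(), build_variants("df"))`; the runner is passed in as a callable so the same variants can be tried against both the CPU and GPU tables.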

Finally, I'm interested in whether there's any additional context on how you encountered this issue (and the others you've filed). Some of these queries seem like pretty carefully designed edge cases, which are great for unit testing even if it's sometimes difficult to find the bug in them 😄
