[BUG] cudf.pandas wrapped numpy arrays not compatible with numba #15694

Open
AjayThorve opened this issue May 7, 2024 · 4 comments
Labels
bug Something isn't working cudf.pandas Issues specific to cudf.pandas

Comments

@AjayThorve
Member

Describe the bug
When I try to use cudf.pandas with datashader, I get the error Cannot determine Numba type of <class 'cudf.pandas._wrappers.numpy.ndarray'> (full repro below). Datashader works directly with cudf, and a cudf.DataFrame is an acceptable data format. But when cudf is used as a no-code-change accelerator for pandas, this fails.

Steps/Code to reproduce bug

import cudf.pandas
cudf.pandas.install()

import pandas as pd
import numpy as np
import datashader as ds
import datashader.transfer_functions as tf
from datashader.colors import inferno

# Create a small dataset
np.random.seed(0)
n = 1000
df = pd.DataFrame({
    'x': np.random.normal(0, 1, n),
    'y': np.random.normal(0, 1, n)
})

# Create a canvas to render the plot
cvs = ds.Canvas(plot_width=400, plot_height=400)

# Aggregate the points in the canvas
agg = cvs.points(df, 'x', 'y')

# Render the plot using a transfer function
img = tf.shade(agg, cmap=inferno, how='eq_hist')

# Display the plot
img

Output

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at /home/ajay/miniconda3/envs/rapids-24.06/lib/python3.11/site-packages/datashader/glyphs/glyph.py (66)

File "../miniconda3/envs/rapids-24.06/lib/python3.11/site-packages/datashader/glyphs/glyph.py", line 66:
    def _compute_bounds(s):
        <source elided>

    @staticmethod
    ^ 

This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'cudf.pandas._wrappers.numpy.ndarray'>

Expected behavior
Ideally same output as a cudf or a pandas dataframe.

Environment overview (please complete the following information)

  • Environment location: Ubuntu
  • Method of cuDF install: Conda
@AjayThorve AjayThorve added the bug Something isn't working label May 7, 2024
@mroeschke
Contributor

Thanks for the report. As your post highlights, it looks like the core issue is that cudf.pandas wraps numpy arrays (to use cupy if possible), and this wrapped array is not compatible with numba:

In [1]: import cudf.pandas
   ...: cudf.pandas.install()
   ...: 
   ...: import pandas as pd

In [2]: import numba

In [3]: @numba.jit(nopython=True, nogil=True)
   ...: def f(x):
   ...:     return x
   ...: 

In [4]: f(pd.Series([1]).values)
---------------------------------------------------------------------------
TypingError                               Traceback (most recent call last)
Cell In[4], line 1
----> 1 f(pd.Series([1]).values)

File ~/miniforge3/envs/cudf-dev/lib/python3.11/site-packages/numba/core/dispatcher.py:468, in _DispatcherBase._compile_for_args(self, *args, **kws)
    464         msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
    465                f"by the following argument(s):\n{args_str}\n")
    466         e.patch_message(msg)
--> 468     error_rewrite(e, 'typing')
    469 except errors.UnsupportedError as e:
    470     # Something unsupported is present in the user code, add help info
    471     error_rewrite(e, 'unsupported_error')

File ~/miniforge3/envs/cudf-dev/lib/python3.11/site-packages/numba/core/dispatcher.py:409, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
    407     raise e
    408 else:
--> 409     raise e.with_traceback(None)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at <ipython-input-3-88a5a2446c8f> (1)

File "<ipython-input-3-88a5a2446c8f>", line 1:
@numba.jit(nopython=True, nogil=True)
^ 

This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'cudf.pandas._wrappers.numpy.ndarray'>

Going to repurpose this issue to be about compatibility with numba.

@mroeschke mroeschke added the cudf.pandas Issues specific to cudf.pandas label May 9, 2024
@mroeschke mroeschke changed the title from "[BUG] Using cudf.pandas with datashader does not work" to "[BUG] cudf.pandas wrapped numpy arrays not compatible with numba" May 9, 2024
@quasiben
Member

quasiben commented May 9, 2024

@brandon-b-miller, when you have time, can you also take a look at how cudf.pandas and numba are interoperating?

@brandon-b-miller
Contributor

There might be a way to write a little numba extension code within cudf.pandas that registers cudf.pandas._wrappers.numpy.ndarray objects as something numba can unbox into a numpy array or cupy array. If that worked we could probably do the registration at import time. I'll investigate.

@brandon-b-miller
Contributor

Just a few quick updates here. We took a look at some simple ways of solving this without too much hacking of numba and didn't come up with a solution we can merge into cuDF in the very immediate term. There are a few more medium-term approaches, in the form of updates to numba main, that may do the trick, however. I would like to keep this issue open as we progress and can give more updates here as we have them.

Projects
Status: In Progress
Development

No branches or pull requests

6 participants
@quasiben @vyasr @mroeschke @AjayThorve @brandon-b-miller and others