[BUG][GPU Logic Bug] "SELECT (<string>)||(<column(decimal)>) FROM <table>" brings Error #1226

Open
qwebug opened this issue Sep 19, 2023 · 3 comments
Labels
bug Something isn't working python Affects Python API

Comments

qwebug commented Sep 19, 2023

What happened:

"SELECT (<string>)||(<column(decimal)>) FROM <table>" returns different results on CPU and GPU.

What you expected to happen:

The CPU and GPU results should be identical.

Minimal Complete Verifiable Example:

import pandas as pd
import dask.dataframe as dd
from dask_sql import Context

c = Context()

df = pd.DataFrame({
    'c0': [0.5113391810437729]
})
t1 = dd.from_pandas(df, npartitions=1)

c.create_table('t1', t1, gpu=False)
c.create_table('t1_gpu', t1, gpu=True)

print('CPU Result:')
result1 = c.sql("SELECT ('A')||(t1.c0) FROM t1").compute()
print(result1)

print('GPU Result:')
result2 = c.sql("SELECT ('A')||(t1_gpu.c0) FROM t1_gpu").compute()
print(result2)

Result:

CPU Result:
    Utf8("A") || t1.c0
0  A0.5113391810437729
GPU Result:
  Utf8("A") || t1_gpu.c0
0           A0.511339181

Anything else we need to know?:

Environment:

qwebug added the "bug" (Something isn't working) and "needs triage" (Awaiting triage by a dask-sql maintainer) labels on Sep 19, 2023

charlesbluca (Collaborator) commented

Trying out your reproducer with the latest main gives me an error 😕 It looks like at some point between 2023.6.0 and now, our logical plan changed such that we skip casting the non-string column:

# 2023.6.0
Projection: Utf8("A") || CAST(t1.c0 AS Utf8)
  TableScan: t1 projection=[c0]

# main
Projection: Utf8("A") || t1.c0
  TableScan: t1 projection=[c0]

This leads to errors in the binary operation; cc @jdye64 if you have any capacity to look into this. As for the original issue, it seems to come down to a difference in the behavior of cast operations on CPU and GPU, since the following shows the same discrepancy:

print('CPU Result:')
result1 = c.sql("SELECT CAST(c0 AS STRING) FROM t1").compute()
print(result1)

print('GPU Result:')
result2 = c.sql("SELECT CAST(c0 AS STRING) FROM t1_gpu").compute()
print(result2)

I can look into that. Would you mind modifying your issue description and title to reflect this?
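
For readers following along, the shape of the two outputs in the original report can be reproduced in plain Python without dask-sql or a GPU. This is only a sketch: the truncated GPU string is consistent with formatting the value to nine significant digits, but that the GPU cast actually works this way is an assumption, not something confirmed in this thread.

```python
# Illustration only: plain Python, no dask-sql or cuDF involved.
x = 0.5113391810437729  # value from the reproducer

# str() on a Python float yields the shortest text that round-trips to x.
full = "A" + str(x)

# ASSUMPTION: the GPU output looks like nine significant digits; ".9g"
# mimics that shape and is not taken from any GPU library's implementation.
short = "A" + format(x, ".9g")

print(full)
print(short)

# Check whether each string still parses back to the original value.
print(float(full[1:]) == x)
print(float(short[1:]) == x)
```

The last two lines show why the difference matters: only the full-precision string can be parsed back to the original float.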

charlesbluca added the "python" (Affects Python API) label and removed the "needs triage" (Awaiting triage by a dask-sql maintainer) label on Oct 25, 2023

qwebug commented Nov 29, 2023

Thanks for confirming.
We look forward to a fix.

qwebug changed the title from [BUG][Logic Bug] "SELECT (<string>)||(<column(decimal)>) FROM <table>" brings Error to [BUG][GPU Logic Bug] "SELECT (<string>)||(<column(decimal)>) FROM <table>" brings Error on Jan 22, 2024

qwebug commented Jun 5, 2024

This problem appeared in dask-sql version 2023.6.0.
I have verified that it is fixed in dask-sql version 2024.3.0.
Thanks to the developers for their contributions.
