[QST] NotImplementedError: The python type string is not implemented (yet) #1225

Open
luzhengyang opened this issue Sep 18, 2023 · 5 comments
Labels
question Further information is requested

Comments

@luzhengyang

What is your question?

I keep getting this error when querying a table created from a Dask dataframe that reads a CSV file. A couple of columns in the CSV file are strings. I've tried multiple ways to convert the pyarrow string type, but none of them worked and the type remained unchanged. How should I proceed?

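import dask.dataframe as dd
from dask_sql import Context

# Read the CSV lazily and inspect the inferred column types
df = dd.read_csv("../sales.csv")
print(df.dtypes)

# Register the dataframe as a SQL table and query it
c = Context()
c.create_table("sales", df)
result = c.sql("SELECT * FROM sales").compute()
print(result)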

/ArrowFlightService/lib/python3.9/site-packages/dask_sql/mappings.py", line 120, in python_to_sql_type
    raise NotImplementedError(
NotImplementedError: The python type string is not implemented (yet)

@luzhengyang luzhengyang added the question Further information is requested label Sep 18, 2023
@ayushdg
Collaborator

ayushdg commented Sep 19, 2023

Thanks for raising the issue @luzhengyang.
Could you also share the dask and dask-sql versions you're using in this example?
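
One way to collect those versions is sketched below. It assumes the packages were installed under the distribution names `dask` and `dask-sql`; the helper `installed_version` is a hypothetical name introduced here for illustration only.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str:
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return "not installed"


# Distribution names are assumptions; adjust them if your install differs.
for package in ("dask", "dask-sql"):
    print(f"{package}: {installed_version(package)}")
```

Running this in the same environment that raises the error keeps the reported versions consistent with the traceback.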

@charlesbluca
Collaborator

My assumption here is that we're getting bitten by Dask's eager conversion of object columns to pyarrow strings, which we haven't been able to fully support yet (working on this in #1220). Are you able to disable this eager conversion with dask.config.set({"dataframe.convert-string": False})? I'd be interested to know whether that unblocks things for you.
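
A minimal sketch of that workaround follows. It assumes the setting has to be applied before the dataframe is created, since the conversion is described above as eager; the read-back at the end is only a sanity check.

```python
import dask

# Turn off Dask's eager conversion of object columns to pyarrow strings.
# Assumption: this needs to run before dd.read_csv(...) or any other
# dataframe constructor, because the conversion happens at creation time.
dask.config.set({"dataframe.convert-string": False})

# Read the value back to confirm the setting is active.
print(dask.config.get("dataframe.convert-string"))
```

With the setting in place, the dataframe can be created and registered through `Context.create_table` exactly as in the original snippet.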

@guillaumeeb

As discussed in Discourse, the basic documentation example reproduces this error, but disabling eager conversion fixes it.

import dask.datasets
df = dask.datasets.timeseries()
from dask_sql import Context

c = Context()
c.create_table("timeseries", df, persist=True)
result = c.sql("""
    SELECT
        name, SUM(x) AS "sum"
    FROM timeseries
    WHERE x > 0.5
    GROUP BY name
""")
result.compute()

@charlesbluca
Collaborator

For now, I've disabled eager string conversion in #1260 so that users aren't hit by this breakage by default.

@hebian1994

This works with Python 3.8.19; I hit the error above when using Python 3.9.
dask 2023.5.0
dask_sql 2023.11.0
