Column with object dtype gets converted to string when selecting the column #11117

Closed
felix0097 opened this issue May 13, 2024 · 1 comment
Labels
needs triage Needs a response from a contributor


@felix0097

Describe the issue:
If one selects a column with an object dtype (e.g. a column holding Python lists) from a dask.dataframe, it gets converted to the string[pyarrow] dtype. This does not happen with plain pandas, where the object dtype is preserved. The behaviour only occurs with pandas 2.*; with pandas 1.* everything works as expected.

Minimal Complete Verifiable Example:

```python
import dask.dataframe as dd
import pandas as pd
import numpy as np

x = np.zeros((20, 10))
df = pd.DataFrame({"X": x.tolist()})  # each cell holds a Python list of 10 floats
ddf = dd.from_pandas(df, npartitions=1)

print("df['X'].dtype:")
print(df["X"].dtype)  # object: the lists are preserved
print()

print("ddf['X'].dtype:")
print(ddf["X"].dtype)  # string (string[pyarrow]): the lists were converted to strings
print()

print("ddf['X'].compute().dtype:")
print(ddf["X"].compute().dtype)  # string (string[pyarrow]): the lists were converted to strings
print()
```
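
To see what the conversion does to the data itself, the pandas-only sketch below casts an object column of lists to pandas' string dtype. It assumes that Dask's conversion is roughly equivalent to such an `astype` call; it uses the plain `"string"` dtype rather than `"string[pyarrow]"` only so that pyarrow is not required. Each list is replaced by its `str()` representation, so the column no longer holds list objects.

```python
import pandas as pd

# An object column whose cells are Python lists, as in the example above
s = pd.Series([[0.0, 0.0], [1.0, 2.0]])
print(s.dtype)  # object
print(type(s.iloc[0]))  # <class 'list'>

# Assumption: casting to a string dtype approximates what the automatic
# string conversion does to object columns
converted = s.astype("string")
print(converted.dtype)  # string
print(repr(converted.iloc[0]))  # '[0.0, 0.0]'
```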

Environment:

  • Dask version: 2024.5.0
  • Pandas version: 2.2.2
  • Python version: 3.11
  • Operating System: Ubuntu 22.04.4 LTS
  • Install method (conda, pip, source): pip
@github-actions github-actions bot added the needs triage Needs a response from a contributor label May 13, 2024
@phofl
Collaborator

phofl commented May 13, 2024

Hi, thanks for your report. This actually happens during the conversion in from_pandas; see #10139 for more context. This is unfortunate at the moment, sorry for the bad UX here. You can disable it by turning off the convert-string option explained in that issue.

Closing here.

@phofl phofl closed this as completed May 13, 2024