Describe the issue:
If one selects a column with an object dtype (e.g. one holding Python lists) from a dask.dataframe, it gets converted to the string[pyarrow] dtype. This does not occur with plain pandas, where the object dtype is preserved. The behaviour only appears with pandas 2.*; with pandas 1.* everything works as expected.
Minimal Complete Verifiable Example:
import dask.dataframe as dd
import pandas as pd
import numpy as np
x = np.zeros((20, 10))
df = pd.DataFrame({"X": x.tolist()})
ddf = dd.from_pandas(df)
print("df['X'].dtype: ")
print(df["X"].dtype)  # "X" has dtype object
print()
print("ddf['X'].dtype")
print(ddf["X"].dtype)  # returns dtype string / the lists get converted to a string
print()
print("ddf['X'].compute().dtype")
print(ddf["X"].compute().dtype)  # returns dtype string / the lists get converted to a string
print()
Environment:
Dask version: 2024.5.0
Pandas version: 2.2.2
Python version: 3.11
Operating System: Ubuntu 22.04.4 LTS
Install method (conda, pip, source): pip
Hi, thanks for your report. This actually happens on conversion in from_pandas; see #10139 for more context. This is unfortunate at the moment, sorry for the bad UX here. You can avoid it by disabling the convert-string option explained in that issue.