It's not possible to calculate the grouped mean for a very large column with a pyarrow dtype.
import pandas as pd
import numpy as np
import pyarrow as pa
import dask.dataframe
import dask.distributed

df = pd.DataFrame({
    # A series of ones that is larger than the maximum supported by uint32
    'a': pd.Series(np.ones(1 << 32), dtype=pd.ArrowDtype(pa.uint8())),
    # A distribution of values for which to compute the mean
    'b': pd.Series(np.linspace(0, 1, 1 << 32), dtype=pd.ArrowDtype(pa.float32())),
})
It fails to compute with the legacy DataFrame:
I tried using the latest DataFrame API but this operation does not seem to be supported yet:
Pandas seems to execute this workflow without issues:
Environment: