Read a parquet and ensure certain columns are nullable int #8405
cliffplaysdrums asked this question in Q&A
I'm reading a parquet file with a number of missing values. This results in my int-type columns getting converted to float. I want to force the read operation to produce pd.Int64Dtype() but haven't had any luck. When reading a CSV, it's as simple as:
dd.read_csv(urlpath=my_glob, dtype={'my_int_field': pd.Int64Dtype()})
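For comparison, the same behaviour can be reproduced with plain pandas, whose read_csv is what dd.read_csv calls on each block. Without a dtype, an integer column containing a blank is parsed as float64; with dtype={'my_int_field': pd.Int64Dtype()} it stays a nullable Int64 column. This is a minimal sketch; the inline CSV data is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data: the second row has no value for my_int_field.
CSV_TEXT = "id,my_int_field\n1,10\n2,\n3,30\n"

# Default parsing: the missing value forces the column to float64 (NaN).
default_df = pd.read_csv(io.StringIO(CSV_TEXT))

# Explicit nullable integer dtype: the missing value becomes pd.NA instead.
nullable_df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype={"my_int_field": pd.Int64Dtype()},
)

print(default_df["my_int_field"].dtype)   # float64
print(nullable_df["my_int_field"].dtype)  # Int64
```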
For parquet, I'm using pyarrow as my engine. I've tried passing all combinations of the kwargs dict below to dd.read_parquet:
The result each time is still 'int64', unlike the CSV method, which correctly shows 'Int64' (capital I).