Spike: Investigate using native PyArrow backed data types #1520

Open
gsheni opened this issue Sep 9, 2022 · 0 comments
Labels
spike To generate additional issues and kick off a sprint.


gsheni commented Sep 9, 2022

Initial Investigation

Install

pip install pandas==1.5.0rc0
pip install pyarrow --upgrade

Mapping

import pandas as pd
import pyarrow as pa
from pandas import ArrowDtype  # public API as of pandas 1.5; the private module path may move

PANDAS_DTYPE_TO_ARROWDTYPE = {
    "np.int64": pa.int64(),
    "int64": pa.int64(),
    "Int64": pa.int64(),
    "np.float64": pa.float64(),
    "float64": pa.float64(),
    "Float64Dtype": pa.float64(),
    "np.object": pa.string(),
    "object": pa.string(),
    "string": pa.string(),
    "StringDtype": pa.string(),
    "datetime64[s]": pa.timestamp(unit="s", tz=None),
    "datetime64[s, US/Eastern]": pa.timestamp(unit="s", tz='US/Eastern'),
    "datetime64[s, US/Central]": pa.timestamp(unit="s", tz='US/Central'),
    "datetime64[ms]": pa.timestamp(unit="ms", tz=None),
    "datetime64[ms, US/Eastern]": pa.timestamp(unit="ms", tz='US/Eastern'),
    "datetime64[ms, US/Central]": pa.timestamp(unit="ms", tz='US/Central'),
    "datetime64[us]": pa.timestamp(unit="us", tz=None),
    "datetime64[us, US/Eastern]": pa.timestamp(unit="us", tz='US/Eastern'),
    "datetime64[us, US/Central]": pa.timestamp(unit="us", tz='US/Central'),
    "datetime64[ns]": pa.timestamp(unit="ns", tz=None),
    "datetime64[ns, US/Eastern]": pa.timestamp(unit="ns", tz='US/Eastern'),
    "datetime64[ns, US/Central]": pa.timestamp(unit="ns", tz='US/Central'),
    "np.bool_": pa.bool_(),
    "boolean": pa.bool_(),
    "BooleanDtype": pa.bool_()
}
# Wrap each pyarrow type in a pandas ArrowDtype so it can be passed to astype().
PANDAS_DTYPE_TO_ARROWDTYPE = {
    k: ArrowDtype(v) for k, v in PANDAS_DTYPE_TO_ARROWDTYPE.items()
}

gsheni added the "spike" label on Sep 9, 2022
gsheni changed the title from "Spike: Investigate using native PyArrow-backed DataType" to "Spike: Investigate using native PyArrow backed data types" on Sep 9, 2022