Expand support for non-str dtype("O") in pandas type system #258

Open
JacobHayes opened this issue Jul 25, 2022 · 0 comments
Labels
enhancement New feature or request

Is your feature request related to a problem? Please describe.

#257 adds support for numpy and pandas TypeSystems, but when converting from a pandas value to the artigraph Type (i.e., to_artigraph), a series with dtype("O") must contain strings, even though pandas supports arbitrary objects in such columns. This prevents using complex types as cell values, for example:

df = pd.DataFrame({"A": [[1,2], [3,4]]})

The to_system direction should work better since we know the spec of the cell values, but I think there's an issue or two in how we instantiate/handle them that should be pretty easy to fix.

Describe the solution you'd like
We should expand the type conversions to support more structured types like list[str], dict[str, int], etc. We can't support every Python object, but we should try to handle those that can be represented as arti.types and would reasonably be serialized (to JSON, Parquet, etc).
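To make the target concrete, here is a minimal sketch of how hints such as list[str] and dict[str, int] could map onto structured types. The classes below are stand-ins that only mirror the constructor shapes shown in this issue (List(element=...), Map(key=..., value=...)); they are not the real arti.types, and hint_to_type is a hypothetical helper, independent of however python_type_system is actually implemented.

```python
from dataclasses import dataclass
from typing import Any, get_args, get_origin


# Stand-ins for illustration only; the real classes live in arti.types.
@dataclass(frozen=True)
class String:
    pass


@dataclass(frozen=True)
class Int64:
    pass


@dataclass(frozen=True)
class Float64:
    pass


@dataclass(frozen=True)
class List:
    element: Any


@dataclass(frozen=True)
class Map:
    key: Any
    value: Any


# Scalar Python types this sketch knows how to represent.
_SCALARS = {str: String, int: Int64, float: Float64}


def hint_to_type(hint: Any) -> Any:
    """Convert a Python type hint into a (stand-in) structured type."""
    origin, args = get_origin(hint), get_args(hint)
    if origin is list and len(args) == 1:
        return List(element=hint_to_type(args[0]))
    if origin is dict and len(args) == 2:
        return Map(key=hint_to_type(args[0]), value=hint_to_type(args[1]))
    if hint in _SCALARS:
        return _SCALARS[hint]()
    # Anything else (sets, custom classes, ...) is deliberately unsupported.
    raise NotImplementedError(f"No structured type for hint: {hint!r}")
```

Hints outside this small set raise NotImplementedError, which matches the idea that not every Python object needs to be supported.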

The implementation would likely require some inspection of the values to infer a Python type hint. Perhaps we can construct the type hint and then hand it off to python_type_system?
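A rough sketch of what that inspection could look like, using only the standard library. The names infer_hint, infer_column_hint, and _unify are hypothetical and not part of arti. The sketch assumes every value in a column (and every element in a container) must share one type, and it makes no attempt to handle None or empty containers.

```python
from typing import Any


def _unify(hints: list[Any]) -> Any:
    """Require all inferred hints to agree; empty input is ambiguous."""
    if not hints:
        raise ValueError("Cannot infer an element type from an empty container")
    first = hints[0]
    if any(hint != first for hint in hints[1:]):
        raise TypeError(f"Mixed element types: {hints}")
    return first


def infer_hint(value: Any) -> Any:
    """Infer a type hint for a single cell value (hypothetical helper)."""
    if isinstance(value, list):
        return list[_unify([infer_hint(v) for v in value])]
    if isinstance(value, dict):
        key = _unify([infer_hint(k) for k in value.keys()])
        val = _unify([infer_hint(v) for v in value.values()])
        return dict[key, val]
    if isinstance(value, (bool, int, float, str)):
        return type(value)
    raise NotImplementedError(f"Cannot infer a type hint for: {value!r}")


def infer_column_hint(values: list[Any]) -> Any:
    """Infer one hint shared by all values of a column."""
    return _unify([infer_hint(v) for v in values])


# The cell values from the examples in this issue:
list_hint = infer_column_hint([[1, 2], [3, 4]])
dict_hint = infer_column_hint([{"": 0}])
```

The resulting hint could then be passed to python_type_system. A real implementation would still need to decide how to treat nulls, empty lists, and mixed numeric types.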

Describe alternatives you've considered
We could require the user to pass a hint that fills in the missing type metadata, but this would be less user-friendly and not always easy to specify.

Additional context
Current error cases:

[ins] In [1]: import pandas as pd
         ...: from arti.types import Float64, Int64, List, Map, String, Struct, Type
         ...: from arti.types.pandas import pandas_type_system

[ins] In [2]: pandas_type_system.to_artigraph(pd.DataFrame({"dict": [{"": 0}], "list": [[0]]}), hints={})
...
NotImplementedError: Non-string object is not supported yet, got values of: {'': 0}

[ins] In [3]: pandas_type_system.to_system(List(element=Struct(fields={"dict": Map(key=String(), value=Int64())})), hints={})
...
NotImplementedError: No TypeSystem(key='pandas', extends=(TypeSystem(key='numpy'),)) adapter for Artigraph type: Map(key=String(), value=Int64()).

[ins] In [4]: pandas_type_system.to_system(List(element=Struct(fields={"list": List(element=Int64())})), hints={})
...
TypeError: 'Series' object is not callable

@JacobHayes added the enhancement label on Jul 25, 2022
@JacobHayes changed the title from "Expand support for dtype("O" to "Expand support for non-str dtype("O") in pandas type system" on Jul 25, 2022