Series with PostalCode logical type can have float or str elements. #1577

Open
sbadithe opened this issue Nov 17, 2022 · 1 comment

@sbadithe
Contributor

Series with PostalCode logical type can have float or str elements.

For example,

import pandas as pd
import woodwork as ww

ser = pd.Series([12345, 67890]).astype('category')
ser = ww.init_series(ser, logical_type='PostalCode')

In the above code block, the elements of the series are numeric (integers here), but in the following, they are strings:

ser = pd.Series(["12345", "67890"]).astype('category')
ser = ww.init_series(ser, logical_type='PostalCode')

Both are valid initializations. We should decide whether we want to support both data types for the PostalCode logical type.

This issue was discussed in alteryx/featuretools#2365.

@thehomebrewnerd
Contributor

Just to add a little more: I think part of the inconsistent and confusing behavior is that if you take a series of numeric values that does not have a category dtype and initialize it with the PostalCode logical type, the numeric values get converted to strings:

>>> ser = pd.Series([12345, 67890])
>>> ser = ww.init_series(ser, logical_type='PostalCode')
>>> type(ser[0])
<class 'str'>

But if you start with the same values and set the type as category before WW init, you end up with numeric values instead of strings:

>>> ser = pd.Series([12345, 67890]).astype("category")
>>> ser = ww.init_series(ser, logical_type='PostalCode')
>>> type(ser[0])
<class 'numpy.int64'>

I believe WW should provide consistent output in this case, so that regardless of the input dtype, the elements have the same type after WW initialization.
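
Until that is decided, one possible workaround on the caller side is to normalize the values to strings before initialization. The sketch below is only an illustration: `normalize_postal_code` is a hypothetical helper, not part of Woodwork, and it assumes that strings are the desired element type and that missing values should become `None`.

```python
import math
from numbers import Integral, Real


def normalize_postal_code(value):
    """Hypothetical helper (not a Woodwork API): return a postal code as a str.

    Missing values (None or NaN) are returned as None.
    """
    if value is None:
        return None
    if isinstance(value, str):
        # Already a string; only trim surrounding whitespace.
        return value.strip()
    if isinstance(value, Integral):
        return str(int(value))
    if isinstance(value, Real):
        if math.isnan(value):
            return None
        if float(value).is_integer():
            # e.g. 67890.0 becomes "67890" rather than "67890.0"
            return str(int(value))
    raise TypeError(f"cannot interpret {value!r} as a postal code")


codes = [normalize_postal_code(v) for v in [12345, 67890.0, "12345"]]
print(codes)
```

Applying a function like this to every element before calling `ww.init_series` would give string elements whether or not the series started out as a category dtype. Note that leading zeros already lost in a numeric input cannot be recovered this way.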
