nan type drift for np.nan #626

Open
gatorwatt opened this issue Oct 23, 2023 · 3 comments

@gatorwatt

This issue appears to be common to dill and pickle: when a dictionary populated with np.nan entries is serialized and recovered, the entries drift in data type to the default float("nan"). Normally this shouldn't be a big deal, as both adhere to IEEE 754, except that Python dictionaries have an edge case / incongruity between the two nan types when nan is used as a key. Specifically, for the dictionary a_dict = {np.nan: 1234} one can access the value with a_dict[np.nan], but for the dictionary b_dict = {float("nan"): 4321}, attempting to access b_dict[float("nan")] raises a KeyError, and similarly for other dictionary methods like .pop().

Ideally, a serialized dictionary would retain the nan type, so as to preserve any peculiarities of this nature.
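
A minimal sketch of the round trip described above, using only the standard library. It assumes math.nan can stand in for np.nan, since both are module-level float objects; the variable names are illustrative only. In this sketch the restored key is still a float NaN, and what changes is that it is no longer the same object as the shared one, so the lookup stops working.

```python
import math
import pickle

# Assumption: math.nan stands in for np.nan (a single shared float object).
SHARED_NAN = math.nan

a_dict = {SHARED_NAN: 1234}
found_before = SHARED_NAN in a_dict            # lookup by the same object succeeds

# Serialize and recover the dictionary.
restored = pickle.loads(pickle.dumps(a_dict))
restored_key = next(iter(restored))

key_is_still_nan = math.isnan(restored_key)      # the NaN value survives
key_is_same_object = restored_key is SHARED_NAN  # but it is a new float object
found_after = SHARED_NAN in restored             # so the shared NaN no longer finds it
```

The value is still reachable through the restored key object itself, e.g. restored[restored_key], which is why iterating over the dictionary keeps working while direct lookups by np.nan do not.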

@mmckerns
Member

mmckerns commented Oct 24, 2023

Python 3.8.18 (default, Aug 25 2023, 04:23:37) 
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> d = {np.nan: 1234, float('nan'): 4321}
>>> d
{nan: 1234, nan: 4321}
>>> d[np.nan]
1234
>>> d[float('nan')]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: nan
>>> 

...and you are saying that a pickled np.nan gets converted to float('nan') after a dump then load? In some cases, or always?

@gatorwatt
Author

gatorwatt commented Oct 25, 2023

Yeah, in my experience this is pretty universal for dill / pickle, even in different import scenarios. I did a little digging, and it appears that NumPy's np.nan refers to a single global object, so the root of the matter is that np.nan is np.nan evaluates to True, while for generic floats float("nan") is float("nan") evaluates to False. I believe this disparity is the source of several other Python edge cases, like using nan as a key in a dictionary (which is supported for np.nan but not for float("nan")).
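
To make the identity point concrete, here is a small sketch without NumPy. It relies on CPython containers treating an entry as a match when it is the same object or compares equal; a NaN never compares equal to anything, so only the identity check can succeed. The names shared and other are illustrative: shared plays the role of np.nan, other the role of a fresh float("nan").

```python
shared = float("nan")   # plays the role of the single np.nan object
other = float("nan")    # a second, separately created NaN

equal_to_itself = shared == shared   # False: IEEE 754 NaN never compares equal
same_object = shared is shared       # True: identity ignores NaN semantics
distinct_objects = shared is other   # False: two separate float objects

d = {shared: 1234}
hit = d.get(shared)    # found through the identity check
miss = d.get(other)    # None: neither identical nor equal to the stored key

# Lists follow the same "identical or equal" rule for membership tests.
in_list = shared in [shared]
not_in_list = other in [shared]
```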

After thinking about it, for my specific use case in the Automunge library I decided to remove exposure to the nan dictionary key scenario and use None in place of nan, so if you want to close this issue, I think my concern is resolved by that update.
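
A rough sketch of that kind of workaround, for illustration only. The helper nan_keys_to_none is hypothetical and is not taken from Automunge; it simply swaps any float NaN key for None before pickling, since None is a true singleton and survives a round trip with its identity intact.

```python
import math
import pickle

def nan_keys_to_none(mapping):
    """Return a copy of mapping with every float NaN key replaced by None.

    Hypothetical helper. Note that several NaN keys, or a NaN key plus an
    existing None key, would collapse into a single None entry.
    """
    cleaned = {}
    for key, value in mapping.items():
        if isinstance(key, float) and math.isnan(key):
            key = None
        cleaned[key] = value
    return cleaned

original = {math.nan: 1234, "a": 1}
restored = pickle.loads(pickle.dumps(nan_keys_to_none(original)))
value_for_missing = restored[None]   # lookup by None works after the round trip
```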

Thanks

@mmckerns
Member

mmckerns commented Oct 26, 2023

Python 3.8.18 (default, Aug 25 2023, 04:23:37) 
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> import numpy as np
>>> n = dill.copy(np.nan)
>>> m = dill.copy(float('nan'))
>>> n is np.nan
False
>>> m is np.nan
False
>>> import copy
>>> o = copy.deepcopy(np.nan)
>>> o is np.nan
True

I'm going to reopen this issue, as I think dill.copy should produce np.nan itself and not a new nan, and this is something that can be corrected for within dill.
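
For illustration, one standard-library mechanism that can preserve a singleton across a pickle round trip is the persistent_id / persistent_load hook pair. The sketch below is not dill's implementation or a proposed patch; the class names, the tag string, and the use of math.nan as a stand-in for np.nan are all assumptions made for the example.

```python
import io
import math
import pickle

# Assumption: math.nan stands in for np.nan as the shared NaN object.
NAN_SINGLETON = math.nan
_NAN_TAG = "nan-singleton"   # hypothetical persistent id

class NanPreservingPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Tag the shared NaN object instead of pickling its float value.
        if obj is NAN_SINGLETON:
            return _NAN_TAG
        return None  # everything else is pickled normally

class NanPreservingUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Map the tag back to the shared NaN object.
        if pid == _NAN_TAG:
            return NAN_SINGLETON
        raise pickle.UnpicklingError(f"unsupported persistent id: {pid!r}")

def roundtrip(obj):
    """Dump and reload obj with the NaN-preserving pickler pair."""
    buffer = io.BytesIO()
    NanPreservingPickler(buffer).dump(obj)
    buffer.seek(0)
    return NanPreservingUnpickler(buffer).load()

restored = roundtrip({NAN_SINGLETON: 1234})
value = restored[NAN_SINGLETON]   # the key is the shared object again
```

The trade-off is that both ends must use the matching pickler and unpickler, so data written this way cannot be read with a plain pickle.loads.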

@mmckerns mmckerns reopened this Oct 26, 2023
@mmckerns mmckerns added the bug label Oct 26, 2023