Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support parsing NaN, Infinity and -Infinity #514

Merged
merged 7 commits into from Apr 4, 2022

Conversation

Erotemic
Copy link
Contributor

@Erotemic Erotemic commented Apr 3, 2022

Fixes #146

This ports the changes from pandas (https://github.com/pandas-dev/pandas/pull/30295/files
) mentioned by @WillAyd in #146 back to ujson.

Thus far I've only done a nearly verbatim port of the diff. There seems to be an issue. Currently:

python -c "import ujson; print(ujson.loads('[ true, false,null]'))"

is printing

[True, inf, -inf]

Also

python -c "import ujson; print(ujson.loads('[Infinity, NaN, -Infinity]'))"

is returning

[False, -inf, None]

So there are some details missing from the direct patch. I'll try and check for them, but my C is much worse than my Python, so any insights would be helpful.

EDIT

I've fixed the above issue, although I'm not sure why the ordering of the function pointers in ultrajson.h __JSONObjectDecoder was important. I assume it's a C thing, if anyone wants to tell me why that fixed worked, I'd like to know.

While working on this I discovered a (hidden) bug in the pandas port. When they parse NaN in C, they actually return None instead of NaN. The pandas part after that seems to interpret None as Nan, which is why their test passes:

import pandas as pd
open('foo.json', 'w').write('[NaN]')
pd.read_json('foo.json')
Out[3]: 
    0
0 NaN

But the parser itself is actually wrong

from  pandas._libs import json as pd_ujson
pd_ujson.loads('[NaN]')
Out[10]: [None]

And that can be demonstrated by adding a dict to the list, which prevents the pandas engine from realizing that None should have been a nan. This probably needs to be a pandas bug report (see here).

open('foo.json', 'w').write('[NaN, 2, 3, {}]')
pd.read_json('foo.json')
Out[12]: 
      0
0  None
1     2
2     3
3    {}

But I think I can fix it here. I'm getting a handle on how this is working.

@codecov-commenter
Copy link

codecov-commenter commented Apr 3, 2022

Codecov Report

Merging #514 (2421524) into main (b3c96d0) will increase coverage by 0.37%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #514      +/-   ##
==========================================
+ Coverage   90.03%   90.41%   +0.37%     
==========================================
  Files           6        6              
  Lines        1686     1752      +66     
==========================================
+ Hits         1518     1584      +66     
  Misses        168      168              
Impacted Files Coverage Δ
lib/ultrajsondec.c 91.66% <100.00%> (+0.89%) ⬆️
python/JSONtoObj.c 87.35% <100.00%> (+0.93%) ⬆️
tests/test_ujson.py 99.57% <100.00%> (+0.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b3c96d0...2421524. Read the comment docs.

@Erotemic
Copy link
Contributor Author

Erotemic commented Apr 4, 2022

Note, I had to disable the black pre-commit check because it is failing due to an issue in the click library. Once they issue a patch (which I hope will be soon), I'll revert that.

@hugovk
Copy link
Member

hugovk commented Apr 4, 2022

I merged the Black fix in #515, so if you sync with main it will work now.

@bwoodsend
Copy link
Collaborator

Can we chuck in a couple of failure tests too. i.e. Stuff like:

with pytest.raises(ujson.JSONDecodeError, match="Unexpected .*'Infinity'"):
    ujson.dumps("Infin")

Just so that we know that exceptions do get raised instead of segfaults or garbled output 🙃 .

@Erotemic
Copy link
Contributor Author

Erotemic commented Apr 4, 2022

@hugovk Rebased on main,
@bwoodsend Added more tests for incomplete and trailing output, as well as simple singleton tests

tests/test_ujson.py Outdated Show resolved Hide resolved
@hugovk hugovk changed the title NaN and Inf in loads - Port of Pandas #30295 Support for parsing NaN, Infinity and -Infinity Apr 4, 2022
@hugovk hugovk added enhancement New feature or request changelog: Added For new features labels Apr 4, 2022
Erotemic and others added 2 commits April 4, 2022 15:27
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
@hugovk hugovk changed the title Support for parsing NaN, Infinity and -Infinity Support parsing NaN, Infinity and -Infinity Apr 4, 2022
@hugovk
Copy link
Member

hugovk commented Apr 4, 2022

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog: Added For new features enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NaN handling differs from other json packages
4 participants