Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wrap around issues on large numerics #543

Closed
wants to merge 4 commits into from
Closed

Fix wrap around issues on large numerics #543

wants to merge 4 commits into from

Conversation

jlusiardi
Copy link

@jlusiardi jlusiardi commented May 27, 2022

There seemed to be a mistake on LLONG_MAX vs ULLONG_MAX (the latter is
the maximum value for an object of type unsigned long long int which
JSUINT64 should be). I fixed that.

I am unsure about how much performance is lost on the check for overflow
on digit check...

Fixes #440

Changes proposed in this pull request:

  • improve wrap around detection
  • add test for wrap arounds
  • make error messages more consistent (add ! at the end)

Joachim Lusiardi and others added 2 commits May 27, 2022 10:14
There seemed to be a mistake on LLONG_MAX vs ULLONG_MAX (the latter is
the maximum value for an object of type `unsigned long long int` which
JSUINT64 should be). I fixed that.

I am unsure about how much performance is lost on the check for overflow
on digit check...
@codecov-commenter
Copy link

Codecov Report

Merging #543 (0157bbe) into main (b300d64) will increase coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head 0157bbe differs from pull request most recent head 06b2e63. Consider uploading reports for the commit 06b2e63 to get more accurate results

@@            Coverage Diff             @@
##             main     #543      +/-   ##
==========================================
+ Coverage   91.76%   91.77%   +0.01%     
==========================================
  Files           6        6              
  Lines        1821     1824       +3     
==========================================
+ Hits         1671     1674       +3     
  Misses        150      150              
Impacted Files Coverage Δ
lib/ultrajsondec.c 91.66% <100.00%> (ø)
tests/test_ujson.py 99.61% <100.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b300d64...06b2e63. Read the comment docs.

lib/ultrajsondec.c Outdated Show resolved Hide resolved
Copy link
Collaborator

@bwoodsend bwoodsend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bear with me here - this stuff makes my head hurt...

@@ -169,7 +171,7 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_numeric (struct DecoderState *ds
}
else if (intNeg == -1)
{
return SetError(ds, -1, overflowLimit == LLONG_MAX ? "Value is too big!" : "Value is too small");
return SetError(ds, -1, overflowLimit == LLONG_MAX ? "Value is too big" : "Value is too small");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the overflowLimit == LLONG_MAX ? ... switch not redundant because of the if (intNeg == 1) and else if (intNeg == -1) conditions above it?

Copy link
Collaborator

@bwoodsend bwoodsend May 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(P.S. If we're ditching !s in error messages then there's that one still lurking on line 170.)

@@ -131,15 +131,17 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_numeric (struct DecoderState *ds
case '8':
case '9':
{
// detect overflow on digit shift
if ((intValue * 10ULL) % 10 != 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would if (intValue > ULLONG_MAX / 10) not be equivalent? It's less headache inducing and probably fractionally more performant since the ULLONG_MAX / 10 will be evaluated at compile time leaving just the > operation at runtime.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that's functionally identical. In both cases, an intValue of ULLONG_MAX / 10 = 1844674407370955161 would pass the check, as it should since it does not overflow on the multiplication. It might still overflow on the addition if the next digit is 6 to 9, but that would be caught in the check below. And intValue = ULLONG_MAX / 10 + 1 would fail this added check in both forms.

Copy link
Collaborator

@JustAnotherArchivist JustAnotherArchivist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, integer arithmetic is bad enough as it is, but decoding from a string definitely induces headaches.

{
hasError = 1;
}
else if (intNeg == -1 && intValue > overflowLimit)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the removal of this check mean that there is no test against the smaller overflowLimit of LLONG_MAX for negative numbers anymore? Then we could get an overflow further down in the BREAK_INT_LOOP section.

Comment on lines +1060 to +1071
@pytest.mark.parametrize(
"test_input",
[
("33333333303333333333"),
("18446744073709551616"), # 64 bit
("-18446744073709551616"), # 64 bit
("-80888888888888888888"),
],
)
def test_decode_big_numeric(test_input):
with pytest.raises(ujson.JSONDecodeError):
ujson.loads(test_input)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be merged with test_decode_raises, which already has some tests for big ints. I have no strong opinion on whether it should all be in one test or not, though I tend towards yes (since the test code is basically identical, just with different input).

@jlusiardi
Copy link
Author

I'll close this PR and recommend looking at #544 instead.

@jlusiardi jlusiardi closed this May 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UltraJSON has inconsistent behavior on parsing large integral values.
5 participants