Fix wrap around issues on large numerics #543

jlusiardi · 2022-05-27T08:16:39Z

There seemed to be a mix-up between `LLONG_MAX` and `ULLONG_MAX` (the
latter is the maximum value for an object of type `unsigned long long int`,
which `JSUINT64` should be). I fixed that.
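
For reference, the two limits can be inspected from Python with `ctypes`. This is only an illustration of the constants, not code from the PR, and it assumes a platform where `long long` is 64 bits wide (the C standard only guarantees at least 64 bits).

```python
import ctypes

# Width of the platform's unsigned long long; assumed to be 64 here.
BITS = 8 * ctypes.sizeof(ctypes.c_ulonglong)

# ctypes does no overflow checking, so -1 wraps to "all bits set",
# which is the largest unsigned long long.
ULLONG_MAX = ctypes.c_ulonglong(-1).value

# The largest signed long long has every bit set except the sign bit.
LLONG_MAX = ULLONG_MAX >> 1

print(BITS, LLONG_MAX, ULLONG_MAX)
```

On such a platform `ULLONG_MAX` is roughly twice `LLONG_MAX`, so comparing an unsigned accumulator against `LLONG_MAX` rejects values that would still fit.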

I am unsure how much performance the overflow check on each digit
costs...

Fixes #440

Changes proposed in this pull request:

improve wrap around detection
add test for wrap arounds
make error messages more consistent (add ! at the end)

codecov-commenter · 2022-05-27T08:18:40Z

Codecov Report

Merging #543 (0157bbe) into main (b300d64) will increase coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head 0157bbe differs from pull request most recent head 06b2e63. Consider uploading reports for the commit 06b2e63 to get more accurate results

@@            Coverage Diff             @@
##             main     #543      +/-   ##
==========================================
+ Coverage   91.76%   91.77%   +0.01%     
==========================================
  Files           6        6              
  Lines        1821     1824       +3     
==========================================
+ Hits         1671     1674       +3     
  Misses        150      150

Impacted Files	Coverage Δ
lib/ultrajsondec.c	`91.66% <100.00%> (ø)`
tests/test_ujson.py	`99.61% <100.00%> (+<0.01%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b300d64...06b2e63.

bwoodsend

Bear with me here - this stuff makes my head hurt...

bwoodsend · 2022-05-27T19:01:32Z

lib/ultrajsondec.c

@@ -169,7 +171,7 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_numeric (struct DecoderState *ds
          }
          else if (intNeg == -1)
          {
-            return SetError(ds, -1, overflowLimit == LLONG_MAX ? "Value is too big!" : "Value is too small");
+            return SetError(ds, -1, overflowLimit == LLONG_MAX ? "Value is too big" : "Value is too small");
          }


Is the `overflowLimit == LLONG_MAX ? ...` switch not redundant because of the `if (intNeg == 1)` and `else if (intNeg == -1)` conditions above it?

(P.S. If we're ditching `!`s in error messages then there's that one still lurking on line 170.)

bwoodsend · 2022-05-27T19:05:12Z

lib/ultrajsondec.c

@@ -131,15 +131,17 @@ static FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_numeric (struct DecoderState *ds
      case '8':
      case '9':
      {
+        // detect overflow on digit shift
+        if ((intValue * 10ULL) % 10 != 0)


Would `if (intValue > ULLONG_MAX / 10)` not be equivalent? It's less headache inducing and probably fractionally more performant, since `ULLONG_MAX / 10` will be evaluated at compile time, leaving just the `>` operation at runtime.

Yeah, I think that's functionally identical. In both cases, an `intValue` of `ULLONG_MAX / 10 = 1844674407370955161` would pass the check, as it should, since it does not overflow on the multiplication. It might still overflow on the addition if the next digit is 6 to 9, but that would be caught in the check below. And `intValue = ULLONG_MAX / 10 + 1` would fail this added check in both forms.
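
The two forms can be compared with a small emulation. This is a sketch in Python, not the decoder's code: it assumes a 64-bit `unsigned long long`, emulates wrap-around with a bit mask, and `append_digit` is a hypothetical helper showing the multiplication check followed by the addition check.

```python
ULLONG_MAX = 2**64 - 1  # assumes a 64-bit unsigned long long
MASK = ULLONG_MAX       # masking emulates unsigned 64-bit wrap-around


def overflows_mod(value: int) -> bool:
    """Emulates `(intValue * 10ULL) % 10 != 0` for 0 <= value <= ULLONG_MAX."""
    return ((value * 10) & MASK) % 10 != 0


def overflows_div(value: int) -> bool:
    """Emulates `intValue > ULLONG_MAX / 10`; nothing is multiplied."""
    return value > ULLONG_MAX // 10


def append_digit(value: int, digit: int) -> int:
    """Hypothetical helper: value * 10 + digit, or OverflowError past ULLONG_MAX."""
    if overflows_div(value):
        raise OverflowError("multiplication would wrap")
    shifted = value * 10
    if digit > ULLONG_MAX - shifted:
        raise OverflowError("addition would wrap")
    return shifted + digit


boundary = ULLONG_MAX // 10  # 1844674407370955161
print(overflows_mod(boundary), overflows_div(boundary))          # False False
print(overflows_mod(boundary + 1), overflows_div(boundary + 1))  # True True
print(overflows_mod(2**63), overflows_div(2**63))                # False True
```

The forms agree at the boundary values discussed above. In this emulation they disagree at `2**63`, though: `2**63 * 10` is exactly `5 * 2**64`, so the wrapped product is 0 and the modulo test does not flag it. That may be one more reason to prefer the division form, assuming the accumulator can reach such a value before the check runs.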

JustAnotherArchivist

Likewise, integer arithmetic is bad enough as it is, but decoding from a string definitely induces headaches.

JustAnotherArchivist · 2022-05-27T19:40:06Z

lib/ultrajsondec.c

-        {
-          hasError = 1;
-        }
-        else if (intNeg == -1 && intValue > overflowLimit)


Doesn't the removal of this check mean that negative numbers are no longer tested against the smaller `overflowLimit` of `LLONG_MAX`? Then we could get an overflow further down in the `BREAK_INT_LOOP` section.
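
The concern can be made concrete with a sketch of a sign-dependent limit. The limits are assumptions taken from this thread (`ULLONG_MAX` for non-negative values, `LLONG_MAX` for the magnitude of negative ones) with 64-bit types, and `magnitude_fits` is a hypothetical helper, not a function in `ultrajsondec.c`.

```python
LLONG_MAX = 2**63 - 1   # assumes a 64-bit long long
ULLONG_MAX = 2**64 - 1  # assumes a 64-bit unsigned long long


def magnitude_fits(magnitude: int, negative: bool) -> bool:
    """Hypothetical check: negative values get the tighter, signed limit."""
    limit = LLONG_MAX if negative else ULLONG_MAX
    return magnitude <= limit


# A magnitude that is fine for a positive number but too large once negated.
magnitude = 2**63 + 5
print(magnitude_fits(magnitude, negative=False))  # True
print(magnitude_fits(magnitude, negative=True))   # False
```

If only the unsigned limit were applied, a literal such as `-9223372036854775813` would pass the digit loop even though its value does not fit in a signed 64-bit integer.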

JustAnotherArchivist · 2022-05-27T19:40:13Z

tests/test_ujson.py

+@pytest.mark.parametrize(
+    "test_input",
+    [
+        ("33333333303333333333"),
+        ("18446744073709551616"),  # 64 bit
+        ("-18446744073709551616"),  # 64 bit
+        ("-80888888888888888888"),
+    ],
+)
+def test_decode_big_numeric(test_input):
+    with pytest.raises(ujson.JSONDecodeError):
+        ujson.loads(test_input)


This should be merged with `test_decode_raises`, which already has some tests for big ints. I have no strong opinion on whether it should all be in one test, though I lean towards yes, since the test code is basically identical and only the input differs.

jlusiardi · 2022-05-30T07:06:47Z

I'll close this PR and recommend looking at #544 instead.

Joachim Lusiardi and others added 2 commits May 27, 2022 10:14

[pre-commit.ci] auto fixes from pre-commit.com hooks

06b2e63

for more information, see https://pre-commit.ci

hugovk reviewed May 27, 2022

Joachim Lusiardi added 2 commits May 27, 2022 10:53

remove exclamation marks

15f3635

Merge branch 'issue_440' of github.com:jlusiardi/ultrajson into issue…

c98d5cb

…_440

bwoodsend reviewed May 27, 2022

JustAnotherArchivist reviewed May 27, 2022

NaN-git mentioned this pull request May 28, 2022

Integer parsing: always detect overflows #544

Merged

jlusiardi closed this May 30, 2022
