Fix unchecked buffer overflows (CVE-2021-45958). #504
Conversation
I really hope that
Codecov Report

```
@@            Coverage Diff             @@
##             main     #504      +/-   ##
==========================================
+ Coverage   88.96%   89.03%   +0.06%
==========================================
  Files           6        6
  Lines        1685     1723      +38
==========================================
+ Hits         1499     1534      +35
- Misses        186      189       +3
```

Continue to review full report at Codecov.
Ahh phew. I've been spared having to dig around MS docs looking for the MSVC equivalent.
@JustAnotherArchivist Now's your chance to see if you can dig out any more segfaults.
Wonderful, thanks for this! So I first looked at why #503 no longer happens. And to be honest, the reason for that seems ... not very great. Here's why.

For every escaped string, there are now two buffer reservation calls. One is made before the quote is written to the buffer, the other after, but the length calculation accounts for the starting quote both times. This means that the reservation call inside ... unless you disable

Indentation with lists can still overrun the buffer even in the normal config though:

The code mentions a worst-case representation of doubles in a few places, and I think that could also be a potential way to trigger a segfault via lists/dicts. However, I'm not sure how to actually get such a double to play around with it. I tried a few values that I thought should have very long string representations but couldn't get anywhere near 256 bytes. To further my confusion, the

Another problematic scenario that comes to mind is that the indentation level as well as the indentation depth are defined as
```python
def test_dump_huge_indent():
    ujson.encode({"a": True}, indent=65539)


def test_dump_long_string():
    ujson.dumps(["aaaa", "\x00" * 10921])
```
It's probably a good idea to test broader than this, like light fuzz tests with slightly varying indentation depths and string lengths. That way, future changes that shift these things in minor ways (e.g. spaces after key and item separators by default) wouldn't render these tests immediately irrelevant.
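A light parameter sweep along those lines might look like the following sketch. It uses the stdlib `json` module as a stand-in for ujson so it stays self-contained, and the exact ranges (`sweep_dump_params`, the indent and length bounds) are invented for illustration:

```python
import json


def sweep_dump_params():
    """Exercise many nearby indent/string-length combinations so that
    off-by-a-few buffer miscalculations near size boundaries are more
    likely to be hit than with a single fixed test case."""
    checked = 0
    for indent in range(0, 8):
        for length in range(10900, 10940):
            payload = ["aaaa", "\x00" * length]
            encoded = json.dumps(payload, indent=indent or None)
            # Round-trip to make sure the output wasn't corrupted.
            assert json.loads(encoded) == payload
            checked += 1
    return checked
```

The point is that a small grid of nearby values survives minor formatting changes (like a default space after separators) that would otherwise make a single hard-coded length irrelevant.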
Likewise, the longest double I can produce is 23 bytes:

```python
>>> ujson.dumps(-1.000000000000001e100)
'-1.000000000000001e+100'
```

If I try using infinite precision types like
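For what it's worth, CPython's own shortest-round-trip repr of a double also tops out at about two dozen characters (ujson's formatter is separate code, so treat this only as a rough sanity check; `float_repr_length` is just an illustrative helper):

```python
import sys


def float_repr_length(x: float) -> int:
    # Length of the shortest round-tripping decimal string Python picks.
    return len(repr(x))


# 17 significant digits plus a three-digit exponent is about as long as a
# double's textual form gets: sign + 17 digits + '.' + 'e-308' = 24 chars.
assert repr(sys.float_info.min) == "2.2250738585072014e-308"
assert float_repr_length(-sys.float_info.min) == 24
```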
@hugovk How sentimental are you feeling about that
Grr, back to the drawing board then...
Not sentimental at all. Are you proposing removing the flag and keeping the current default? For example:

```diff
-#ifdef JSON_NO_EXTRA_WHITESPACE
 if (enc->indent)
 {
   Buffer_AppendCharUnchecked (enc, ' ');
 }
-#else
-  Buffer_AppendCharUnchecked (enc, ' ');
-#endif
```
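For anyone following along, the "extra whitespace" being discussed is the space after the `:` and `,` separators. The stdlib `json` module can illustrate the difference (used here only because it is universally available; ujson's compact default corresponds to the explicit-separators form):

```python
import json

data = {"a": 1, "b": [2, 3]}

# Default stdlib output puts a space after ':' and ','.
spaced = json.dumps(data)
# Compact output, matching ujson's no-extra-whitespace style.
compact = json.dumps(data, separators=(",", ":"))

assert spaced == '{"a": 1, "b": [2, 3]}'
assert compact == '{"a":1,"b":[2,3]}'
```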
In general, things to consider:
When removing things, it's always good to raise deprecation warnings in a release (or several) first, then remove with a major bump. Ideally deprecations should be around for some time, but for a security issue we can do it quickly.
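A minimal sketch of that deprecate-then-remove pattern, assuming a hypothetical `legacy_flag` keyword being phased out (the wrapper and its name are invented for illustration):

```python
import warnings


def encode(obj, *, legacy_flag=None):
    """Hypothetical wrapper: warn now, drop legacy_flag in a later major."""
    if legacy_flag is not None:
        warnings.warn(
            "legacy_flag is deprecated and will be removed in the next "
            "major release",
            DeprecationWarning,
            stacklevel=2,
        )
    return repr(obj)


# The warning fires only when the deprecated argument is actually used.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    encode({"a": 1}, legacy_flag=True)
assert len(caught) == 1 and issubclass(caught[0].category, DeprecationWarning)
```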
I agree with removing
Eww not Black! I hate Black.
Force-pushed from 8450986 to f0e3aec
I've introduced a fuzz test but I haven't made it part of the test suite or CI. Any preference as to what I do with it? It crunches all permutations of about 150 random objects a second on my PC, so you can get quite good coverage by giving it, say, a minute on CI? When it has failed it normally fails within the first few objects anyway.
I can't think of a way to overrun the buffer now – apart from the potential

I still don't like that the code relies on the
Add a few extra memory reserve calls to account for the extra space that indentation needs. These kinds of memory issues are hard to spot because the buffer is resized in powers of 2, meaning that a miscalculation would only show any symptoms if the required buffer size is estimated to be just below a power of 2 but is actually just above it. Add a debug mode which replaces the power-of-2 scheme with reserving only the memory explicitly requested and adds some overflow checks.
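The failure mode described in that commit message (a reserve estimate that is slightly short only blows up when the true size crosses a power-of-2 boundary) can be simulated in a few lines. The `Buffer` class and its method names below are invented for illustration and are not ujson's actual internals:

```python
class Buffer:
    """Grow-by-doubling buffer: reserve(n) guarantees n free bytes."""

    def __init__(self, size=4, debug=False):
        self.capacity = size
        self.used = 0
        self.debug = debug

    def reserve(self, n):
        if self.debug:
            # Debug mode: hand out exactly what was asked for, so any
            # later unchecked write past the estimate fails immediately.
            self.capacity = max(self.capacity, self.used + n)
        else:
            while self.capacity < self.used + n:
                self.capacity *= 2

    def append_unchecked(self, n_bytes):
        # Models Buffer_AppendCharUnchecked-style writes: no bounds check.
        self.used += n_bytes
        return self.used <= self.capacity  # False means overflow


# An estimate that is 1 byte short is masked by power-of-2 rounding...
buf = Buffer()
buf.reserve(7)                 # estimate 7; capacity rounds up to 8
ok = buf.append_unchecked(8)   # actually write 8 -> still fits

# ...but debug mode reserves exactly 7 and catches the same mistake.
dbg = Buffer(debug=True)
dbg.reserve(7)
bad = dbg.append_unchecked(8)  # overflow detected at once
```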
Force-pushed from f0e3aec to 86e5fa7
I take it that you're referring to lines like this where we deliberately overestimate the size of the string to include the comma (or possibly the newline) that will come after it? Yeah, it's not pretty. I notice now that a couple of them are redundant but honestly I think that strengthening the test suite will keep the ugliness from becoming regressions.
Yes, that's what I'm referring to. And yeah, also some of the other reservations are bigger than needed (like the
If I reduce that
> I've introduced a fuzz test but I haven't made it part of the test suite or CI. Any preference as to what I do with it? It crunches all permutations of about 150 random objects a second on my PC so you can get quite good coverage by giving it say a minute on CI? When it has failed it normally fails within the first few objects anyway.
Sounds good, let's put it on the CI in its own workflow. A single ubuntu-latest and 3.10 should be enough?
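A sketch of what such a standalone workflow could look like, under the assumption that the fuzz script lives at tests/fuzz.py and takes no arguments (both the path and the job layout are guesses):

```yaml
# .github/workflows/fuzz.yml -- hypothetical standalone fuzz job
name: Fuzz
on: [push, pull_request]

jobs:
  fuzz:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.10"
      - name: Build and install
        run: |
          python -m pip install -U pip
          python -m pip install .
      - name: Fuzz for about a minute
        # `timeout` exits 124 when the time limit is what stopped the run,
        # so treat that as success; any earlier crash fails the job.
        run: timeout 60 python tests/fuzz.py || test $? -eq 124
```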
Unsetting it can lead to seg-faults. I don't think it's worth having to fix and then test this undocumented permutation.
Force-pushed from 86e5fa7 to 3b31a77

Force-pushed from 3b31a77 to 6e7eeab
Oh, I think I see why now. That

By the way, the removal of the
```python
    default=(0, 1),
    action=ListOption,
    help="Sets the ensure_ascii option to ujson.dumps(). "
    "May be 0 or 1 or 0,1 to testboth.",
```
```diff
-    "May be 0 or 1 or 0,1 to testboth.",
+    "May be 0 or 1 or 0,1 to test both.",
```
Ughh, I hate C memory
```yaml
          python -m pip install -U pip
          python -m pip install .
        env:
          CFLAGS: '-DDEBUG'
```
One thing we might want to do in testing this is to change the initial size of the buffer in objToJSON to something much, much smaller. This will slow it down due to the repeated resizing of the buffer, but means we'll spend much more time near the limit of the buffer.
```python
import ujson


class FuzzGenerator:
```
Does it make sense to implement this all ourselves rather than using hypothesis?
Is there anything blocking the merge? This vulnerability ranks high on the severity scale:
Yeah, this PR doesn't fix the overflows yet, cf. my last comment.
@JustAnotherArchivist Would you be willing to take over this one? I lack both the time and the knowledge of C to do this effectively.
@bwoodsend Yeah, sure. Plenty of the changes here are fine, e.g. the testing and the extra whitespace removal. Shall I just start from your branch, make my changes in separate commits, and create a PR with everything in the end?
Thanks. Just do whatever's easiest (which probably is to branch off my branch as you say). If my commit authorship gets clobbered in the process then that's fine.
Really appreciate all the work being put in here! All of our builds are currently blocked due to the security warning, and we would love to avoid replacing ujson if we don't have to. Is there a timeline on this fix?
I can't provide an ETA, though I hope to finish it soon. I have made the necessary changes locally, but verification that it is safe is still ongoing. |
Replacement PR is now up: #519. It includes all commits from this PR (although not exactly, due to rebasing onto main).
Fixes #334,
fixes #501,
fixes #502,
fixes #503.

#402 also passes ok with this change, but it worked before it too. I could only reproduce the issue by `git checkout 2.0.2`, so I vote that we close that one too.

Changes proposed in this pull request: