
A crash issue happens during fuzzing test #537

Closed
baltsers opened this issue May 12, 2022 · 6 comments · Fixed by #530
Labels
bug Something isn't working

Comments


baltsers commented May 12, 2022

What did you do?

We ran a fuzzing test on UltraJSON, and it caused a crash.

What did you expect to happen?

Python should not crash on any input.

What actually happened?

Segmentation fault.

Starting program: /root/anaconda3/bin/python ujson_poc.py input
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
PyBytes_Size () at /tmp/build/80754af9/python-split_1631797238431/work/Objects/bytesobject.c:1199
1199    /tmp/build/80754af9/python-split_1631797238431/work/Objects/bytesobject.c: No such file or directory.
(gdb) bt
#0  PyBytes_Size () at /tmp/build/80754af9/python-split_1631797238431/work/Objects/bytesobject.c:1199
#1  0x00007ffff7e25e99 in ?? () from /root/anaconda3/lib/python3.9/site-packages/ujson.cpython-39-x86_64-linux-gnu.so
#2  0x00007ffff7e24b94 in ?? () from /root/anaconda3/lib/python3.9/site-packages/ujson.cpython-39-x86_64-linux-gnu.so
#3  0x00007ffff7e253cf in JSON_EncodeObject () from /root/anaconda3/lib/python3.9/site-packages/ujson.cpython-39-x86_64-linux-gnu.so
#4  0x00007ffff7e26e93 in objToJSON () from /root/anaconda3/lib/python3.9/site-packages/ujson.cpython-39-x86_64-linux-gnu.so
#5  0x00005555556c8714 in cfunction_call () at /tmp/build/80754af9/python-split_1631797238431/work/Objects/methodobject.c:543
#6  0x00005555556989ef in _PyObject_MakeTpCall () at /tmp/build/80754af9/python-split_1631797238431/work/Objects/call.c:191
#7  0x0000555555722d89 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff7ed5758, callable=<optimized out>, tstate=<optimized out>)
    at /tmp/build/80754af9/python-split_1631797238431/work/Include/cpython/abstract.h:116
#8  PyObject_Vectorcall () at /tmp/build/80754af9/python-split_1631797238431/work/Include/cpython/abstract.h:127
#9  call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555914680)
    at /tmp/build/80754af9/python-split_1631797238431/work/Python/ceval.c:5075
#10 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python-split_1631797238431/work/Python/ceval.c:3487
#11 0x00005555556d68e2 in _PyEval_EvalFrame () at /tmp/build/80754af9/python-split_1631797238431/work/Include/internal/pycore_ceval.h:40
#12 _PyEval_EvalCode () at /tmp/build/80754af9/python-split_1631797238431/work/Python/ceval.c:4327
#13 0x0000555555788bac in _PyEval_EvalCodeWithName (qualname=0x0, name=0x0, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwstep=2, kwcount=0, 
    kwargs=<optimized out>, kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>, 
    _co=<optimized out>) at /tmp/build/80754af9/python-split_1631797238431/work/Python/ceval.c:4359
#14 PyEval_EvalCodeEx () at /tmp/build/80754af9/python-split_1631797238431/work/Python/ceval.c:4375
#15 0x00005555556d79eb in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>)
    at /tmp/build/80754af9/python-split_1631797238431/work/Python/ceval.c:826
#16 0x0000555555788c5b in run_eval_code_obj () at /tmp/build/80754af9/python-split_1631797238431/work/Python/pythonrun.c:1219
#17 0x00005555557bc705 in run_mod () at /tmp/build/80754af9/python-split_1631797238431/work/Python/pythonrun.c:1240
#18 0x000055555566160d in pyrun_file (fp=0x555555976b20, filename=0x7ffff6c7dc00, start=<optimized out>, globals=0x7ffff7f01f80, locals=0x7ffff7f01f80, closeit=1, 
    flags=0x7fffffffdfa8) at /tmp/build/80754af9/python-split_1631797238431/work/Python/pythonrun.c:1138
#19 0x00005555557c149f in pyrun_simple_file (flags=0x7fffffffdfa8, closeit=1, filename=0x7ffff6c7dc00, fp=0x555555976b20)
    at /tmp/build/80754af9/python-split_1631797238431/work/Python/pythonrun.c:449
#20 PyRun_SimpleFileExFlags () at /tmp/build/80754af9/python-split_1631797238431/work/Python/pythonrun.c:482
#21 0x00005555557c1c7f in pymain_run_file (cf=0x7fffffffdfa8, config=0x555555912f90) at /tmp/build/80754af9/python-split_1631797238431/work/Modules/main.c:379
#22 pymain_run_python (exitcode=0x7fffffffdfa0) at /tmp/build/80754af9/python-split_1631797238431/work/Modules/main.c:604
#23 Py_RunMain () at /tmp/build/80754af9/python-split_1631797238431/work/Modules/main.c:683
#24 0x00005555557c1d79 in Py_BytesMain () at /tmp/build/80754af9/python-split_1631797238431/work/Modules/main.c:1129
#25 0x00007ffff703fbf7 in __libc_start_main (main=0x555555669d80 <main>, argc=3, argv=0x7fffffffe198, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffffe188) at ../csu/libc-start.c:310
#26 0x0000555555746bc3 in _start ()

What versions are you using?

  • OS: Ubuntu 18.04
  • Python: 3.9
  • UltraJSON: latest (the main branch)

Please include code that reproduces the issue.
Code and Input,

@hugovk added the bug label May 12, 2022
@JustAnotherArchivist
Collaborator

Equivalent reproducer as a single line:

ujson.dumps({'u26¶1\udddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd\x99000125d2522': 4})

I narrowed it down to this:

ujson.dumps({'\udddd': 1})

Dumping the string on its own, in a list, or as a dict value raises a surrogate-related UnicodeEncodeError. This suggests that the error isn't properly propagated when encoding dict keys.
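For comparison, the standard library's json module (whose behaviour #530 adopts, per the commit message below) serialises lone surrogates in both keys and values, escaping them as \uXXXX with the default ensure_ascii=True. A quick check:

```python
import json

# A lone surrogate as a dict key: the stdlib escapes it instead of crashing.
print(json.dumps({'\udddd': 1}))          # {"\udddd": 1}

# It also round-trips back to the original surrogate character.
print(json.loads(json.dumps({'\udddd': 1})) == {'\udddd': 1})  # True
```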

@JustAnotherArchivist
Collaborator

I think this happens because the return value of PyUnicode_AsUTF8String in Dict_iterNext is never checked when the key is converted to a C string. The call returns NULL because the surrogate can't be encoded, and later, in Dict_iterGetName, this leads to a NULL pointer dereference. The issue is effectively fixed by #530, since that allows surrogates in strings. However, that PR does not add error checks, so other scenarios could still trigger the same crash.

@JustAnotherArchivist
Collaborator

I looked a bit through the CPython code, and apart from running out of memory or similar generic things, I can't think of a way to trigger similar errors with #530 included. Neither private-use nor unallocated Unicode codepoints cause any error. So while this is easy enough to fix, I'm not sure how we can test that it's actually fixed after the inclusion of that PR. In the bigger picture of the entire library, that could be a pretty significant problem I think. If we can't find specific ways of triggering errors in every Python C API function call (except the ones guaranteed to always work, like PyUnicode_Check), perhaps it might be worth thinking about a fuzz tester that wraps those calls and randomly pretends they failed. Or maybe something like that already exists? Although I couldn't find anything in a quick search.

@bwoodsend
Collaborator

Just to confirm: you want an `if PyUnicode_AsUTF8String() returned NULL: abort` check in there, but once #530 is merged, we'll have no way to test that branch?

I suppose if you were really determined to get that test coverage you could do something ugly like (note I haven't tested this at all):

#include <stdbool.h>

#ifndef DEBUG
#define PyUnicode_AsUTF8String_ PyUnicode_AsUTF8String
#else

/* Set from the test suite to make the wrapper pretend the call failed. */
bool sabotage_PyUnicode_AsUTF8String = false;

PyObject *PyUnicode_AsUTF8String_(PyObject *unicode)
{
  if (sabotage_PyUnicode_AsUTF8String) {
    return NULL;
  }
  return PyUnicode_AsUTF8String(unicode);
}
#endif

Then replace PyUnicode_AsUTF8String() with PyUnicode_AsUTF8String_() throughout and have a test which temporarily sets sabotage_PyUnicode_AsUTF8String to true.

@JustAnotherArchivist
Collaborator

Yes, that's correct. At least no easy way to cause an error specifically on that call as far as I can tell. Causing errors randomly would probably be possible with ulimit to simulate running out of memory. But controlling that to a specific PyUnicode_AsUTF8String (or PyUnicode_AsEncodedString after #530) call would be ... challenging and likely prone to break.

That kind of code is more or less what I was thinking, yeah, albeit not just for this particular function; there are other calls that have the same general problem (cf. #505 (comment)). Also, as you wrote it, it probably wouldn't suffice. If the test code can only control the sabotage wholesale from Python, a call to e.g. ujson.dumps will be sabotaged entirely. So it would still be impossible to test an encoding error on a dict value because the key encoding would already fail. That's why I'm thinking more in the direction of fuzzing.

@JustAnotherArchivist
Collaborator

I looked into the C API error fuzzing idea over the past week and also asked in the Python community IRC about it. I couldn't find anything like it, nor could anyone point me to something, and I think many C extension projects could benefit from a generic tool for that; I'm sure there are numerous segfaults lurking out there. I have a PoC, but actually implementing it and getting it to a somewhat usable state will take a while.

I think we can consider this issue fixed by #530, though, since it resolves the problem of surrogates in dict keys causing a segfault.

JustAnotherArchivist added a commit to JustAnotherArchivist/ultrajson that referenced this issue May 30, 2022
This allows surrogates anywhere in the input, compatible with the json module from the standard library.

This also refactors two interfaces:
- The `PyUnicode` to `char*` conversion is moved into its own function, separated from the `JSONTypeContext` handling, so it can be reused for other things in the future (e.g. indentation and separators) which don't have a type context.
- Converting the `char*` output to a Python string with surrogates intact requires the string length for `PyUnicode_Decode` & Co. While `strlen` could be used, the length is already known inside the encoder, so the encoder function now also takes an extra `size_t` pointer argument to return that and no longer NUL-terminates the string. This also permits output that contains NUL bytes (even though that would be invalid JSON), e.g. if an object's `__json__` method return value were to contain them.

Fixes ultrajson#156
Fixes ultrajson#447
Fixes ultrajson#537
Supersedes ultrajson#284