Crash during fuzzing test #537
Comments
Equivalent reproducer as a single line: `ujson.dumps({'u26¶1\udddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd\x99000125d2522': 4})`

I narrowed it down to this: `ujson.dumps({'\udddd': 1})`

Dumping the string on its own, in a list, or as a dict value gives a surrogate-related `UnicodeEncodeError`. This suggests that the error isn't properly propagated on dict key encoding.
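For context, a minimal pure-Python sketch of the underlying failure (standard library only, not ujson's actual code path): a lone surrogate cannot be encoded to UTF-8, which is the error the encoder hits internally, while the stdlib `json` module sidesteps it by escaping the character instead.

```python
import json

s = '\udddd'  # a lone low surrogate, invalid outside a surrogate pair

# Encoding it to UTF-8 fails with the surrogate-related error:
try:
    s.encode('utf-8')
except UnicodeEncodeError as e:
    print('encode failed:', e.reason)

# The standard-library json module escapes it rather than UTF-8-encoding it:
print(json.dumps(s))  # "\udddd"
```

This is why dumping the same string in non-key positions at least raises a clean `UnicodeEncodeError` rather than succeeding.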
I think this happens because the return value of `PyUnicode_AsUTF8String` isn't checked for NULL before use.
I looked a bit through the CPython code, and apart from running out of memory or similar generic things, I can't think of a way to trigger similar errors with #530 included. Neither private-use nor unallocated Unicode codepoints cause any error. So while this is easy enough to fix, I'm not sure how we can test that it's actually fixed after the inclusion of that PR.

In the bigger picture of the entire library, that could be a pretty significant problem, I think: if we can't find specific ways of triggering errors in every Python C API function call (except the ones that are guaranteed to always succeed), large parts of the error handling remain effectively untestable.
Just to confirm: you want an "abort if `PyUnicode_AsUTF8String()` returns NULL" check in there, but once #530 is merged, we'll have no way to test that if branch? I suppose if you were really determined to get that test coverage, you could do something ugly like this (note I haven't tested this at all):

#include <stdbool.h>

#ifndef DEBUG
#define PyUnicode_AsUTF8String_ PyUnicode_AsUTF8String
#else
/* Test hook: when set, the wrapper fails as if the encoding call had failed. */
bool sabotage_PyUnicode_AsUTF8String = false;

PyObject *PyUnicode_AsUTF8String_(PyObject *unicode)
{
    if (sabotage_PyUnicode_AsUTF8String) {
        /* Returning NULL without setting an exception would itself be a
           bug (the interpreter raises SystemError), so set one here. */
        PyErr_SetString(PyExc_RuntimeError, "sabotaged for testing");
        return NULL;
    }
    return PyUnicode_AsUTF8String(unicode);
}
#endif

Then replace the direct `PyUnicode_AsUTF8String()` calls with `PyUnicode_AsUTF8String_()`.
Yes, that's correct. At least there's no easy way to cause an error specifically on that call, as far as I can tell. Causing errors randomly would probably be possible.

That kind of code is more or less what I was thinking, yeah, albeit not just for this particular function; there are other calls that have the same general problem (cf. #505 (comment)). Also, as you wrote it, it probably wouldn't suffice: if the test code can only control the sabotage wholesale from Python, then any other call to the sabotaged function on the way into or out of the test would fail too, not just the one in the branch we want to cover.
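To illustrate the scoping point in pure Python (a hypothetical analogue, not ujson's code: `serialize` and the injectable `encode` parameter are invented names), fault injection has to be targeted at a single call site; `unittest.mock`-style sabotage of an entire function would also break unrelated uses of it.

```python
from unittest import mock

def serialize(encode, key):
    # Toy serializer: the encoding step is injectable, so a test can
    # force a failure at exactly this call site and nowhere else.
    data = encode(key)
    return b'"' + data + b'"'

# Normal path:
print(serialize(lambda s: s.encode('utf-8'), 'key'))  # b'"key"'

# Targeted fault injection: only this one call site fails.
broken = mock.Mock(
    side_effect=UnicodeEncodeError('utf-8', 'x', 0, 1, 'injected'))
try:
    serialize(broken, 'key')
except UnicodeEncodeError as e:
    print('error propagated:', e.reason)
```

In a C extension the equivalent needs a compiled-in hook like the `sabotage_` flag sketched above, since Python-level patching cannot reach direct C API calls.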
I looked into the C API error fuzzing idea over the past week and also asked about it in the Python community IRC. I couldn't find anything like it, nor could anyone point me to something, and I think many C extension projects could benefit from a generic tool for that; I'm sure there are numerous segfaults lurking out there. I have a PoC, but actually implementing it and getting it to a somewhat usable state will take a while.

I think we can consider this issue fixed by #530, though, since it resolves the problem of surrogates in dict keys causing a segfault.
This allows surrogates anywhere in the input, compatible with the json module from the standard library.

This also refactors two interfaces:

- The `PyUnicode` to `char*` conversion is moved into its own function, separated from the `JSONTypeContext` handling, so it can be reused for other things in the future (e.g. indentation and separators) which don't have a type context.
- Converting the `char*` output to a Python string with surrogates intact requires the string length for `PyUnicode_Decode` & Co. While `strlen` could be used, the length is already known inside the encoder, so the encoder function now also takes an extra `size_t` pointer argument to return it and no longer NUL-terminates the string. This also permits output that contains NUL bytes (even though that would be invalid JSON), e.g. if an object's `__json__` method return value were to contain them.

Fixes ultrajson#156
Fixes ultrajson#447
Fixes ultrajson#537
Supersedes ultrajson#284
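At the Python level, the surrogate-preserving round trip the PR describes corresponds to the `'surrogatepass'` codec error handler; a short sketch (standard library only, not the PR's actual C code) of why an explicit length and a permissive decode are needed:

```python
s = '{"\udddd": 1}'  # text containing a lone surrogate

# Plain UTF-8 encoding rejects lone surrogates...
try:
    s.encode('utf-8')
except UnicodeEncodeError as e:
    print('strict encode failed:', e.reason)

# ...but 'surrogatepass' lets them survive an encode/decode round trip,
# analogous to PyUnicode_Decode with an explicit error handler:
data = s.encode('utf-8', 'surrogatepass')
assert data.decode('utf-8', 'surrogatepass') == s

# With an explicit length (rather than strlen-style NUL termination),
# embedded NUL bytes also survive:
assert b'\x00' in 'a\x00b'.encode('utf-8', 'surrogatepass')
```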
What did you do?
We ran a fuzzing test on ultrajson and a crash occurred.
What did you expect to happen?
Python should not crash on any input.
What actually happened?
Segmentation fault.
What versions are you using?
Please include code that reproduces the issue.
Code and input: