
SIGSEGV (Segmentation fault) instead of TypeError on dumping list of 254 or more objects #402

Closed
vdmit opened this issue Apr 10, 2020 · 4 comments


vdmit commented Apr 10, 2020

What did you do?

My sample program tries to dump a list of Python objects that are not JSON-serializable.
I expect the ujson.dump() call to raise an exception.

If the list is smaller than 254 elements, it raises TypeError, which is fine.
But when the list becomes longer, the program crashes:

$ ./ujson-crash-test.py 
list_size: 250
<__main__.ListRecord object at 0x7f8088fedba8> is not JSON serializable
list_size: 251
<__main__.ListRecord object at 0x7f8088febc18> is not JSON serializable
list_size: 252
<__main__.ListRecord object at 0x7f8088febcc0> is not JSON serializable
list_size: 253
<__main__.ListRecord object at 0x7f8088febcf8> is not JSON serializable
list_size: 254
Segmentation fault (core dumped)

What did you expect to happen?

A TypeError exception is raised.

What actually happened?

The program crashes with SIGSEGV instead of raising a TypeError exception.

What versions are you using?

  • OS: Ubuntu Linux 18.04.4 LTS
  • Python: 3.6.9
  • UltraJSON: 2.0.2

Please include code that reproduces the issue.

The best reproductions are self-contained scripts with minimal dependencies.

An almost-minimal sample that reproduces the crash:

#!/usr/bin/env python3

import ujson


class ListRecord:
    def __init__(self):
        pass


def dump_to_file(obj, path):
    with open(path, "w") as fd:
        ujson.dump(obj, fd, ensure_ascii=False, sort_keys=True,
                   encode_html_chars=False, escape_forward_slashes=False, indent=2)


def run_crash_test(list_size):
    print("list_size:", list_size)
    records = [ListRecord() for _ in range(list_size)]
    dump_to_file(records, "crash-test-{}.json".format(list_size))


def main():
    for list_size in range(250, 260):
        try:
            run_crash_test(list_size)
        except TypeError as exc:
            # this is expected
            print(exc)


if __name__ == "__main__":
    main()

I currently don't have debug symbols for ujson, so the core dump is not very informative:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5f3a880 in objToJSON (self=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at ./python/objToJSON.c:854
854	./python/objToJSON.c: No such file or directory.
(gdb) bt
#0  0x00007ffff5f3a880 in objToJSON (self=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at ./python/objToJSON.c:854
#1  0x2020202020202020 in ?? ()
#2  0x2020202020202020 in ?? ()
#3  0x2020202020202020 in ?? ()
... <same address>
#27 0x2020202020202020 in ?? ()
#28 0x2020202020202020 in ?? ()
#29 0x00007f005d202020 in ?? ()
#30 0x0000000000b64620 in ?? ()
#31 0x00007ffff61bd798 in ?? ()
#32 0x00007ffff7f350a0 in ?? ()
#33 0x00007ffff61e3e88 in ?? ()
#34 0x00007ffff61c49c0 in ?? ()
#35 0x00007ffff6174cf0 in ?? ()
#36 0x0000000000b64638 in ?? ()
#37 0xffffffffffffffff in ?? ()
#38 0x0000000000000000 in ?? ()
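
That said, one detail does stand out: the repeated frame address 0x2020202020202020 is just eight ASCII space characters (0x20), which hints that the stack got overwritten with the space padding that indent=2 produces. A quick sanity check (my own snippet, not ujson code):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    // the repeated "return address" from the backtrace above
    uint64_t addr = 0x2020202020202020ULL;
    char bytes[9] = {0};
    memcpy(bytes, &addr, 8);
    // prints '        ' - eight ASCII spaces (0x20), i.e. exactly the
    // kind of padding bytes that indent=2 writes into the output
    printf("'%s'\n", bytes);
    return 0;
}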

jeffallen commented May 1, 2020

I wonder if I've run across the same thing while trying to round-trip the sample.json file from the distribution:

#!/usr/bin/env python

import sys
import ujson as json

def main():
    s = sys.stdin.read()
    try:
        x1 = json.loads(s)
        x2 = json.dumps(x1, indent=True)
        json.loads(x2)
    except ValueError as e:
        # we expect malformed input to cause ValueErrors
        # any other exception or crash is interesting
        print(repr(e))


if __name__ == '__main__':
    main()

I ran that like this: python3 crash.py < tests/sample.json and got a crash. The crash goes away if you remove indent=True.

I am using ujson 2.0.3, installed via pip3 install --user ujson.


bauman commented May 1, 2020

I feel like this is close to an answer, but I can't figure out why it lets you get to 255 instead of segfaulting at 65. If I had to guess, everything from 65 up to 256 items is just a standard heap buffer overflow back up into some other memory still owned by your process, until you touch a memory segment which is definitely not yours. Weird that it's on such a nice round number though :-(. Really not sure.

Putting some notes here

The encode of the big list starts here:

static void encode(JSOBJ obj, JSONObjectEncoder *enc, const char *name, size_t cbName)

Then each value (anything that fails to encode) gets hit recursively at L685, so nothing is written to the encoder and the error is set.

Regardless of what happened to error out, ujson loops each item through the encoder and calls this function at L615 to realloc some space on the encoder (if needed),

which macros over to Buffer_Realloc at L115.

Buffer_Realloc requests that the encoder buffer be realloc'd to at least 2 bytes more than whatever you had before; L118 just doubles the previous size of your heap allocation.

When you roll around the loop enough times to hit 2^64 (or whatever SIZE_MAX is on your system), the size value wraps to 0, and realloc(ptr, 0) ends up freeing your memory back to the allocator.

The next roll around the loop (back at L685) adds a comma via Buffer_AppendCharUnchecked at offset obj->offset (some non-zero number) into your freed, zero-byte buffer. I guess it's up to your OS how lenient it is and how fast the allocator reclaims the freed memory before it kills you?

I think this is a rough simulation of what's going on in
https://github.com/ultrajson/ultrajson/blob/7c60d4721f60da284bd24017e12679935e97e99f/lib/ultrajsonenc.c

Any thoughts?

#include <stdio.h>
#include <stdlib.h>

int main() {
    size_t wanted_buffer = 2;

    // initial encoder buffer
    char *str = (char *) malloc(wanted_buffer);

    // every encoder loop, simulating a list of 256 or more items
    int i;
    for (i = 0; i < 256; i++) {
        wanted_buffer *= 2; // simulating the doubling at L118; the TODO on L113 even calls out this scenario

        str = (char *) realloc(str, wanted_buffer); // simulating the realloc at L128
        if (wanted_buffer == 0) {
            printf("%s, %d\n", "we've now realloc'd to zero without nulling the pointer", i);
        }
    }
    // around the 63rd pass through the loop, the doubling overflows size_t and the size ends up as zero
    // (really don't understand how the original gets to 255, though)
    // and then we append a comma, unchecked, to a memory region we no longer own
    printf("%s\n", "AppendCharUnchecked on a pointer to 'nowhere' that I don't own");

    // simulating the Buffer_AppendCharUnchecked call adding a comma
    str[256] = ',';  // segfault!
    return 0;
}

I think taking care of that TODO on L113 with a saner memory allocation strategy (rather than doubling the heap size on every loop) would be the best course of action.

Some decent middle ground between forcing a potential memory copy to a new address on every realloc and just blindly doubling every time. Something like the sketch below, maybe.
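
A rough sketch of what I mean (grow_buffer and the ~1.5x factor are my own, not ujson's): grow more gently and detect size_t overflow up front, so the capacity can never silently wrap to zero and free the buffer out from under the encoder.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

// hypothetical replacement for the blind doubling: grow by ~1.5x
// and report failure on overflow instead of wrapping to zero
static char *grow_buffer(char *buf, size_t *capacity, size_t needed) {
    size_t new_cap = *capacity;
    while (new_cap < needed) {
        // if new_cap + new_cap/2 + 1 would exceed SIZE_MAX, there is
        // no valid larger size: fail instead of wrapping around
        if (new_cap > SIZE_MAX - new_cap / 2 - 1) {
            return NULL; // caller should set an error, not keep appending
        }
        new_cap += new_cap / 2 + 1;
    }
    char *tmp = realloc(buf, new_cap);
    if (tmp == NULL) {
        return NULL; // realloc failed; the original buffer is still valid
    }
    *capacity = new_cap;
    return tmp;
}

int main(void) {
    size_t cap = 2;
    char *buf = malloc(cap);
    if (buf == NULL) return 1;

    // simulate the encoder asking for ever more space
    char *grown = grow_buffer(buf, &cap, 1000);
    if (grown == NULL) {
        free(buf);
        fprintf(stderr, "encoder buffer could not grow\n");
        return 1;
    }
    buf = grown;
    printf("capacity grew to %zu\n", cap);
    free(buf);
    return 0;
}

This still reallocs (so the buffer can still move), but at least it fails loudly instead of handing the encoder a freed pointer.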

Thoughts?

@liampauling

Not sure if it is related, but I have started getting a lot of segfaults (daily).

v3.0.0

Jun 17 13:44:18 ip-172-xx-xx-xx kernel: python[6362]: segfault at 0 ip 00007fea61550a69 sp 00007fea5affb6b0 error 4 in ujson.cpython-38-x86_64-linux-gnu.so[7fea61547000+14000]

@JustAnotherArchivist (Collaborator)

This was fixed by #519 and #505. See #505 (comment) for an explanation of what (I think) happened here.
