
SIGSEGV (Segmentation fault) instead of TypeError on dumping list of 254 or more objects #402

Closed
vdmit opened this issue Apr 10, 2020 · 4 comments


vdmit commented Apr 10, 2020

What did you do?

My sample program tries to dump a list of Python objects that are not JSON-serializable.
I expect the ujson.dump() call to raise an exception.

If the list is smaller than 254 elements, it raises TypeError, which is fine.
But when the list becomes longer, the program crashes:

$ ./ujson-crash-test.py 
list_size: 250
<__main__.ListRecord object at 0x7f8088fedba8> is not JSON serializable
list_size: 251
<__main__.ListRecord object at 0x7f8088febc18> is not JSON serializable
list_size: 252
<__main__.ListRecord object at 0x7f8088febcc0> is not JSON serializable
list_size: 253
<__main__.ListRecord object at 0x7f8088febcf8> is not JSON serializable
list_size: 254
Segmentation fault (core dumped)

What did you expect to happen?

A TypeError exception is raised.

What actually happened?

The program crashes with SIGSEGV instead of raising a TypeError exception.

What versions are you using?

  • OS: Ubuntu Linux 18.04.4 LTS
  • Python: 3.6.9
  • UltraJSON: 2.0.2

Please include code that reproduces the issue.

The best reproductions are self-contained scripts with minimal dependencies.

An almost-minimal sample that reproduces the crash:

#!/usr/bin/env python3

import ujson


class ListRecord:
    def __init__(self):
        pass


def dump_to_file(obj, path):
    with open(path, "w") as fd:
        ujson.dump(obj, fd, ensure_ascii=False, sort_keys=True,
                   encode_html_chars=False, escape_forward_slashes=False, indent=2)


def run_crash_test(list_size):
    print("list_size:", list_size)
    records = [ListRecord() for _ in range(list_size)]
    dump_to_file(records, "crash-test-{}.json".format(list_size))


def main():
    for list_size in range(250, 260):
        try:
            run_crash_test(list_size)
        except TypeError as exc:
            # this is expected
            print(exc)


if __name__ == "__main__":
    main()

I currently don't have debug symbols for ujson, so the core dump is not very informative:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5f3a880 in objToJSON (self=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at ./python/objToJSON.c:854
854	./python/objToJSON.c: No such file or directory.
(gdb) bt
#0  0x00007ffff5f3a880 in objToJSON (self=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at ./python/objToJSON.c:854
#1  0x2020202020202020 in ?? ()
#2  0x2020202020202020 in ?? ()
#3  0x2020202020202020 in ?? ()
... <same address>
#27 0x2020202020202020 in ?? ()
#28 0x2020202020202020 in ?? ()
#29 0x00007f005d202020 in ?? ()
#30 0x0000000000b64620 in ?? ()
#31 0x00007ffff61bd798 in ?? ()
#32 0x00007ffff7f350a0 in ?? ()
#33 0x00007ffff61e3e88 in ?? ()
#34 0x00007ffff61c49c0 in ?? ()
#35 0x00007ffff6174cf0 in ?? ()
#36 0x0000000000b64638 in ?? ()
#37 0xffffffffffffffff in ?? ()
#38 0x0000000000000000 in ?? ()
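
That said, one detail does stand out: the repeated frame address 0x2020202020202020 is just eight ASCII space characters (0x20), which hints that the stack got overwritten with the space padding that indent=2 produces. A quick sanity check (my own snippet, not ujson code):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    // the repeated "return address" from the backtrace above
    uint64_t addr = 0x2020202020202020ULL;
    char bytes[9] = {0};
    memcpy(bytes, &addr, 8);
    // prints '        ' - eight ASCII spaces (0x20), i.e. exactly the
    // kind of padding bytes that indent=2 writes into the output
    printf("'%s'\n", bytes);
    return 0;
}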

jeffallen commented May 1, 2020

I wonder if I've run across the same thing while trying to round-trip the sample.json file from the distribution:

#!/usr/bin/env python

import sys
import ujson as json

def main():
    s = sys.stdin.read()
    try:
        x1 = json.loads(s)
        x2 = json.dumps(x1, indent=True)
        json.loads(x2)
    except ValueError as e:
        # we expect malformed input to cause ValueErrors
        # any other exception or crash is interesting
        print(repr(e))


if __name__ == '__main__':
    main()

I ran that like this: python3 crash.py < tests/sample.json and got a crash. The crash goes away if you remove indent=True.

I am using ujson 2.0.3, installed via pip3 install --user ujson.


bauman commented May 1, 2020

I feel like this is close to an answer, but I can't figure out why it lets you get to 255 instead of segfaulting at 65. If I had to guess, everything from 65 up to 256 items is just a standard heap buffer overflow back up into some other memory still owned by your process, until you touch a memory segment which is definitely not yours. Weird that it's on such a nice round number though :-(. Really not sure.

Putting some notes here

The encode of the big list starts here:

static void encode(JSOBJ obj, JSONObjectEncoder *enc, const char *name, size_t cbName)

Then each value (anything that fails to encode) gets hit recursively at L685, so nothing is written to the encoder and the error is set.

Regardless of what happened to error out, ujson loops each item through the encoder and calls this function at L615 to realloc some space on the encoder (if needed),

which macros over to Buffer_Realloc at L115.

Buffer_Realloc requests that the encoder buffer be realloc'd to at least 2 bytes more than whatever you had before; L118 just doubles the previous size of your heap allocation.

When you roll around the loop enough times to hit 2^64 (or whatever SIZE_MAX is on your system), the size value wraps to 0, and realloc(ptr, 0) ends up freeing your memory back to the allocator.

The next roll around the loop (back at L685) adds a comma via Buffer_AppendCharUnchecked at offset obj->offset (some non-zero number) into your freed, zero-byte buffer. I guess it's up to your OS how lenient it is and how fast the allocator reclaims the freed memory before it kills you?

I think this is a rough simulation of what's going on in
https://github.com/ultrajson/ultrajson/blob/7c60d4721f60da284bd24017e12679935e97e99f/lib/ultrajsonenc.c

Any thoughts?

#include <stdio.h>
#include <stdlib.h>

int main() {
    size_t wanted_buffer = 2;

    // initial encoder buffer
    char *str = (char *) malloc(wanted_buffer);

    // every encoder loop, simulating a list of 256 or more items
    int i;
    for (i = 0; i < 256; i++) {
        wanted_buffer *= 2; // simulating the doubling at L118; the TODO on L113 even calls out this scenario

        str = (char *) realloc(str, wanted_buffer); // simulating the realloc at L128
        if (wanted_buffer == 0) {
            printf("%s, %d\n", "we've now realloc'd to zero without nulling the pointer", i);
        }
    }
    // around the 63rd pass through the loop, the doubling overflows size_t and the size ends up as zero
    // (really don't understand how the original gets to 255, though)
    // and then we append a comma, unchecked, to a memory region we no longer own
    printf("%s\n", "AppendCharUnchecked on a pointer to 'nowhere' that I don't own");

    // simulating the Buffer_AppendCharUnchecked call adding a comma
    str[256] = ',';  // segfault!
    return 0;
}

I think taking care of that TODO on L113 with a saner memory allocation strategy (rather than doubling the heap size on every loop) would be the best course of action.

Some decent middle ground between forcing a potential memory copy to a new address on every realloc and just blindly doubling every time. Something like the sketch below, maybe.
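
A rough sketch of what I mean (grow_buffer and the ~1.5x factor are my own, not ujson's): grow more gently and detect size_t overflow up front, so the capacity can never silently wrap to zero and free the buffer out from under the encoder.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

// hypothetical replacement for the blind doubling: grow by ~1.5x
// and report failure on overflow instead of wrapping to zero
static char *grow_buffer(char *buf, size_t *capacity, size_t needed) {
    size_t new_cap = *capacity;
    while (new_cap < needed) {
        // if new_cap + new_cap/2 + 1 would exceed SIZE_MAX, there is
        // no valid larger size: fail instead of wrapping around
        if (new_cap > SIZE_MAX - new_cap / 2 - 1) {
            return NULL; // caller should set an error, not keep appending
        }
        new_cap += new_cap / 2 + 1;
    }
    char *tmp = realloc(buf, new_cap);
    if (tmp == NULL) {
        return NULL; // realloc failed; the original buffer is still valid
    }
    *capacity = new_cap;
    return tmp;
}

int main(void) {
    size_t cap = 2;
    char *buf = malloc(cap);
    if (buf == NULL) return 1;

    // simulate the encoder asking for ever more space
    char *grown = grow_buffer(buf, &cap, 1000);
    if (grown == NULL) {
        free(buf);
        fprintf(stderr, "encoder buffer could not grow\n");
        return 1;
    }
    buf = grown;
    printf("capacity grew to %zu\n", cap);
    free(buf);
    return 0;
}

This still reallocs (so the buffer can still move), but at least it fails loudly instead of handing the encoder a freed pointer.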

Thoughts?

@liampauling

Not sure if it is related, but I have started getting a lot of segfaults (daily).

v3.0.0

Jun 17 13:44:18 ip-172-xx-xx-xx kernel: python[6362]: segfault at 0 ip 00007fea61550a69 sp 00007fea5affb6b0 error 4 in ujson.cpython-38-x86_64-linux-gnu.so[7fea61547000+14000]

@JustAnotherArchivist (Collaborator)

This was fixed by #519 and #505. See #505 (comment) for an explanation of what (I think) happened here.
