Fix memory leak on encoding errors when the buffer was resized #549
Conversation
`JSON_EncodeObject` returns `NULL` when an error occurs, but without freeing the buffer. This leads to a memory leak when the buffer is internally allocated (because the caller's buffer was insufficient or none was provided at all) and any error occurs. Similarly, `objToJSON` did not clean up the buffer in all error conditions either. This adds the missing buffer free in `JSON_EncodeObject` (iff the buffer was allocated internally) and refactors the error handling in `objToJSON` slightly to also free the buffer when a Python exception occurred without the encoder's `errorMsg` being set.
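For illustration, the ownership rule the fix enforces can be sketched in Python (hypothetical names and a `json` stand-in, not the real C code or API): on error, the buffer is released only if the encoder allocated it itself, so a caller-provided buffer is never freed behind the caller's back.

```python
import json

def encode_object(obj, caller_buffer=None):
    # Sketch of the ownership rule (made-up names, not the real API):
    # free the buffer on error only when we allocated it ourselves.
    internally_allocated = caller_buffer is None
    buffer = bytearray() if internally_allocated else caller_buffer
    try:
        buffer += json.dumps(obj).encode()  # may raise, like encode() in C
        return bytes(buffer)
    except TypeError:
        if internally_allocated:
            del buffer  # stands in for free(); this is what the fix adds
        return None     # mirrors JSON_EncodeObject returning NULL
```

In the C code, skipping that free in the internally-allocated case is exactly the leak; Python's garbage collector hides it, so this sketch is purely structural.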
Codecov Report
@@ Coverage Diff @@
## main #549 +/- ##
==========================================
+ Coverage 91.59% 91.64% +0.05%
==========================================
Files 6 6
Lines 1808 1820 +12
==========================================
+ Hits 1656 1668 +12
Misses 152 152
Based on Codecov, I guess we don't have any tests that cause a realloc and an error? That would be something to add then, I suppose, regardless of testing for the memory leak, since the gap could hide bugs in the error handling as well.
That little manual test example you've got, put in its own file and invoked as a subprocess, doesn't sound too unpleasant.
I've tried doing out-of-memory testing before and I was amazed at how painful it was. If we limit ourselves to just running these tests on Linux, then we can use `ulimit -Sv 200000` before `python some-test.py`, e.g. wrapped as `subprocess.run("ulimit -Sv 200000; " + shlex.join([sys.executable, "memory-test.py"]), shell=True)`.

Example memory test file:

import ctypes

import humanize

malloc = ctypes.CDLL(None).malloc  # CDLL(None) loads the running process, which exposes libc's malloc
malloc.argtypes = [ctypes.c_size_t]
malloc.restype = ctypes.c_void_p

for i in range(30):
    print(i, humanize.naturalsize(1 << i), malloc(1 << i))
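Such a wrapper could look roughly like this (a sketch; the inline `-c` script is a placeholder for the real memory test file, and the `ulimit` cap makes it Linux-only):

```python
import shlex
import subprocess
import sys

# Cap virtual memory with ulimit, then run the actual test in a subprocess.
# The inline -c script stands in for the separate memory test file.
script = "print('ok')"
cmd = "ulimit -Sv 200000; " + shlex.join([sys.executable, "-c", script])
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
print(result.stdout.strip())
```

`shlex.join` keeps the interpreter path safe to splice into the shell command; the shell itself is needed because `ulimit` is a shell builtin, not an executable.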
Yeah, memory-anything is really annoying with Python, which isn't really surprising since it's a high-level language.
I meant to use `psutil`. For detecting memory leaks, I've used this in the past. I'm pretty uncertain about the choice between `rss` and `vms` – from what I've read, neither is really applicable, but `vms` does at least go up and down on frees.
Hmm, right. We'd still need to write test cases that cover those paths to trigger it though, at which point we might as well use a more exact test. Or maybe I'm still misunderstanding you.
Yeah, I'll play around with it.
After some experimenting, it turns out that
Hmm, memory usage grows slightly (under 2%) in non-debug mode except on macOS... From local testing, this growth happens only on buffer resizes, does not depend on the number of repetitions, but does change with the input size:

# No resizing as it fits into the 64 KiB buffer
$ python3 tests/memory.py 'import functools, ujson; func = functools.partial(ujson.dumps, ["a" * 10000, b""])'; echo $?
VMS before: 20779008, after: 20779008
0
# Resizing, different number of iterations
$ python3 tests/memory.py 'import functools, ujson; func = functools.partial(ujson.dumps, ["a" * 11000, b""])'; echo $?
VMS before: 20779008, after: 21102592
1
$ python3 tests/memory.py 'import functools, ujson; func = functools.partial(ujson.dumps, ["a" * 11000, b""])' 2000; echo $?
VMS before: 20783104, after: 21102592
1
$ python3 tests/memory.py 'import functools, ujson; func = functools.partial(ujson.dumps, ["a" * 11000, b""])' 4000; echo $?
VMS before: 20783104, after: 21102592
1
# Resizing, different size
$ python3 tests/memory.py 'import functools, ujson; func = functools.partial(ujson.dumps, ["a" * 22000, b""])'; echo $?
VMS before: 20779008, after: 21114880
1
$ python3 tests/memory.py 'import functools, ujson; func = functools.partial(ujson.dumps, ["a" * 44000, b""])'; echo $?
VMS before: 20779008, after: 21397504
1

This makes no sense to me, especially since it doesn't happen in debug mode and the differences between debug and non-debug are almost exclusively additional asserts.
The growth is only in VMS, not in RSS. The growth also disappears if I run `func()` twice at the start of `run`:

# Measuring and printing also RSS
$ python3 tests/memory.py 'import functools, ujson; func = functools.partial(ujson.dumps, ["a" * 11000, b""])'; echo $?
VMS before: 20803584, after: 21155840
RSS before: 14282752, after: 14282752
1
# Two `func()` calls at the start of `run`:
$ python3 tests/memory.py 'import functools, ujson; func = functools.partial(ujson.dumps, ["a" * 11000, b""])'; echo $?
VMS before: 21123072, after: 21123072
RSS before: 14385152, after: 14385152
0

I have no idea what this means...
Yeah, I think this is normal, I'm afraid.
I see. Never heard of USS before, but that sounds very reasonable indeed. Still quite weird under which circumstances this VMS weirdness happens. It's perfectly reproducible, too. ¯\_(ツ)_/¯

USS grows as well, but by a much smaller amount (well under 1%) and regardless of the number of pre-runs and repetitions.

But yeah, USS seems most reasonable, so I'll switch to that.
So PyPy fails on testing, various deploy wheel tests fail except on PyPy 3.7 (which we aren't testing directly), and the Windows native one fails due to missing psutil, even though it's getting installed just a few lines up. Maybe I shouldn't have looked under this particular rock...
Co-authored-by: Brénainn Woodsend <bwoodsend@gmail.com>
I can reproduce the implied memory leak failure locally. Seeing if I can diagnose it now...
Hmm, the memory increase seems to be constant regardless of the number of repetitions.
Does running `func()` a couple of extra times beforehand change it?
It doesn't change it at all. What does make it work is if I take the `before` measurement multiple times.
In fact, just twice is enough.

diff --git i/tests/memory.py w/tests/memory.py
index 8035bd1..e2f4f71 100644
--- i/tests/memory.py
+++ w/tests/memory.py
@@ -11,7 +11,8 @@ def run(func, n):
         pass

     # Take a measurement
-    before = psutil.Process().memory_full_info().uss
+    for i in range(2):
+        before = psutil.Process().memory_full_info().uss

     # Run n times
     for i in range(n):
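The effect being worked around here — the measuring call itself allocating lasting state on first use, so only a repeated measurement gives a stable baseline — can be demonstrated in isolation (a generic sketch; `measure` is a stand-in, not psutil):

```python
import tracemalloc

def measure():
    # Stand-in for a real measurement call: the first invocation populates
    # a cache that then stays allocated (simulated via a function attribute).
    if not hasattr(measure, "_cache"):
        measure._cache = [0] * 1000  # one-time lasting allocation
    return len(measure._cache)

tracemalloc.start()
measure()                       # first call allocates the lasting cache
s1 = tracemalloc.take_snapshot()
measure()                       # repeated call allocates nothing new
s2 = tracemalloc.take_snapshot()
tracemalloc.stop()
growth = sum(d.size_diff for d in s2.compare_to(s1, "lineno"))
```

Taking the baseline after a throwaway measurement (or taking it twice, as in the diff above) excludes that one-time allocation from the leak check.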
Hmm, interesting. I also set this up locally now and am getting this:
So a constant +12 KiB with CPython. With PyPy, it varies strongly, but I still get growth with the extra calls, and it doesn't really seem to be smaller. The differences are small enough on my machine that it doesn't fail in either case, though.
Perhaps valgrind is a better way of doing this. It doesn't appear to be very cross-platform, so again this would probably have to be Linux-only, but squirting your reproducer code through it does detect the leak (see the third block from the bottom, referencing `Buffer_Realloc`):

> git checkout main
> UJSON_BUILD_NO_STRIP=1 CFLAGS='-g -O0' python setup.py build_ext --inplace
> echo -e 'import ujson\ntry: ujson.dumps(["a" * 65536, b""]);\nexcept: pass' | valgrind --leak-check=yes python
==945== Memcheck, a memory error detector
==945== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==945== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==945== Command: python
==945==
==945==
==945== HEAP SUMMARY:
==945== in use at exit: 831,129 bytes in 230 blocks
==945== total heap usage: 1,560 allocs, 1,330 frees, 2,957,191 bytes allocated
==945==
==945== 568 bytes in 1 blocks are possibly lost in loss record 12 of 149
==945== at 0x483877F: malloc (vg_replace_malloc.c:307)
==945== by 0x4971301: UnknownInlinedFun (obmalloc.c:99)
==945== by 0x4971301: UnknownInlinedFun (obmalloc.c:572)
==945== by 0x4971301: UnknownInlinedFun (obmalloc.c:1956)
==945== by 0x4971301: UnknownInlinedFun (obmalloc.c:1949)
==945== by 0x4971301: UnknownInlinedFun (obmalloc.c:685)
==945== by 0x4971301: UnknownInlinedFun (gcmodule.c:2274)
==945== by 0x4971301: UnknownInlinedFun (gcmodule.c:2301)
==945== by 0x4971301: _PyObject_GC_NewVar (gcmodule.c:2332)
==945== by 0x49929FB: UnknownInlinedFun (frameobject.c:794)
==945== by 0x49929FB: UnknownInlinedFun (frameobject.c:837)
==945== by 0x49929FB: _PyEval_MakeFrameVector (ceval.c:4790)
==945== by 0x49A43A9: UnknownInlinedFun (ceval.c:5057)
==945== by 0x49A43A9: _PyFunction_Vectorcall (call.c:342)
==945== by 0x499350A: UnknownInlinedFun (abstract.h:114)
==945== by 0x499350A: UnknownInlinedFun (abstract.h:123)
==945== by 0x499350A: UnknownInlinedFun (ceval.c:5867)
==945== by 0x499350A: _PyEval_EvalFrameDefault (ceval.c:4198)
==945== by 0x49A43D7: UnknownInlinedFun (pycore_ceval.h:46)
==945== by 0x49A43D7: UnknownInlinedFun (ceval.c:5065)
==945== by 0x49A43D7: _PyFunction_Vectorcall (call.c:342)
==945== by 0x499350A: UnknownInlinedFun (abstract.h:114)
==945== by 0x499350A: UnknownInlinedFun (abstract.h:123)
==945== by 0x499350A: UnknownInlinedFun (ceval.c:5867)
==945== by 0x499350A: _PyEval_EvalFrameDefault (ceval.c:4198)
==945== by 0x49A43D7: UnknownInlinedFun (pycore_ceval.h:46)
==945== by 0x49A43D7: UnknownInlinedFun (ceval.c:5065)
==945== by 0x49A43D7: _PyFunction_Vectorcall (call.c:342)
==945== by 0x4993127: UnknownInlinedFun (abstract.h:114)
==945== by 0x4993127: UnknownInlinedFun (abstract.h:123)
==945== by 0x4993127: UnknownInlinedFun (ceval.c:5867)
==945== by 0x4993127: _PyEval_EvalFrameDefault (ceval.c:4213)
==945== by 0x49A43D7: UnknownInlinedFun (pycore_ceval.h:46)
==945== by 0x49A43D7: UnknownInlinedFun (ceval.c:5065)
==945== by 0x49A43D7: _PyFunction_Vectorcall (call.c:342)
==945== by 0x4993127: UnknownInlinedFun (abstract.h:114)
==945== by 0x4993127: UnknownInlinedFun (abstract.h:123)
==945== by 0x4993127: UnknownInlinedFun (ceval.c:5867)
==945== by 0x4993127: _PyEval_EvalFrameDefault (ceval.c:4213)
==945== by 0x49A43D7: UnknownInlinedFun (pycore_ceval.h:46)
==945== by 0x49A43D7: UnknownInlinedFun (ceval.c:5065)
==945== by 0x49A43D7: _PyFunction_Vectorcall (call.c:342)
==945==
==945== 720 bytes in 1 blocks are possibly lost in loss record 64 of 149
==945== at 0x483877F: malloc (vg_replace_malloc.c:307)
==945== by 0x49707D8: UnknownInlinedFun (obmalloc.c:99)
==945== by 0x49707D8: UnknownInlinedFun (obmalloc.c:572)
==945== by 0x49707D8: UnknownInlinedFun (obmalloc.c:1956)
==945== by 0x49707D8: _PyObject_Malloc (obmalloc.c:1949)
==945== by 0x49AA563: UnknownInlinedFun (obmalloc.c:2304)
==945== by 0x49AA563: UnknownInlinedFun (obmalloc.c:2323)
==945== by 0x49AA563: PyObject_Realloc (obmalloc.c:703)
==945== by 0x4A5BEF8: _PyObject_GC_Resize (gcmodule.c:2350)
==945== by 0x4992D24: UnknownInlinedFun (frameobject.c:809)
==945== by 0x4992D24: UnknownInlinedFun (frameobject.c:837)
==945== by 0x4992D24: _PyEval_MakeFrameVector (ceval.c:4790)
==945== by 0x49A43A9: UnknownInlinedFun (ceval.c:5057)
==945== by 0x49A43A9: _PyFunction_Vectorcall (call.c:342)
==945== by 0x4A51303: _PyObject_VectorcallTstate.lto_priv.5 (abstract.h:114)
==945== by 0x4A5122D: _PyObject_CallFunctionVa (call.c:479)
==945== by 0x4A54A23: PyObject_CallMethod (call.c:577)
==945== by 0x57CD46A: PyInit__decimal (_decimal.c:5734)
==945== by 0x4A78036: UnknownInlinedFun (importdl.c:167)
==945== by 0x4A78036: UnknownInlinedFun (import.c:2049)
==945== by 0x4A78036: _imp_create_dynamic (import.c.h:330)
==945== by 0x49A45D8: cfunction_vectorcall_FASTCALL (methodobject.c:430)
==945==
==945== 524,288 bytes in 1 blocks are definitely lost in loss record 149 of 149
==945== at 0x483877F: malloc (vg_replace_malloc.c:307)
==945== by 0x49706D8: UnknownInlinedFun (obmalloc.c:99)
==945== by 0x49706D8: UnknownInlinedFun (obmalloc.c:572)
==945== by 0x49706D8: UnknownInlinedFun (obmalloc.c:1956)
==945== by 0x49706D8: UnknownInlinedFun (obmalloc.c:1949)
==945== by 0x49706D8: PyObject_Malloc (obmalloc.c:685)
==945== by 0x55AA7B5: Buffer_Realloc (ultrajsonenc.c:150)
==945== by 0x55ABFB5: encode (ultrajsonenc.c:894)
==945== by 0x55AB8E3: encode (ultrajsonenc.c:757)
==945== by 0x55AC36C: JSON_EncodeObject (ultrajsonenc.c:985)
==945== by 0x55AE79D: objToJSON (objToJSON.c:874)
==945== by 0x49A3F72: cfunction_call (methodobject.c:543)
==945== by 0x499D509: _PyObject_MakeTpCall (call.c:215)
==945== by 0x499837E: UnknownInlinedFun (abstract.h:112)
==945== by 0x499837E: UnknownInlinedFun (abstract.h:99)
==945== by 0x499837E: UnknownInlinedFun (abstract.h:123)
==945== by 0x499837E: UnknownInlinedFun (ceval.c:5867)
==945== by 0x499837E: _PyEval_EvalFrameDefault (ceval.c:4181)
==945== by 0x4A532C1: UnknownInlinedFun (pycore_ceval.h:46)
==945== by 0x4A532C1: _PyEval_Vector (ceval.c:5065)
==945== by 0x4A53221: PyEval_EvalCode (ceval.c:1134)
==945==
==945== LEAK SUMMARY:
==945== definitely lost: 524,288 bytes in 1 blocks
==945== indirectly lost: 0 bytes in 0 blocks
==945== possibly lost: 1,288 bytes in 2 blocks
==945== still reachable: 305,553 bytes in 227 blocks
==945== suppressed: 0 bytes in 0 blocks
==945== Reachable blocks (those to which a pointer was found) are not shown.
==945== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==945==
==945== For lists of detected and suppressed errors, rerun with: -s
==945== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
I remembered that tracemalloc is a thing. Since all allocations in ujson happen through Python's memory management functions (rather than plain `malloc`), tracemalloc can track them, which gives an exact measurement instead of noisy process-level statistics.

Sadly, this isn't supported by PyPy. But we already skip some memory-related tests on PyPy for the same reason, and since I don't think there is any CPython-specific code in ujson, I'd expect memory leaks to always show up in a CPython-only test. So that might be fine.
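A minimal version of such a check might look like this (a sketch with assumed names and an arbitrary threshold, not the actual tests/memory.py):

```python
import tracemalloc

def grows(func, n=100, tolerance=65536):
    """Return True if n calls of func grow traced memory beyond `tolerance`
    bytes. Names and threshold are assumptions for this sketch."""
    func()  # warm-up call, so one-time lazy allocations don't count as a leak
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(n):
        try:
            func()
        except TypeError:
            pass  # keep exercising the error path, as with the bytes trigger
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    return sum(s.size_diff for s in after.compare_to(before, "lineno")) > tolerance
```

With the pre-fix encoder, something like `grows(functools.partial(ujson.dumps, ["a" * 11000, b""]))` should come back True, since each failing call leaks the reallocated buffer.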
tracemalloc it is then. And yes, let's just skip PyPy.
For reference, this is how it fails on main (4ac30c9):

$ python3 tests/memory.py 'import functools, ujson; func = functools.partial(ujson.dumps, ["a" * 11000, b""])'; echo $?
<path>/ultrajson/tests/memory.py:32: size=64.5 KiB (+64.5 KiB), count=1 (+1), average=64.5 KiB
1

Or in the test suite:
Nice one!
Thanks!
(I've set this to automerge when/if Travis completes, we might need to give it a nudge later.)
Travis CI's PowerPCs said no.
@hugovk Would you prefer to have those Travis jobs on GitHub Actions? Docker buildx would handle the architecture emulation (like in the wheel builder pipeline), and then we could just use the generic Python base Docker images, which come in pretty much every architecture you can think of.
Generally I'd like to ditch Travis completely! But for normal CI testing, the main question is speed: we want things to run in just a few minutes, and emulation on GHA can be very slow. (Not too big a deal for a release every few weeks and for post-merge builds.) Travis takes about 2 minutes each for these, but having jobs (ppc64le in this case) fail to queue up isn't great, although Travis has been mostly stable recently. Do you have an idea how long it would run with emulation?
Hmm, it takes 89 seconds to run the Ubuntu variants or 70 seconds to run the Alpine variants on my machine. Then you need to add about 20 seconds to set up Docker buildx, plus however long it takes GHA to download 200 MB (Ubuntu) or 20 MB (Alpine) worth of Docker images. So I'd guess about the same overall. I think I'll just try it and see...
I haven't added a test for this so far, and I'm not sure how to do it. The stdlib only has `resource.getrusage` for this (as far as I could see), and on Linux this only provides the max RSS (ixrss, idrss and isrss are always zero per the man page). Since the test suite constantly allocates and frees memory, including some huge objects, that makes it kind of messy (subprocess?). Let me know what you think about that.

A quick manual test using a `bytes` object to trigger an exception:

With main: 10224 → 78068 (kB)
With this branch: 10152 → 10152

(The exact numbers vary slightly, so that difference of 72 kB is meaningless.)

Note that the initial `ujson.dumps` call causes an increase in memory usage (by ~350 kB on my machine). I haven't looked into that at all, but I assume it's static variables etc.?
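For what it's worth, the `resource.getrusage` limitation is easy to see directly: `ru_maxrss` is a high-water mark, so memory that is freed again never shows up in it. A quick sketch:

```python
import resource

# ru_maxrss is the peak RSS (in KiB on Linux), i.e. a high-water mark:
# it can grow but never shrink, so freed memory is invisible to it.
before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
data = bytearray(50 * 1024 * 1024)  # allocate ~50 MiB
del data                            # freeing it does not lower ru_maxrss
after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
```

That monotonic behaviour is exactly why it can't distinguish a leak from a large transient allocation, which is what makes it messy for this test suite.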