Fix handling of surrogates on encoding #530

Merged
merged 3 commits into from Jun 1, 2022


JustAnotherArchivist
Collaborator

@JustAnotherArchivist JustAnotherArchivist commented Apr 17, 2022

This allows surrogates anywhere in the input, compatible with the json module from the standard library.
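For reference, here is a minimal sketch of the standard-library behaviour this aims to match. It uses only `json` from the standard library (not ujson), and the variable names are purely illustrative:

```python
import json

# A lone (unpaired) high surrogate; it is not valid in strict UTF-8.
lone = "\ud800"

# With the default ensure_ascii=True, json escapes it and round-trips it.
escaped = json.dumps(lone)
roundtrip = json.loads(escaped)

# With ensure_ascii=False, the surrogate is kept as-is in the output string.
raw = json.dumps(lone, ensure_ascii=False)

# Strict UTF-8 encoding rejects the result; "surrogatepass" keeps it intact.
try:
    raw.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

as_bytes = raw.encode("utf-8", "surrogatepass")
back = as_bytes.decode("utf-8", "surrogatepass")
```

The `surrogatepass` error handler is what lets the encoded bytes be turned back into a Python string without losing the surrogate.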

This also refactors two interfaces:

  • The PyUnicode to char* conversion is moved into its own function, separated from the JSONTypeContext handling, so it can be reused for other things in the future (e.g. indentation and separators) which don't have a type context.
  • Converting the char* output to a Python string with surrogates intact requires the string length for PyUnicode_Decode & Co. While strlen could be used, the length is already known inside the encoder, so the encoder function now also takes an extra size_t pointer argument to return that and no longer NUL-terminates the string. This also permits output that contains NUL bytes (even though that would be invalid JSON), e.g. if an object's __json__ method return value were to contain them.
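To illustrate why an explicit length matters once the output may contain NUL bytes, here is a small sketch using `ctypes` as a stand-in for the C side. This is not ujson's code; the buffer and names are hypothetical:

```python
import ctypes

# Hypothetical encoder output containing an embedded NUL byte.
payload = b'"a\x00b"'
buf = ctypes.create_string_buffer(payload)  # ctypes appends a trailing NUL

# strlen-style read: stops at the first NUL, so the output is truncated.
truncated = ctypes.string_at(buf)

# Length-based read: returns exactly the bytes the encoder produced.
full = ctypes.string_at(buf, len(payload))

# Decoding the full buffer keeps everything, including any surrogates.
text = full.decode("utf-8", "surrogatepass")
```

With a length returned alongside the pointer, the consumer never has to scan for a terminator, and embedded NUL bytes survive the conversion.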

Fixes #156
Fixes #447
Fixes #537
Supersedes #284

@codecov-commenter

codecov-commenter commented Apr 17, 2022

Codecov Report

Merging #530 (59aa3bf) into main (b300d64) will decrease coverage by 0.07%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #530      +/-   ##
==========================================
- Coverage   91.76%   91.68%   -0.08%     
==========================================
  Files           6        6              
  Lines        1821     1828       +7     
==========================================
+ Hits         1671     1676       +5     
- Misses        150      152       +2     
| Impacted Files      | Coverage Δ                        |
|---------------------|-----------------------------------|
| lib/ultrajsonenc.c  | 85.78% <100.00%> (-0.30%) ⬇️      |
| python/objToJSON.c  | 90.21% <100.00%> (-0.22%) ⬇️      |
| tests/test_ujson.py | 99.61% <100.00%> (+<0.01%) ⬆️     |


@JustAnotherArchivist
Collaborator Author

Regarding the change of the JSON_EncodeObject interface, I figured that this is the cleanest way to implement it. However, if it's better avoided (is that considered public API? I'm not really sure...), the information is also available in the encoder struct, so it could be calculated from that in objToJSON instead.

@Erotemic
Contributor

@JustAnotherArchivist I ran benchmarks on this branch and on main to compare them.

Trying to compare those numbers really made me wish I had an easy way to dump the numbers from specific runs into a file and then reload them, so I could compare across versions using t-tests instead of eyeballing these pointwise measurements. I ran each benchmark 4 times. Here are the results from the main branch:

#########
# Before: main - b3f8754c8a0c743e9f80c06472ec7a7adc96f438

|                                                                               | ujson      | nujson     | orjson     | simplejson | json       |
|-------------------------------------------------------------------------------|-----------:|-----------:|-----------:|-----------:|-----------:|
| Array with 256 doubles                                                        |            |            |            |            |            |
| encode                                                                        |     27,517 |      8,628 |    146,771 |      6,223 |      6,508 |
| decode                                                                        |     51,582 |     74,452 |     80,940 |     18,849 |     20,600 |
| Array with 256 UTF-8 strings                                                  |            |            |            |            |            |
| encode                                                                        |      6,343 |      5,106 |     29,019 |      5,143 |      5,437 |
| decode                                                                        |      3,200 |      3,222 |      2,096 |        650 |      2,814 |
| Array with 256 strings                                                        |            |            |            |            |            |
| encode                                                                        |     81,984 |     55,767 |    151,685 |     30,407 |     44,763 |
| decode                                                                        |     47,109 |     45,275 |     64,663 |     59,936 |     62,081 |
| Medium complex object                                                         |            |            |            |            |            |
| encode                                                                        |     21,927 |     23,676 |     68,060 |      8,082 |      9,093 |
| decode                                                                        |     19,358 |     20,629 |     26,938 |     11,895 |     15,046 |
| Array with 256 True values                                                    |            |            |            |            |            |
| encode                                                                        |    231,129 |    198,916 |    714,383 |    132,045 |    139,363 |
| decode                                                                        |    383,955 |    310,780 |    398,080 |    250,025 |    266,290 |
| Array with 256 dict{string, int} pairs                                        |            |            |            |            |            |
| encode                                                                        |     25,301 |     26,959 |     96,947 |      6,824 |     12,689 |
| decode                                                                        |     22,374 |     20,400 |     27,277 |     13,244 |     17,475 |
| Dict with 256 arrays with 256 dict{string, int} pairs                         |            |            |            |            |            |
| encode                                                                        |         86 |         93 |        237 |         20 |         43 |
| decode                                                                        |         56 |         54 |         57 |         33 |         42 |
| Dict with 256 arrays with 256 dict{string, int} pairs, outputting sorted keys |            |            |            |            |            |
| encode                                                                        |         82 |         68 |            |         15 |         44 |
| Complex object                                                                |            |            |            |            |            |
| encode                                                                        |        874 |      1,052 |            |        749 |        768 |
| decode                                                                        |        779 |        762 |            |        293 |        531 |


- CPython 3.9.9 (main, Jan  6 2022, 18:33:12) [GCC 10.3.0]
- ujson        : 5.2.1.dev14

|                                                                               | ujson      |
|-------------------------------------------------------------------------------|-----------:|
| Array with 256 doubles                                                        |            |
| encode                                                                        |     32,143 |
| decode                                                                        |     53,294 |
| Array with 256 UTF-8 strings                                                  |            |
| encode                                                                        |      6,551 |
| decode                                                                        |      3,488 |
| Array with 256 strings                                                        |            |
| encode                                                                        |     84,152 |
| decode                                                                        |     44,547 |
| Medium complex object                                                         |            |
| encode                                                                        |     20,955 |
| decode                                                                        |     19,723 |
| Array with 256 True values                                                    |            |
| encode                                                                        |    222,522 |
| decode                                                                        |    388,101 |
| Array with 256 dict{string, int} pairs                                        |            |
| encode                                                                        |     23,855 |
| decode                                                                        |     22,189 |
| Dict with 256 arrays with 256 dict{string, int} pairs                         |            |
| encode                                                                        |         90 |
| decode                                                                        |         51 |
| Dict with 256 arrays with 256 dict{string, int} pairs, outputting sorted keys |            |
| encode                                                                        |         73 |
| Complex object                                                                |            |
| encode                                                                        |        770 |
| decode                                                                        |        762 |


|                                                                               | ujson      |
|-------------------------------------------------------------------------------|-----------:|
| Array with 256 doubles                                                        |            |
| encode                                                                        |     38,800 |
| decode                                                                        |     55,194 |
| Array with 256 UTF-8 strings                                                  |            |
| encode                                                                        |      6,370 |
| decode                                                                        |      3,460 |
| Array with 256 strings                                                        |            |
| encode                                                                        |     78,901 |
| decode                                                                        |     43,462 |
| Medium complex object                                                         |            |
| encode                                                                        |     22,154 |
| decode                                                                        |     20,283 |
| Array with 256 True values                                                    |            |
| encode                                                                        |    228,851 |
| decode                                                                        |    402,941 |
| Array with 256 dict{string, int} pairs                                        |            |
| encode                                                                        |     24,935 |
| decode                                                                        |     23,396 |
| Dict with 256 arrays with 256 dict{string, int} pairs                         |            |
| encode                                                                        |         91 |
| decode                                                                        |         52 |
| Dict with 256 arrays with 256 dict{string, int} pairs, outputting sorted keys |            |
| encode                                                                        |         78 |
| Complex object                                                                |            |
| encode                                                                        |        826 |
| decode                                                                        |        753 |


|                                                                               | ujson      |
|-------------------------------------------------------------------------------|-----------:|
| Array with 256 doubles                                                        |            |
| encode                                                                        |     31,242 |
| decode                                                                        |     52,362 |
| Array with 256 UTF-8 strings                                                  |            |
| encode                                                                        |      6,217 |
| decode                                                                        |      3,581 |
| Array with 256 strings                                                        |            |
| encode                                                                        |     82,842 |
| decode                                                                        |     46,859 |
| Medium complex object                                                         |            |
| encode                                                                        |     20,831 |
| decode                                                                        |     20,506 |
| Array with 256 True values                                                    |            |
| encode                                                                        |    220,400 |
| decode                                                                        |    405,073 |
| Array with 256 dict{string, int} pairs                                        |            |
| encode                                                                        |     25,198 |
| decode                                                                        |     22,576 |
| Dict with 256 arrays with 256 dict{string, int} pairs                         |            |
| encode                                                                        |         93 |
| decode                                                                        |         57 |
| Dict with 256 arrays with 256 dict{string, int} pairs, outputting sorted keys |            |
| encode                                                                        |         85 |
| Complex object                                                                |            |
| encode                                                                        |        852 |
| decode                                                                        |        708 |

And here are the results from this branch:

########
# After: c9df7129c1dbd94f333c0ebd9de8470142aa092d

|                                                                               | ujson      | nujson     | orjson     | simplejson | json       |
|-------------------------------------------------------------------------------|-----------:|-----------:|-----------:|-----------:|-----------:|
| Array with 256 doubles                                                        |            |            |            |            |            |
| encode                                                                        |     38,501 |      9,889 |    141,171 |      7,040 |      6,792 |
| decode                                                                        |     52,256 |     80,021 |     85,377 |     19,820 |     19,733 |
| Array with 256 UTF-8 strings                                                  |            |            |            |            |            |
| encode                                                                        |      5,817 |      5,453 |     28,334 |      5,393 |      5,704 |
| decode                                                                        |      3,600 |      3,509 |      2,195 |        715 |      3,171 |
| Array with 256 strings                                                        |            |            |            |            |            |
| encode                                                                        |     85,067 |     54,707 |    143,638 |     30,560 |     42,362 |
| decode                                                                        |     45,363 |     41,422 |     67,748 |     58,435 |     61,740 |
| Medium complex object                                                         |            |            |            |            |            |
| encode                                                                        |     18,818 |     24,514 |     69,272 |      8,066 |     10,271 |
| decode                                                                        |     19,737 |     21,578 |     28,814 |     11,917 |     15,318 |
| Array with 256 True values                                                    |            |            |            |            |            |
| encode                                                                        |    230,512 |    201,239 |    744,935 |    135,794 |    139,228 |
| decode                                                                        |    385,795 |    294,987 |    398,714 |    239,698 |    266,018 |
| Array with 256 dict{string, int} pairs                                        |            |            |            |            |            |
| encode                                                                        |     22,762 |     14,233 |     66,452 |      6,463 |     12,255 |
| decode                                                                        |     22,778 |     22,681 |     27,304 |     11,938 |     18,121 |
| Dict with 256 arrays with 256 dict{string, int} pairs                         |            |            |            |            |            |
| encode                                                                        |         85 |         92 |        333 |         21 |         45 |
| decode                                                                        |         53 |         52 |         53 |         37 |         44 |
| Dict with 256 arrays with 256 dict{string, int} pairs, outputting sorted keys |            |            |            |            |            |
| encode                                                                        |         72 |         71 |            |         16 |         45 |
| Complex object                                                                |            |            |            |            |            |
| encode                                                                        |        806 |      1,036 |            |        711 |        763 |
| decode                                                                        |        754 |        759 |            |        280 |        505 |



|                                                                               | ujson      |
|-------------------------------------------------------------------------------|-----------:|
| Array with 256 doubles                                                        |            |
| encode                                                                        |     41,131 |
| decode                                                                        |     55,802 |
| Array with 256 UTF-8 strings                                                  |            |
| encode                                                                        |      5,655 |
| decode                                                                        |      3,403 |
| Array with 256 strings                                                        |            |
| encode                                                                        |     85,681 |
| decode                                                                        |     47,392 |
| Medium complex object                                                         |            |
| encode                                                                        |     19,911 |
| decode                                                                        |     19,984 |
| Array with 256 True values                                                    |            |
| encode                                                                        |    227,750 |
| decode                                                                        |    394,301 |
| Array with 256 dict{string, int} pairs                                        |            |
| encode                                                                        |     24,164 |
| decode                                                                        |     23,753 |
| Dict with 256 arrays with 256 dict{string, int} pairs                         |            |
| encode                                                                        |         86 |
| decode                                                                        |         47 |
| Dict with 256 arrays with 256 dict{string, int} pairs, outputting sorted keys |            |
| encode                                                                        |         68 |
| Complex object                                                                |            |
| encode                                                                        |        792 |
| decode                                                                        |        725 |


|                                                                               | ujson      |
|-------------------------------------------------------------------------------|-----------:|
| Array with 256 doubles                                                        |            |
| encode                                                                        |     38,121 |
| decode                                                                        |     54,035 |
| Array with 256 UTF-8 strings                                                  |            |
| encode                                                                        |      5,180 |
| decode                                                                        |      3,581 |
| Array with 256 strings                                                        |            |
| encode                                                                        |     84,799 |
| decode                                                                        |     45,156 |
| Medium complex object                                                         |            |
| encode                                                                        |     17,860 |
| decode                                                                        |     21,003 |
| Array with 256 True values                                                    |            |
| encode                                                                        |    214,279 |
| decode                                                                        |    369,041 |
| Array with 256 dict{string, int} pairs                                        |            |
| encode                                                                        |     23,489 |
| decode                                                                        |     23,781 |
| Dict with 256 arrays with 256 dict{string, int} pairs                         |            |
| encode                                                                        |         87 |
| decode                                                                        |         52 |
| Dict with 256 arrays with 256 dict{string, int} pairs, outputting sorted keys |            |
| encode                                                                        |         64 |
| Complex object                                                                |            |
| encode                                                                        |        761 |
| decode                                                                        |        724 |


|                                                                               | ujson      |
|-------------------------------------------------------------------------------|-----------:|
| Array with 256 doubles                                                        |            |
| encode                                                                        |     38,605 |
| decode                                                                        |     53,182 |
| Array with 256 UTF-8 strings                                                  |            |
| encode                                                                        |      5,304 |
| decode                                                                        |      3,605 |
| Array with 256 strings                                                        |            |
| encode                                                                        |     81,199 |
| decode                                                                        |     45,518 |
| Medium complex object                                                         |            |
| encode                                                                        |     19,501 |
| decode                                                                        |     21,424 |
| Array with 256 True values                                                    |            |
| encode                                                                        |    226,025 |
| decode                                                                        |    371,919 |
| Array with 256 dict{string, int} pairs                                        |            |
| encode                                                                        |     22,941 |
| decode                                                                        |     24,784 |
| Dict with 256 arrays with 256 dict{string, int} pairs                         |            |
| encode                                                                        |         88 |
| decode                                                                        |         57 |
| Dict with 256 arrays with 256 dict{string, int} pairs, outputting sorted keys |            |
| encode                                                                        |         77 |
| Complex object                                                                |            |
| encode                                                                        |        832 |
| decode                                                                        |        741 |

Overall, I think there might be a slight performance regression here, but it's hard to be sure without the t-test. The "Array with 256 UTF-8 strings" encode test does seem to be ~16% slower.
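As a rough sanity check on that figure, here is a sketch computing the slowdown and a Welch t statistic from the four ujson encode results per branch for "Array with 256 UTF-8 strings" in the tables above. It assumes the numbers are throughput (higher is better), uses Welch's unpaired statistic rather than a paired test, and does not compute a p-value:

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (sample variances)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# ujson encode results for "Array with 256 UTF-8 strings", four runs each.
main_runs = [6343, 6551, 6370, 6217]
branch_runs = [5817, 5655, 5180, 5304]

# Relative increase in time per call, assuming the values are calls per second.
slowdown = mean(main_runs) / mean(branch_runs) - 1
t_stat = welch_t(main_runs, branch_runs)

print(f"slowdown: {slowdown:.1%}, Welch t: {t_stat:.2f}")
```

With only four runs per branch the statistic is a coarse indicator at best, which is why more runs are needed before drawing conclusions.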

@JustAnotherArchivist JustAnotherArchivist marked this pull request as draft May 29, 2022 05:40
@Erotemic
Contributor

So it does seem that there is a very slight but measurable performance regression. I suppose this is to be expected, given that this is doing strictly more than the previous code, and I'm not sure if there is any way around it. I did 104 paired t-tests and found that (impl_version=5.2.1.dev51 without this PR) performs better than (impl_version=5.2.1.dev9 with this PR) on average (p=0.00001616).

Here is an example plot:

[plot image omitted]

Might need to run it a few more times to buff up the stats because I'm getting a high std on the times for this PR. It's probably a fluke, but more stats will help figure that out.

[plot image omitted]

@JustAnotherArchivist
Collaborator Author

JustAnotherArchivist commented May 30, 2022

@Erotemic Thank you! It's good that you see a difference between the two because there should be one in the code so far as it does extra work (the encoding string comparison, in particular). :-)
The array of dict loads graph is funny. Over an order of magnitude of stdev for that? That's insane...

I've also run my own benchmarks tonight and noticed that I appear to have introduced a memory leak here (which, of course, would also have a performance impact via constant allocations). I don't immediately see where that might be coming from though. It only occurs on non-ASCII characters, so I'm pretty sure it has to be related to my use of PyUnicode_AsEncodedString. But PyUnicodeToUTF8 stores the bytes object pointer in the context's newObj member (by passing that to PyUnicodeToUTF8Raw), which should get XDECREF'd in Object_endTypeContext. An easy way to reproduce is this, which peaks at 481 MB RSS for me (reduce the number to get a smaller leak, naturally) compared to ~10 MB RSS with main or an ASCII character:

python3 -c 'import ujson,timeit; o = "‽"; timeit.timeit("ujson.dumps(o)", number = 10000000, globals = globals())'

I haven't analysed it yet, only just saw it minutes ago really, so it might be something totally obvious. If anyone sees something, please let me know, else I'll throw some tooling at it tomorrow.

Edit: Nevermind, I realised what's going on. I need to pass the newObj pointer in by reference of course, not by value, else it's never overwritten.
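A rough way to check for this kind of leak from within Python, as a sketch only: measure net traced memory growth across many calls with `tracemalloc`. The helper names are hypothetical, `json.dumps` stands in for `ujson.dumps`, and `leaky_encode` is a deliberately leaky function included just to show what a positive result looks like. Note that `tracemalloc` only sees allocations made through Python's allocators, unlike the RSS figure quoted above.

```python
import json
import tracemalloc

_kept = []  # deliberately retains every encoded result (simulated leak)


def leaky_encode(text):
    """Hypothetical leaky stand-in: keeps a reference to each bytes object."""
    data = text.encode("utf-8")
    _kept.append(data)
    return data


def net_growth(func, arg, calls=20_000):
    """Return the net traced memory growth in bytes after `calls` calls."""
    tracemalloc.start()
    func(arg)  # warm up so one-time initialisation is not counted
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(calls):
        func(arg)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


clean = net_growth(json.dumps, "‽")
leaked = net_growth(leaky_encode, "‽")
print(f"json.dumps: {clean} bytes, leaky_encode: {leaked} bytes")
```

A function that releases its temporaries should show growth near zero, while one that drops a reference it should have released grows roughly linearly with the number of calls.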

@Erotemic
Contributor

> Over an order of magnitude of stdev for that? That's insane...

One of my FOSS goals to accomplish on my vacation was to get nice benchmarks for this. I went a bit overboard and started writing a very general benchmark and analysis framework (#542), and I scrambled a bit on Sunday night to get something out the door for this PR. As such, I wouldn't be surprised if I just didn't run enough iterations or use robust enough outlier rejection. Next chance I get to work on this, I'll run a more thorough set of measurements. Another clue that this could be the case is that Python's json also has a high std for some values of size, but not others.

I could also see a memory leak messing with cache efficiency, so perhaps now that that's fixed the high std will go away.
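As an illustration of what "robust outlier rejection" could look like, here is a minimal sketch based on the median absolute deviation (MAD). This is one common approach and not necessarily what the framework in #542 does; the function name, the threshold of 3.5, and the sample timings are all assumptions made for the example.

```python
import statistics


def reject_outliers(samples, threshold=3.5):
    """Keep samples whose modified z-score (based on the MAD) is within threshold."""
    med = statistics.median(samples)
    mad = statistics.median(abs(x - med) for x in samples)
    if mad == 0:
        # All samples (or at least half of them) are identical; nothing to reject.
        return list(samples)
    # 0.6745 rescales the MAD so the score is comparable to a z-score for normal data.
    return [x for x in samples if 0.6745 * abs(x - med) / mad <= threshold]


# Hypothetical timings in seconds, with one obvious outlier.
timings = [1.00, 1.02, 0.98, 1.01, 0.99, 5.00]
kept = reject_outliers(timings)
print(kept)
```

Unlike a mean/std-based filter, the median and MAD are barely affected by the outlier itself, so a single slow run cannot mask its own rejection.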

@Erotemic
Contributor

I ran the stats again 10 times on 5.2.1.dev9 and found that the std did not go away. In fact, it got smoother, indicating that it might be a real effect (although I'm not sure I can explain the behavior of Python's json in the dumps variant of that test).

However, after updating the branch to the latest (which is now 5.3.1.dev3), the issue does seem to have gone away.

image

With the fix to the memory leak, the benchmarks are much closer. The openskill win probs give this branch a 33.5% and main a 37.4%.

Overall they are very close:

mean_time        count      mean       std           min       25%       50%       75%       max
impl_version                                                                                    
5.3.1.dev3    247192.0  0.000334  0.001044  3.670000e-07  0.000001  0.000010  0.000144  0.008951
5.2.1.dev51   157304.0  0.000335  0.001056  3.705000e-07  0.000001  0.000010  0.000142  0.009026
| | ('json', '2.0.9') | ('ujson', '5.2.1.dev51') | ('ujson', '5.2.1.dev9') | ('ujson', '5.3.1.dev3') |
| --- | --- | --- | --- | --- |
| ('Array of Dict[str, int]', 'dumps') | 13,465.78 | 24,483.13 | 24,079.77 | 24,448.54 |
| ('Array of Dict[str, int]', 'loads') | 19,205.54 | 24,485.10 | 23,134.04 | 24,175.13 |
| ('Array with True values', 'dumps') | 143,011.13 | 247,006.81 | 239,535.62 | 246,471.54 |
| ('Array with True values', 'loads') | 265,125.40 | 374,553.88 | 363,526.20 | 369,392.79 |
| ('Array with UTF-8 strings', 'dumps') | 6,098.56 | 8,523.53 | 7,072.90 | 8,275.22 |
| ('Array with UTF-8 strings', 'loads') | 3,555.35 | 3,828.23 | 3,830.45 | 3,833.94 |
| ('Array with doubles', 'dumps') | 7,808.38 | 41,552.05 | 41,808.82 | 41,069.89 |
| ('Array with doubles', 'loads') | 22,633.05 | 57,433.73 | 56,682.77 | 57,067.74 |
| ('Complex object', 'dumps') | 800.46 | 1,153.45 | 1,110.43 | 1,159.12 |
| ('Complex object', 'loads') | 556.07 | 815.78 | 807.11 | 813.17 |

@JustAnotherArchivist
Collaborator Author

@Erotemic Thank you! This does match what I'm seeing in my own benchmarks as well: very similar timings and no statistically significant differences, though I'm not very confident in my statistics, so I'll exclude that here.

I wrote my own small script rather than using the existing benchmarks for that. I'm specifically focusing only on dumps involving strings since anything else should be unaffected. I'm comparing main against this PR with encoding = "utf-8" and with NULL; the "utf-8" variant is mostly included because I wondered what the performance impact of that string comparison is. I built ujson in debug and production mode for all of these (though I doubt that matters). I'm running the tests on Python 3.10.1 and Debian using timeit.repeat (i.e. many runs in each measurement). The number of repetitions is determined dynamically such that each timing measurement takes roughly 0.03 seconds, and then that's repeated 100 times. I'm testing strings of different lengths and ASCII/UTF-8 on their own, in lists, in dicts, and in dicts with sort_keys = True, 70 tests in total. That should cover everything of relevance to this PR, I think, but if anyone has additional suggestions, please let me know.
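As a quick cross-check of the "70 tests in total" figure, the test matrix can be enumerated in a few lines of Python. This is only a sketch that mirrors the structure of the shell driver script in this comment (seven strings, six container shapes per string, and an extra sorted run for dicts); the variable names are chosen for illustration.

```python
strings = ['""', '"a"', '"a" * 10', '"a" * 1000', '"‽"', '"‽" * 10', '"‽" * 1000']

# Each string is tested on its own and inside three list and three dict shapes.
objects = list(strings)
for s in strings:
    objects += [
        f"[{s}]",
        f"[{s} for i in range(10)]",
        f"[{s} for i in range(100)]",
        f"{{{s}: {s}}}",
        f"{{{s} + str(i): {s} for i in range(10)}}",
        f"{{{s} + str(i): {s} for i in range(100)}}",
    ]

# Dicts are additionally timed with sort_keys = True.
cases = [(o, False) for o in objects]
cases += [(o, True) for o in objects if o.startswith("{")]

print(len(objects), len(cases))  # 49 70
```

That is 49 distinct objects, of which the 21 dicts are measured twice, giving the 70 rows of the results table.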

Code

Warning: poor and dirty code ahead.

Building from three branches in the repo: main (b300d64), fix-encode-surrogates (59aa3bf), and fix-encode-surrogates-with-string-comp (9b9af1a plus cherry-pick 59aa3bf)

for branch in main fix-encode-surrogates fix-encode-surrogates-with-string-comp; do git switch ${branch} && make clean && make build && mv -nv ujson.cpython-310-x86_64-linux-gnu.so ujson.cpython-310-x86_64-linux-gnu.so_${branch}_debug && make clean && make build-prod && mv -nv ujson.cpython-310-x86_64-linux-gnu.so ujson.cpython-310-x86_64-linux-gnu.so_${branch}_prod; done

with this simple Makefile:

clean:
	rm -rf build/ ujson.cpython-*.so

build:
	CFLAGS='-DDEBUG' python3 setup.py develop

build-prod:
	python3 setup.py develop

This results in six .so files:

ujson.cpython-310-x86_64-linux-gnu.so_fix-encode-surrogates_debug
ujson.cpython-310-x86_64-linux-gnu.so_fix-encode-surrogates_prod
ujson.cpython-310-x86_64-linux-gnu.so_fix-encode-surrogates-with-string-comp_debug
ujson.cpython-310-x86_64-linux-gnu.so_fix-encode-surrogates-with-string-comp_prod
ujson.cpython-310-x86_64-linux-gnu.so_main_debug
ujson.cpython-310-x86_64-linux-gnu.so_main_prod

Test driver script (./test):

#!/bin/bash
set -e

strings=(
	'""'
	'"a"'
	'"a" * 10'
	'"a" * 1000'
	'"‽"'
	'"‽" * 10'
	'"‽" * 1000'
        )

os=("${strings[@]}")
for string in "${strings[@]}"
do
	os+=(
		"[${string}]"
		"[${string} for i in range(10)]"
		"[${string} for i in range(100)]"
		"{${string}: ${string}}"
		"{${string} + str(i): ${string} for i in range(10)}"
		"{${string} + str(i): ${string} for i in range(100)}"
	    )
done

for o in "${os[@]}"
do
	num=
	for extra in '' sort
	do
		for so in ujson.cpython-310-x86_64-linux-gnu.so_*
		do
			echo "${so} ${o} ${extra}" >&2
			ln -sf "${so}" ujson.cpython-310-x86_64-linux-gnu.so

			if [[ -z "${num}" ]]
			then
				num=$(python3 test.py --num "${o}")
				echo "Repeats: ${num}" >&2
			fi

			python3 test.py "${so}" "${o}" "${num}" ${extra}
		done
		if [[ "${o}" != '{'* ]]
		then
			break
		fi
	done
done

Actual test code:

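import sys
import timeit
import ujson

# Usage: test.py (<so name> | --num) <object expression> [<repetitions> [sort]]
# The object expression is supplied by the driver script.
o = eval(sys.argv[2])
stmt = 'ujson.dumps(o)'
if sys.argv[4:]:
	# Any fourth argument enables sort_keys.
	stmt = 'ujson.dumps(o, sort_keys = True)'
t = timeit.Timer(stmt, globals = globals())

if sys.argv[1] == '--num':
	# Calibration mode: let timeit pick a loop count, then use a tenth of it.
	num = None
	def cb(number, timeTaken):
		global num
		num = number
	t.autorange(cb)
	num = num // 10
	print(num)
	sys.exit(0)
else:
	num = int(sys.argv[3])
# Take 100 measurements of num calls each and emit the results as one JSON line.
out = t.repeat(repeat = 100, number = num)
print(ujson.dumps({'so': sys.argv[1], 'o': sys.argv[2], 'sorted': bool(sys.argv[4:]), 'stmt': stmt, 'num': num, 'times': out}))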

Aggregation script:

import collections
import json
import math
import sys


UNITS = ['s', 'ms', 'μs']
def format_time(t):
	for unit in UNITS:
		if t >= 1:
			return f'{t:.3f} {unit}'
		t *= 1000
	return f'{t:.3f} ns'


names = {
	'ujson.cpython-310-x86_64-linux-gnu.so_fix-encode-surrogates_debug': 'PR dbg',
	'ujson.cpython-310-x86_64-linux-gnu.so_fix-encode-surrogates_prod': 'PR',
	'ujson.cpython-310-x86_64-linux-gnu.so_fix-encode-surrogates-with-string-comp_debug': 'PR utf-8 dbg',
	'ujson.cpython-310-x86_64-linux-gnu.so_fix-encode-surrogates-with-string-comp_prod': 'PR utf-8',
	'ujson.cpython-310-x86_64-linux-gnu.so_main_debug': 'main dbg',
	'ujson.cpython-310-x86_64-linux-gnu.so_main_prod': 'main',
}


objs = [json.loads(line) for line in sys.stdin]
nums = collections.defaultdict(set)
for obj in objs:
	nums[(obj['o'], obj['sorted'])].add(obj['num'])
assert all(len(s) == 1 for s in nums.values()), nums
nums = {k: v.pop() for k, v in nums.items()}

results = collections.defaultdict(dict)
for obj in objs:
	results[(obj['o'], obj['sorted'])][names[obj['so']]] = [t / nums[(obj['o'], obj['sorted'])] for t in obj['times']]
assert all(sorted(v.keys()) == sorted(list(results.values())[0].keys()) for v in results.values())
table = []
sos = sorted(list(results.values())[0].keys())
table.append(['object', 'repetitions'] + sos)
for o, sorted_ in sorted(results.keys()):
	objStr = f'`{o}`' + (' (sorted)' if sorted_ else '')
	num = nums[(o, sorted_)]
	row = [objStr, num]
	for so in sos:
		times = results[(o, sorted_)][so]
		avg = sum(times) / len(times)
		std = math.sqrt(sum((t - avg) ** 2 for t in times) / (len(times) - 1))
		row.append(f'{format_time(avg)} ± {format_time(std)}')
	table.append(row)

for row in table:
	print(' | '.join(map(str, row)))

It isn't pretty, but it gets the job done. :-)

Detailed results
  • Times are mean ± standard deviation across the 100 timings of calls to ujson.dumps (divided by the repetitions, so it's the time for one call).
  • main = b300d64, PR = 59aa3bf, PR utf-8 = 59aa3bf but with encoding = "utf-8" in the PyUnicode_AsEncodedString calls (9b9af1a plus cherry-pick 59aa3bf); 'dbg' = -DDEBUG builds
object repetitions PR PR dbg PR utf-8 PR utf-8 dbg main main dbg
"" 100000 330.962 ns ± 21.579 ns 331.388 ns ± 26.562 ns 330.559 ns ± 9.437 ns 327.031 ns ± 12.451 ns 357.985 ns ± 66.177 ns 343.491 ns ± 35.661 ns
"a" 100000 331.411 ns ± 8.811 ns 331.438 ns ± 19.186 ns 335.556 ns ± 25.898 ns 334.828 ns ± 23.093 ns 341.940 ns ± 28.286 ns 342.782 ns ± 13.895 ns
"a" * 10 100000 359.705 ns ± 23.576 ns 374.189 ns ± 43.659 ns 361.411 ns ± 10.400 ns 373.530 ns ± 64.270 ns 367.340 ns ± 29.009 ns 408.382 ns ± 46.798 ns
"a" * 1000 10000 2.062 μs ± 64.108 ns 2.670 μs ± 272.481 ns 2.003 μs ± 123.380 ns 2.570 μs ± 73.442 ns 2.098 μs ± 70.130 ns 2.581 μs ± 163.473 ns
"‽" 100000 388.676 ns ± 18.003 ns 388.502 ns ± 24.181 ns 404.346 ns ± 26.557 ns 404.844 ns ± 29.161 ns 407.890 ns ± 58.940 ns 388.106 ns ± 19.613 ns
"‽" * 10 50000 555.224 ns ± 108.290 ns 514.468 ns ± 58.516 ns 507.981 ns ± 36.753 ns 507.013 ns ± 17.370 ns 527.522 ns ± 74.904 ns 513.023 ns ± 75.479 ns
"‽" * 1000 5000 7.865 μs ± 1.002 μs 8.810 μs ± 1.401 μs 7.348 μs ± 342.788 ns 8.477 μs ± 1.059 μs 7.496 μs ± 502.910 ns 8.811 μs ± 1.379 μs
["" for i in range(10)] 50000 808.299 ns ± 38.030 ns 844.152 ns ± 58.747 ns 843.949 ns ± 70.050 ns 818.540 ns ± 20.382 ns 823.642 ns ± 57.387 ns 815.330 ns ± 17.406 ns
["" for i in range(100)] 5000 4.864 μs ± 262.122 ns 4.689 μs ± 156.891 ns 4.979 μs ± 464.768 ns 4.806 μs ± 264.168 ns 4.840 μs ± 553.113 ns 4.978 μs ± 602.529 ns
[""] 50000 464.924 ns ± 44.260 ns 445.698 ns ± 16.507 ns 460.880 ns ± 64.904 ns 498.537 ns ± 101.675 ns 553.022 ns ± 153.658 ns 465.797 ns ± 74.576 ns
["a" * 10 for i in range(10)] 20000 1.009 μs ± 117.170 ns 1.068 μs ± 66.220 ns 977.184 ns ± 30.245 ns 1.039 μs ± 79.822 ns 1.010 μs ± 78.960 ns 1.057 μs ± 52.710 ns
["a" * 10 for i in range(100)] 5000 6.323 μs ± 576.793 ns 6.747 μs ± 179.390 ns 6.347 μs ± 362.491 ns 6.722 μs ± 260.656 ns 6.508 μs ± 276.877 ns 6.738 μs ± 362.473 ns
["a" * 1000 for i in range(10)] 1000 15.841 μs ± 563.196 ns 21.570 μs ± 773.005 ns 15.783 μs ± 1.702 μs 21.437 μs ± 761.532 ns 16.631 μs ± 717.250 ns 21.774 μs ± 1.680 μs
["a" * 1000 for i in range(100)] 100 160.445 μs ± 9.618 μs 251.155 μs ± 16.864 μs 166.693 μs ± 18.840 μs 247.247 μs ± 18.211 μs 165.248 μs ± 5.803 μs 248.047 μs ± 16.170 μs
["a" * 1000] 10000 1.985 μs ± 70.743 ns 2.537 μs ± 114.215 ns 1.962 μs ± 43.358 ns 2.536 μs ± 129.308 ns 2.101 μs ± 99.484 ns 2.612 μs ± 258.642 ns
["a" * 10] 50000 462.762 ns ± 51.515 ns 460.432 ns ± 48.403 ns 488.346 ns ± 37.602 ns 450.428 ns ± 39.787 ns 472.680 ns ± 69.874 ns 466.314 ns ± 34.472 ns
["a" for i in range(10)] 50000 868.938 ns ± 39.353 ns 883.671 ns ± 80.923 ns 844.851 ns ± 26.965 ns 860.837 ns ± 47.408 ns 859.650 ns ± 59.082 ns 860.623 ns ± 60.249 ns
["a" for i in range(100)] 5000 5.006 μs ± 280.968 ns 5.146 μs ± 627.172 ns 4.993 μs ± 239.469 ns 4.880 μs ± 266.675 ns 5.051 μs ± 412.666 ns 4.951 μs ± 234.397 ns
["a"] 50000 421.045 ns ± 20.659 ns 429.624 ns ± 16.557 ns 431.644 ns ± 70.713 ns 434.711 ns ± 40.802 ns 433.016 ns ± 37.992 ns 437.467 ns ± 37.961 ns
["‽" * 10 for i in range(10)] 10000 2.008 μs ± 93.952 ns 2.231 μs ± 378.806 ns 2.158 μs ± 71.979 ns 2.355 μs ± 265.526 ns 2.200 μs ± 412.928 ns 2.022 μs ± 33.960 ns
["‽" * 10 for i in range(100)] 2000 16.535 μs ± 1.415 μs 16.848 μs ± 1.895 μs 17.333 μs ± 1.563 μs 18.998 μs ± 1.420 μs 16.014 μs ± 775.730 ns 15.583 μs ± 474.698 ns
["‽" * 1000 for i in range(10)] 500 68.734 μs ± 4.193 μs 74.115 μs ± 1.556 μs 68.692 μs ± 3.590 μs 75.018 μs ± 5.948 μs 69.124 μs ± 1.685 μs 75.870 μs ± 2.469 μs
["‽" * 1000 for i in range(100)] 20 671.081 μs ± 33.647 μs 1.082 ms ± 99.842 μs 664.693 μs ± 59.960 μs 1.068 ms ± 50.550 μs 700.504 μs ± 81.766 μs 1.184 ms ± 282.350 μs
["‽" * 1000] 5000 6.970 μs ± 187.767 ns 7.594 μs ± 179.000 ns 6.993 μs ± 195.717 ns 7.652 μs ± 271.062 ns 7.017 μs ± 162.788 ns 7.661 μs ± 153.459 ns
["‽" * 10] 50000 661.926 ns ± 111.926 ns 637.064 ns ± 124.779 ns 658.298 ns ± 104.951 ns 657.599 ns ± 147.319 ns 576.827 ns ± 119.654 ns 566.266 ns ± 43.490 ns
["‽" for i in range(10)] 20000 1.290 μs ± 120.439 ns 1.313 μs ± 36.971 ns 1.456 μs ± 130.919 ns 1.570 μs ± 259.890 ns 1.285 μs ± 49.978 ns 1.261 μs ± 60.233 ns
["‽" for i in range(100)] 5000 9.176 μs ± 566.832 ns 10.385 μs ± 1.521 μs 10.248 μs ± 871.610 ns 11.451 μs ± 928.792 ns 8.983 μs ± 175.036 ns 9.132 μs ± 404.129 ns
["‽"] 50000 459.685 ns ± 19.125 ns 460.470 ns ± 31.992 ns 494.480 ns ± 46.169 ns 487.892 ns ± 15.926 ns 457.412 ns ± 8.236 ns 470.631 ns ± 66.596 ns
{"" + str(i): "" for i in range(10)} 20000 1.448 μs ± 178.279 ns 1.504 μs ± 105.600 ns 1.645 μs ± 82.541 ns 1.571 μs ± 105.794 ns 1.779 μs ± 357.363 ns 1.495 μs ± 244.853 ns
{"" + str(i): "" for i in range(10)} (sorted) 20000 2.340 μs ± 113.971 ns 2.577 μs ± 520.136 ns 2.590 μs ± 412.858 ns 2.588 μs ± 490.597 ns 2.268 μs ± 85.880 ns 2.635 μs ± 441.124 ns
{"" + str(i): "" for i in range(100)} 2000 12.661 μs ± 3.115 μs 13.987 μs ± 4.396 μs 13.162 μs ± 455.384 ns 13.837 μs ± 1.201 μs 11.597 μs ± 1.415 μs 12.076 μs ± 841.905 ns
{"" + str(i): "" for i in range(100)} (sorted) 2000 17.813 μs ± 1.518 μs 18.737 μs ± 1.219 μs 19.775 μs ± 4.103 μs 19.637 μs ± 1.113 μs 17.280 μs ± 699.506 ns 18.892 μs ± 4.041 μs
{"": ""} 50000 512.216 ns ± 49.006 ns 514.969 ns ± 32.391 ns 535.441 ns ± 40.912 ns 540.447 ns ± 50.803 ns 533.904 ns ± 25.071 ns 510.613 ns ± 24.115 ns
{"": ""} (sorted) 50000 992.358 ns ± 68.370 ns 1.030 μs ± 83.488 ns 1.006 μs ± 71.661 ns 1.064 μs ± 73.835 ns 1.045 μs ± 70.550 ns 1.002 μs ± 46.819 ns
{"a" * 10 + str(i): "a" * 10 for i in range(10)} 10000 2.032 μs ± 74.509 ns 2.031 μs ± 83.832 ns 2.189 μs ± 125.064 ns 2.225 μs ± 85.969 ns 2.013 μs ± 195.430 ns 2.066 μs ± 176.608 ns
{"a" * 10 + str(i): "a" * 10 for i in range(10)} (sorted) 10000 2.948 μs ± 185.798 ns 3.209 μs ± 272.913 ns 3.044 μs ± 305.726 ns 3.176 μs ± 345.496 ns 2.862 μs ± 145.853 ns 2.999 μs ± 141.117 ns
{"a" * 10 + str(i): "a" * 10 for i in range(100)} 2000 15.642 μs ± 1.255 μs 16.746 μs ± 574.249 ns 16.764 μs ± 545.844 ns 19.249 μs ± 1.664 μs 15.364 μs ± 1.745 μs 15.931 μs ± 347.373 ns
{"a" * 10 + str(i): "a" * 10 for i in range(100)} (sorted) 2000 21.560 μs ± 1.867 μs 22.631 μs ± 1.682 μs 22.990 μs ± 2.037 μs 24.288 μs ± 2.178 μs 21.980 μs ± 2.179 μs 23.018 μs ± 2.295 μs
{"a" * 1000 + str(i): "a" * 1000 for i in range(10)} 500 31.836 μs ± 1.909 μs 43.113 μs ± 1.143 μs 32.363 μs ± 2.925 μs 43.804 μs ± 3.606 μs 34.213 μs ± 4.043 μs 43.522 μs ± 1.748 μs
{"a" * 1000 + str(i): "a" * 1000 for i in range(10)} (sorted) 500 32.976 μs ± 822.254 ns 45.163 μs ± 4.429 μs 33.829 μs ± 3.523 μs 44.067 μs ± 945.833 ns 33.695 μs ± 3.187 μs 45.067 μs ± 3.510 μs
{"a" * 1000 + str(i): "a" * 1000 for i in range(100)} 50 323.762 μs ± 33.111 μs 525.391 μs ± 25.537 μs 339.531 μs ± 55.616 μs 581.457 μs ± 102.544 μs 327.819 μs ± 10.353 μs 540.379 μs ± 37.670 μs
{"a" * 1000 + str(i): "a" * 1000 for i in range(100)} (sorted) 50 457.764 μs ± 11.453 μs 578.741 μs ± 39.808 μs 456.773 μs ± 36.551 μs 570.697 μs ± 12.694 μs 462.546 μs ± 35.306 μs 586.032 μs ± 29.981 μs
{"a" * 1000: "a" * 1000} 5000 3.638 μs ± 111.291 ns 4.780 μs ± 260.507 ns 4.117 μs ± 938.619 ns 4.763 μs ± 137.732 ns 3.693 μs ± 407.699 ns 5.153 μs ± 872.270 ns
{"a" * 1000: "a" * 1000} (sorted) 5000 4.126 μs ± 168.282 ns 5.230 μs ± 165.105 ns 4.216 μs ± 200.370 ns 5.314 μs ± 360.869 ns 4.113 μs ± 98.996 ns 5.481 μs ± 178.557 ns
{"a" * 10: "a" * 10} 50000 559.787 ns ± 23.744 ns 630.646 ns ± 60.463 ns 658.801 ns ± 178.613 ns 596.396 ns ± 44.788 ns 591.999 ns ± 20.885 ns 573.709 ns ± 44.375 ns
{"a" * 10: "a" * 10} (sorted) 50000 1.069 μs ± 19.611 ns 1.090 μs ± 129.267 ns 1.079 μs ± 111.368 ns 1.090 μs ± 63.895 ns 1.087 μs ± 82.415 ns 1.101 μs ± 179.307 ns
{"a" + str(i): "a" for i in range(10)} 20000 1.595 μs ± 93.374 ns 1.756 μs ± 134.943 ns 1.843 μs ± 221.463 ns 1.868 μs ± 207.812 ns 1.707 μs ± 50.682 ns 1.844 μs ± 381.616 ns
{"a" + str(i): "a" for i in range(10)} (sorted) 20000 2.438 μs ± 62.248 ns 2.589 μs ± 148.294 ns 2.872 μs ± 537.920 ns 2.732 μs ± 204.015 ns 2.665 μs ± 557.911 ns 2.666 μs ± 563.152 ns
{"a" + str(i): "a" for i in range(100)} 2000 12.981 μs ± 1.230 μs 13.194 μs ± 481.495 ns 14.152 μs ± 1.295 μs 15.162 μs ± 1.900 μs 12.064 μs ± 1.040 μs 12.340 μs ± 347.782 ns
{"a" + str(i): "a" for i in range(100)} (sorted) 2000 17.664 μs ± 1.506 μs 18.241 μs ± 653.215 ns 20.911 μs ± 4.320 μs 19.302 μs ± 1.055 μs 18.069 μs ± 1.945 μs 18.367 μs ± 917.251 ns
{"a": "a"} 50000 597.640 ns ± 25.139 ns 498.800 ns ± 18.347 ns 611.569 ns ± 165.416 ns 602.315 ns ± 179.341 ns 494.121 ns ± 16.147 ns 560.490 ns ± 150.525 ns
{"a": "a"} (sorted) 50000 1.047 μs ± 146.205 ns 974.692 ns ± 32.447 ns 1.077 μs ± 94.959 ns 1.123 μs ± 42.643 ns 1.058 μs ± 113.001 ns 1.025 μs ± 90.689 ns
{"‽" * 10 + str(i): "‽" * 10 for i in range(10)} 10000 3.592 μs ± 108.392 ns 3.898 μs ± 474.092 ns 4.633 μs ± 1.578 μs 4.179 μs ± 174.091 ns 3.625 μs ± 246.603 ns 3.966 μs ± 464.331 ns
{"‽" * 10 + str(i): "‽" * 10 for i in range(10)} (sorted) 10000 5.227 μs ± 1.030 μs 5.808 μs ± 1.307 μs 5.329 μs ± 932.538 ns 5.700 μs ± 1.296 μs 4.628 μs ± 277.763 ns 5.035 μs ± 912.219 ns
{"‽" * 10 + str(i): "‽" * 10 for i in range(100)} 1000 31.508 μs ± 1.176 μs 32.719 μs ± 1.511 μs 33.604 μs ± 904.663 ns 35.264 μs ± 2.036 μs 31.745 μs ± 2.288 μs 32.682 μs ± 3.127 μs
{"‽" * 10 + str(i): "‽" * 10 for i in range(100)} (sorted) 1000 41.031 μs ± 850.535 ns 44.331 μs ± 8.925 μs 43.623 μs ± 988.271 ns 45.266 μs ± 3.913 μs 40.777 μs ± 1.192 μs 41.681 μs ± 2.454 μs
{"‽" * 1000 + str(i): "‽" * 1000 for i in range(10)} 100 139.376 μs ± 8.547 μs 216.094 μs ± 39.039 μs 138.553 μs ± 11.026 μs 200.044 μs ± 9.055 μs 140.704 μs ± 14.005 μs 199.116 μs ± 10.668 μs
{"‽" * 1000 + str(i): "‽" * 1000 for i in range(10)} (sorted) 100 223.860 μs ± 45.564 μs 228.023 μs ± 12.153 μs 237.516 μs ± 53.341 μs 272.447 μs ± 56.124 μs 261.851 μs ± 64.149 μs 291.881 μs ± 55.386 μs
{"‽" * 1000 + str(i): "‽" * 1000 for i in range(100)} 10 1.397 ms ± 121.731 μs 2.262 ms ± 160.086 μs 1.424 ms ± 210.156 μs 2.206 ms ± 117.800 μs 1.456 ms ± 279.166 μs 2.352 ms ± 391.154 μs
{"‽" * 1000 + str(i): "‽" * 1000 for i in range(100)} (sorted) 10 1.566 ms ± 175.532 μs 2.716 ms ± 511.366 μs 2.353 ms ± 863.657 μs 2.512 ms ± 118.144 μs 2.159 ms ± 464.589 μs 3.093 ms ± 790.857 μs
{"‽" * 1000: "‽" * 1000} 2000 13.441 μs ± 344.561 ns 14.856 μs ± 601.978 ns 13.384 μs ± 257.500 ns 14.919 μs ± 798.129 ns 13.713 μs ± 563.873 ns 14.943 μs ± 767.904 ns
{"‽" * 1000: "‽" * 1000} (sorted) 2000 14.193 μs ± 1.279 μs 15.375 μs ± 710.797 ns 14.075 μs ± 370.828 ns 15.560 μs ± 1.219 μs 16.631 μs ± 3.379 μs 17.480 μs ± 2.910 μs
{"‽" * 10: "‽" * 10} 50000 733.941 ns ± 27.110 ns 754.108 ns ± 45.325 ns 758.309 ns ± 27.925 ns 791.072 ns ± 60.025 ns 732.935 ns ± 49.456 ns 809.213 ns ± 47.058 ns
{"‽" * 10: "‽" * 10} (sorted) 50000 1.276 μs ± 91.393 ns 1.259 μs ± 118.681 ns 1.273 μs ± 35.804 ns 1.411 μs ± 236.450 ns 1.240 μs ± 47.544 ns 1.317 μs ± 83.957 ns
{"‽" + str(i): "‽" for i in range(10)} 10000 2.209 μs ± 51.946 ns 2.319 μs ± 42.217 ns 2.428 μs ± 32.124 ns 2.561 μs ± 97.825 ns 2.267 μs ± 274.004 ns 2.251 μs ± 55.288 ns
{"‽" + str(i): "‽" for i in range(10)} (sorted) 10000 3.425 μs ± 346.904 ns 3.515 μs ± 361.567 ns 3.584 μs ± 302.316 ns 3.608 μs ± 164.488 ns 3.280 μs ± 289.558 ns 3.339 μs ± 445.816 ns
{"‽" + str(i): "‽" for i in range(100)} 2000 19.128 μs ± 985.401 ns 19.783 μs ± 1.300 μs 21.647 μs ± 1.043 μs 22.034 μs ± 668.883 ns 18.926 μs ± 1.698 μs 19.609 μs ± 1.630 μs
{"‽" + str(i): "‽" for i in range(100)} (sorted) 2000 26.560 μs ± 3.053 μs 27.238 μs ± 1.672 μs 29.051 μs ± 2.024 μs 31.536 μs ± 5.686 μs 25.624 μs ± 534.098 ns 26.127 μs ± 1.471 μs
{"‽": "‽"} 50000 586.382 ns ± 23.397 ns 614.275 ns ± 19.908 ns 612.504 ns ± 11.466 ns 730.181 ns ± 153.693 ns 638.338 ns ± 39.827 ns 624.343 ns ± 18.022 ns
{"‽": "‽"} (sorted) 50000 1.061 μs ± 35.487 ns 1.131 μs ± 68.909 ns 1.140 μs ± 212.915 ns 1.134 μs ± 111.423 ns 1.059 μs ± 29.087 ns 1.049 μs ± 27.367 ns

The raw output of ./test is available here: https://transfer.archivete.am/F77fw/ujson-pr530-benchmark.jsonl
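For readers working with the raw JSONL, the per-call mean ± standard deviation figures in the table above can be derived from one record's `times` and `num` fields roughly as follows. The timings below are hypothetical placeholders, not values from the benchmark; `statistics.stdev` computes the sample standard deviation (dividing by n - 1), which matches the formula in the aggregation script.

```python
import statistics

# Hypothetical raw output of timeit.repeat: total seconds per measurement.
totals = [0.030, 0.032, 0.034]
num = 100_000  # calls per measurement (the "repetitions" column)

# Divide by the repetition count to get the time for a single dumps call.
per_call = [t / num for t in totals]
avg = statistics.mean(per_call)
std = statistics.stdev(per_call)  # sample standard deviation (n - 1)

print(f"{avg * 1e9:.3f} ns ± {std * 1e9:.3f} ns")
```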

@JustAnotherArchivist JustAnotherArchivist marked this pull request as ready for review May 31, 2022 03:53
@hugovk hugovk added the changelog: Fixed For any bug fixes label Jun 1, 2022
@hugovk
Member

hugovk commented Jun 1, 2022

Thanks!
