Generalize benchmarks #532

Erotemic · 2022-04-20T15:45:27Z

In an effort to procrastinate on what I really need to be doing, I reworked the existing benchmarks script and added a new one based on my timerit module (https://github.com/Erotemic/timerit/).

It's still a work in progress, because I do have to stop procrastinating, but here is a teaser plot that visually compares implementation performance on specific tasks over a range of sizes. Error bars included.

This branch is currently based on another PR, but I do plan to clean it up / factor out additional dependencies if that is desired.

To make the nice plots it will depend on seaborn and pandas. For timing, I do want to use timerit instead of timeit as it makes the benchmarks much easier to write. I could factor out ubelt, but it's small and I do find it useful.
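For readers who want a feel for the measurement behind such a plot, here is a minimal, dependency-free sketch. It is not the script from this branch: it uses the standard library's `timeit` and `json` instead of timerit and ujson, and `make_payload` and `bench_dumps` are hypothetical names made up for illustration. It times one `dumps` function over several input sizes and records a mean and standard deviation per size, which is the data a line plot with error bars would need.

```python
import json
import statistics
import timeit


def make_payload(size):
    """Build a list of `size` small dicts to serialize (hypothetical workload)."""
    return [{"id": i, "name": f"item-{i}", "value": i * 0.5} for i in range(size)]


def bench_dumps(dumps, sizes, repeat=5, number=20):
    """Time `dumps` on a payload of each size.

    Returns one row per size with the mean and standard deviation of the
    per-call time in seconds.
    """
    rows = []
    for size in sizes:
        payload = make_payload(size)
        # timeit.repeat returns `repeat` totals, each covering `number` calls.
        totals = timeit.repeat(lambda: dumps(payload), repeat=repeat, number=number)
        per_call = [total / number for total in totals]
        rows.append({
            "size": size,
            "mean": statistics.mean(per_call),
            "std": statistics.stdev(per_call),
        })
    return rows


rows = bench_dumps(json.dumps, sizes=[1, 10, 100, 1000])
for row in rows:
    print(f"size={row['size']:>5}  mean={row['mean']:.3e}s  std={row['std']:.1e}s")
```

Passing a different `dumps` callable (for example `ujson.dumps`, if installed) yields a second series to compare against the first.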

The main dependency that could be excluded is openskill, which is used to generate overall probability estimates that one library is faster than another. (Currently nujson is slightly beating ujson, though that may just be because my PR introduces a slowdown.) Part of the point of this is to determine when a patch introduces a performance regression.
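To illustrate what a "probability that one library is faster" can mean without openskill, here is a much simpler stand-in: the empirical fraction of timing pairs in which one set of samples beats the other. This is not the openskill rating model used in the branch, and `prob_faster`, `is_regression`, and the 0.95 threshold are assumptions chosen only for this sketch.

```python
from itertools import product


def prob_faster(times_a, times_b):
    """Fraction of (a, b) sample pairs where a is faster than b; ties count half."""
    wins = 0.0
    for a, b in product(times_a, times_b):
        if a < b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    return wins / (len(times_a) * len(times_b))


def is_regression(baseline, patched, threshold=0.95):
    """Flag a regression when the baseline wins at least `threshold` of the pairs."""
    return prob_faster(baseline, patched) >= threshold


# Made-up timings in seconds, purely for illustration.
baseline = [1.00, 1.02, 0.98, 1.01]
patched = [1.10, 1.12, 1.08, 1.11]

print(prob_faster(baseline, patched))
print(is_regression(baseline, patched))
```

With real measurements the samples would overlap, so the estimate would fall somewhere between 0 and 1 rather than at an extreme.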

This allows surrogates anywhere in the input, compatible with the json module from the standard library.

This also refactors two interfaces:

- The `PyUnicode` to `char*` conversion is moved into its own function, separated from the `JSONTypeContext` handling, so it can be reused for other things in the future (e.g. indentation and separators) which don't have a type context.
- Converting the `char*` output to a Python string with surrogates intact requires the string length for `PyUnicode_Decode` & Co. While `strlen` could be used, the length is already known inside the encoder, so the encoder function now also takes an extra `size_t` pointer argument to return that and no longer NUL-terminates the string. This also permits output that contains NUL bytes (even though that would be invalid JSON), e.g. if an object's `__json__` method return value were to contain them.

Fixes ultrajson#156
Fixes ultrajson#447
Supersedes ultrajson#284
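As a reference point for the compatibility target mentioned above, the following sketch shows how the standard library's `json` module treats a lone surrogate. It does not call ujson. The `surrogatepass` error handler at the end is only an illustration of keeping surrogates intact across an encode/decode round trip; the commit message does not say which handler the C code uses, so treat that part as an assumption.

```python
import json

lone = "\ud800"  # a lone high surrogate, not encodable as strict UTF-8

# With the default ensure_ascii=True, json.dumps emits a \u escape.
encoded = json.dumps(lone)
print(encoded)

# Decoding the escape returns the lone surrogate unchanged.
decoded = json.loads(encoded)

# A strict UTF-8 encode rejects the surrogate...
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# ...while the "surrogatepass" error handler round-trips it.
raw = lone.encode("utf-8", "surrogatepass")
roundtrip = raw.decode("utf-8", "surrogatepass")
```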

[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

Fix compiler warnings

Update python/objToJSON.c
Co-authored-by: JustAnotherArchivist <JustAnotherArchivist@users.noreply.github.com>

camelCase whitespace

Update tests/test_ujson.py
Co-authored-by: JustAnotherArchivist <JustAnotherArchivist@users.noreply.github.com>


bwoodsend · 2022-04-20T17:28:59Z

I don't see anything wrong with the benchmark picking up dependencies for graphics or for doing the timing (although it's interesting to see yet another "timeit, but not as awkward as timeit" library floating around. I'll have to see if I like it more than sloth). Happy procrastinating!


JustAnotherArchivist · 2022-05-19T04:46:30Z

I like pretty graphs. :-)

Just so it isn't missed later, this PR currently includes commits from #518 and #530 and other functional changes which should be removed before this gets merged.

Erotemic · 2022-05-30T15:05:02Z

Closed in favor of #542

JustAnotherArchivist and others added 22 commits April 17, 2022 03:49

Allow str and None values for indent

92b044d

Use older PyObject_call API

5c05078

Debug code

d82654b

Differentiate integer vs explicit indent

4c68a0b

remove printf

993b262

Use PyUnicode_AsEncodedString

4ae8a0b

Use PyUnicode_AsEncodedString

70e9085

Enable all agree checks

6861525

[pre-commit.ci] auto fixes from pre-commit.com hooks

183d863

for more information, see https://pre-commit.ci

remove flake8 long lines

fa84e27

Merge branch 'nonint_indent' of github.com:Erotemic/ultrajson into no…

84ca3d1

…nint_indent

remove compat tests

5853a4f

Fix negative allocation

ed8de3c

remove non portable min/max

520f2ec

Remove max in favor of indentEnabled

1e0885d

[pre-commit.ci] auto fixes from pre-commit.com hooks

2cc3544

for more information, see https://pre-commit.ci

Fix -1 length issue and revert extra spaces in tests

75895fc

Generalize the way new json modules can be added to existing benchmarks

35bb31a

Proof of concept for graphical benchmarks

cbd3019

[pre-commit.ci] auto fixes from pre-commit.com hooks

9864596

for more information, see https://pre-commit.ci

Erotemic and others added 3 commits April 21, 2022 15:25

Cleanup openskill code

85260e6

Fix merge issue

75cd254

[pre-commit.ci] auto fixes from pre-commit.com hooks

1ccacdd

for more information, see https://pre-commit.ci

Erotemic mentioned this pull request Apr 21, 2022

Benchmark refactor - argparse CLI #533

Merged

Erotemic mentioned this pull request May 26, 2022

Benchmark stats v2 #542

Open

3 tasks

Erotemic closed this May 30, 2022
