Benchmark refactor - argparse CLI #533
Conversation
Codecov Report

```diff
@@           Coverage Diff           @@
##             main     #533   +/-   ##
=======================================
  Coverage   91.75%   91.75%
=======================================
  Files           6        6
  Lines        1819     1819
=======================================
  Hits         1669     1669
  Misses        150      150
=======================================
```

Continue to review full report at Codecov.
```console
> python -VV
Python 3.10.1
> python -c 'import cpuinfo'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'cpuinfo'
```

Are you sure it's standard lib?
I suppose I was mistaken. It looks like cpuinfo is coming from torch. It's not important enough to add a dependency. I'll remove that.
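As an aside, if the benchmark ever wanted to report CPU details only when `cpuinfo` happens to be installed, a minimal sketch using the standard library's `importlib.util.find_spec` could look like this (the guard here is illustrative, not part of this PR):

```python
import importlib.util

# find_spec returns None when a top-level module is not installed,
# so the script can skip CPU reporting instead of crashing on import.
spec = importlib.util.find_spec("cpuinfo")
if spec is None:
    print("cpuinfo not available; skipping CPU details")
else:
    # spec.origin reveals which installed package actually provides it
    print(f"cpuinfo provided by: {spec.origin}")
```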
I think a marginally nicer way to eliminate the global `LIBRARIES` variable would be to have a class which takes `libraries` as a parameter and stores it as an attribute, then make all the functions that now take a `libraries` parameter methods of that class instead. But only marginally nicer. Honestly, if I were that allergic to OOP-destroying global variables then I would have abandoned Python altogether in favour of pedantically object oriented, but otherwise underwhelming at every turn, Java 🙃 .
I was considering a class that stored the parameters as an alternative. But as marginally nice as that is, this script isn't run as part of a larger system, so the globals only bother me on an aesthetic level; they don't have any major downside that I can see. When I get around to working on #532 to get nice graphs for the benchmarks (and hopefully t-tests to indicate when there is a high probability of regression), I'll be rewriting most of this, so I figured I'd just keep the changes here minimal.
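For reference, the class-based alternative discussed above could look roughly like this; `BenchmarkRunner` and its method names are hypothetical, not the script's actual API:

```python
class BenchmarkRunner:
    """Sketch: libraries becomes constructor state instead of a global."""

    def __init__(self, libraries):
        # The list of JSON library names the user asked to benchmark.
        self.libraries = libraries

    def run(self):
        # Former module-level functions become methods reading self.libraries,
        # so no global LIBRARIES variable is needed.
        return [f"benchmarking {name}" for name in self.libraries]


runner = BenchmarkRunner(["ujson", "json"])
print(runner.run())  # → ['benchmarking ujson', 'benchmarking json']
```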
Thanks!
Oops, this failed on the CI: https://github.com/ultrajson/ultrajson/runs/6154659419?check_suite_focus=true

Please could you check it? Let's also add this to run the benchmark on PRs changing the benchmark script or workflow file:

```yaml
on:
  push:
    branches:
      - main
  pull_request:
    paths:
      - ".github/workflows/benchmark.yml"
      - "tests/benchmark.py"
```
@bwoodsend This is a pared down version of #532 that is ready for review / merging. There are two main features added here:

1. Uses the standard library's (whoops) `cpuinfo` module to print CPU details when reporting the results of the benchmarks. This will help ensure that users don't compare incomparable data and can serve as a reminder of which benchmarks can be compared against each other.
2. Adds an argparse CLI so the user can specify particular modules to disable. I personally don't care about the other JSON modules that aren't a drop-in replacement for Python's `json` module. The user can also specify a `--factor` as a fraction to speed up the benchmarks for easier hacking.

To add these two features I had to do a little bit of refactoring. I wanted to make it easy to add / remove any new implementation that might come onto the scene, so I registered each benchmark with its associated library name using a decorator, and I pass the list of library names that the user requested to test to the benchmark functions. (I was very tempted to factor out the global variables, but I restrained myself.)
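The decorator-based registration described above might be sketched as follows; the names (`BENCHMARKS`, `benchmark`, `run`) are illustrative stand-ins, not the script's real identifiers:

```python
# Registry mapping each benchmark function to its library name.
BENCHMARKS = []


def benchmark(library):
    """Decorator that registers a benchmark under a library name."""
    def decorator(func):
        BENCHMARKS.append((library, func))
        return func
    return decorator


@benchmark("ujson")
def encode_sample():
    # A real benchmark would time ujson.dumps here.
    return "encoded"


def run(requested):
    # Only run benchmarks whose library the user asked for on the CLI.
    return [(library, func()) for library, func in BENCHMARKS
            if library in requested]


print(run(["ujson"]))  # → [('ujson', 'encoded')]
```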
Speaking of global variables, I made it so that the external libraries aren't imported until the script knows they are needed. I use `importlib` to implement this as a loop, and then I set those modules as global variables so the rest of the benchmark structure can work without modification. Lastly, I added a note about what units the benchmarks are in, because I always confuse myself with that.
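The lazy-import loop described here could look something like this minimal sketch; `import_libraries` is a hypothetical name, and only the stdlib `json` module is imported so the example stays self-contained:

```python
import importlib


def import_libraries(names):
    # Import each requested library on demand and expose it as a
    # module-level global so existing benchmark code works unchanged.
    for name in names:
        globals()[name] = importlib.import_module(name)


import_libraries(["json"])
# json is now available as a global in this module without a top-level import.
print(json.dumps({"works": True}))  # → {"works": true}
```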