
[PoC] [WiP Draft] Performance gate prototype #1825

Draft
wants to merge 16 commits into master
Conversation


vytas7 commented Dec 20, 2020

This is an early prototype of what implementing the https://github.com/pythonspeed/cachegrind-benchmarking approach could look like for Falcon.

The prototype builds upon this excellent article by @itamarst: https://pythonspeed.com/articles/consistent-benchmarking-in-ci/ (thanks to @njsmith for kindly pointing out this idea).
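For context, a minimal sketch of the core mechanism (not the actual code in this PR; the benchmark script name and output path are hypothetical): run the benchmark under Valgrind's Cachegrind tool and read the event counts from the summary line of the output file, which are far more reproducible than wall-clock timings.

```python
# Minimal sketch of the cachegrind-benchmarking idea; "benchmark.py"
# and the output filename are hypothetical placeholders.
import subprocess


def run_under_cachegrind(cmd):
    """Run cmd under Cachegrind and return its summary event counts."""
    subprocess.run(
        ['valgrind', '--tool=cachegrind', '--cache-sim=yes',
         '--cachegrind-out-file=cachegrind.out'] + cmd,
        check=True,
    )
    events = summary = None
    with open('cachegrind.out') as f:
        for line in f:
            # "events:" names the columns (Ir, I1mr, ILmr, Dr, D1mr, ...);
            # "summary:" holds the corresponding totals.
            if line.startswith('events:'):
                events = line.split()[1:]
            elif line.startswith('summary:'):
                summary = [int(n) for n in line.split()[1:]]
    return dict(zip(events, summary))


counts = run_under_cachegrind(['python', 'benchmark.py'])
print('instructions executed:', counts['Ir'])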

Closes #1450

To do if we want to proceed with this:

  • ASGI performance metric
  • Barebones "Hello, World!" performance metric (see the sketch after this list)
  • Media performance metric
  • Routing performance metric
  • URL params and headers performance metric
  • Clean up tox environments, requirements, etc.
  • Add Cython support
  • Run Cython gates for master only (?)
  • Add PyPy support, master only (?); probably only informational?
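For illustration, a hypothetical sketch of the barebones "Hello, World!" metric from the list above (the exact spelling differs between Falcon versions: `falcon.App()` on 3.x, `falcon.API()` on 2.x):

```python
# Hypothetical barebones "Hello, World!" benchmark: exercise a minimal
# Falcon app through the testing client so that the instruction count
# scales linearly with the number of iterations.
import sys

import falcon
import falcon.testing


class HelloResource:
    def on_get(self, req, resp):
        resp.content_type = falcon.MEDIA_TEXT
        resp.text = 'Hello, World!'  # resp.body on Falcon < 3.0


app = falcon.App()  # falcon.API() on Falcon < 3.0
app.add_route('/', HelloResource())

if __name__ == '__main__':
    iterations = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
    client = falcon.testing.TestClient(app)
    for _ in range(iterations):
        client.simulate_get('/')
```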

@vytas7 marked this pull request as draft December 20, 2020 19:08

codecov bot commented Dec 20, 2020

Codecov Report

Merging #1825 (30d2886) into master (29b05ed) will not change coverage.
The diff coverage is n/a.


@@            Coverage Diff            @@
##            master     #1825   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           54        54           
  Lines         5154      5154           
  Branches       831       831           
=========================================
  Hits          5154      5154           


@itamarst

Excited to see this, and to see if it turns out to be useful. I'm going to be setting up something similar for one of my projects, so I'll share what I come up with if this is still in progress, or maybe steal from you, depending 😀

@itamarst

Note that I've found a bug in the cachegrind.py script calculation, so you'll want to pull a new version once I've updated it (tomorrow hopefully).


vytas7 commented Dec 21, 2020

Heh, thanks for the heads-up @itamarst!
I did notice that the cachegrind.py metric was a lot (1-2 orders of magnitude) noisier than the raw instruction count from valgrind, if that is what is being revised.
But maybe it is inherently a bit noisier, so I didn't pay too much attention to it.


vytas7 commented Dec 21, 2020

@itamarst Btw, another thing that really surprised me when toying with this approach was how much "rogue" Python hash seeds can affect performance 🙂

@itamarst

Yeah, you really want to set a fixed PYTHONHASHSEED for consistency.
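For example, something along these lines when launching the benchmark subprocess (the script name is hypothetical, and the seed value is arbitrary; it just has to be fixed):

```python
# Sketch: pin PYTHONHASHSEED so that dict/set layouts, and hence the
# instruction counts, are identical from one benchmark run to the next.
import os
import subprocess

env = dict(os.environ, PYTHONHASHSEED='1234')  # any fixed value works
subprocess.run(['python', 'benchmark.py'], env=env, check=True)
```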

The issue was that I was counting L3 hits wrong: if something hit RAM, I also counted it as hitting L3 (i.e., the L3 hit count was too high). I am not sure whether this will have much impact on the noisiness, though.
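Roughly, the corrected accounting looks like this. A sketch using Cachegrind's event names, taking the counts dict from the earlier sketch; the 1/5/35 cycle weights are an assumption here, approximating the costs suggested in the article:

```python
# Sketch of the corrected hit accounting (Cachegrind event names; the
# 1/5/35 cycle weights approximate the costs assumed in the article).
def combined_score(counts):
    # Last-level (LL/L3) *misses* are what actually hit RAM...
    ram_hits = counts['DLmr'] + counts['DLmw'] + counts['ILmr']
    # ...so the fix is to exclude them from the L3 hit count.
    l3_hits = counts['D1mr'] + counts['D1mw'] + counts['I1mr'] - ram_hits
    total_rw = counts['Ir'] + counts['Dr'] + counts['Dw']
    l1_hits = total_rw - l3_hits - ram_hits
    return l1_hits + 5 * l3_hits + 35 * ram_hits
```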

@itamarst

OK, https://github.com/pythonspeed/cachegrind-benchmarking has been updated.


vytas7 commented Dec 22, 2020

Thanks @itamarst , I'll check that out!


vytas7 commented Dec 26, 2020

@itamarst thanks again for the update.
The noisiness is gone, and the variation in the least-squares linear regression fitting error is now low beyond belief 💯.
In fact, I'm now getting exactly the same cost per iteration in two different CI runs (identical to at least 9 significant digits!).
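For the curious, the kind of fit in question, sketched with numpy over purely illustrative numbers: run the benchmark at several iteration counts and take the slope of the fitted line as the cost of one iteration (the intercept absorbs the fixed interpreter startup cost).

```python
# Sketch: estimate the cost of one iteration as the slope of a
# least-squares line over (iteration count, measured score) pairs;
# the numbers below are purely illustrative.
import numpy as np

iterations = np.array([1000, 2000, 4000, 8000])
scores = np.array([5.1e6, 9.9e6, 19.5e6, 38.7e6])

slope, intercept = np.polyfit(iterations, scores, 1)
residuals = scores - (slope * iterations + intercept)
print(f'cost per iteration: {slope:.9g}')
print('max fitting error:', abs(residuals).max())
```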

@itamarst

I got benchmarks working. Beyond what is in the original repository:

  1. I store the results as a pretty-printed, sorted JSON file on disk. The expectation is that the developer runs this locally and checks in the result, though mine is a single-person project.
  2. On every PR, I run the benchmarks again and add the diff as a GitHub comment: https://github.com/pythonspeed/filprofiler/blob/master/.github/workflows/main.yml#L120
  3. In addition to setting PYTHONHASHSEED and figuring out the equivalent fixed seed for Rust, I ended up using Conda environments to reduce noise between my machine and the GitHub Actions machines (so that, e.g., it is the same Python binary, rather than different dot-versions or different compilers). The result is quite consistent on my machine, and a little noisy between my machine and the VMs, but better than it would be without Conda.

Example output: pythonspeed/filprofiler#110
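A sketch of item 1 above (the filename and numbers are hypothetical); sorting the keys and pretty-printing keeps the checked-in file's diffs stable and reviewable:

```python
# Sketch of item 1: persist benchmark results as sorted, pretty-printed
# JSON so that the checked-in file yields stable, reviewable diffs.
import json

results = {'hello_world': 5123456, 'routing': 9876543}  # illustrative

with open('benchmarks.json', 'w') as f:
    json.dump(results, f, indent=2, sort_keys=True)
    f.write('\n')  # trailing newline keeps diffs clean
```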

@kgriffs mentioned this pull request Aug 4, 2021

vytas7 commented Mar 13, 2022

This has been rotting for so long due to my lack of bandwidth that I'm thinking of waiting a couple more weeks and then migrating this to Ubuntu 22.04 + CPython 3.10, so as to have the same gauge for a longer time.

I'm hoping to circle back on this shortly after we release 3.1.
