coverage 6.1.1 musllinux wheel performance issues #1268

Closed

reaperhulk opened this issue Oct 31, 2021 · 19 comments

Labels: bug (Something isn't working), fixed

reaperhulk commented Oct 31, 2021

With the release of 6.1.1, coverage now ships musllinux_1_1 wheels so that users don't need to compile on musl-based distributions. This is great! However, the cryptography project has seen a massive performance regression with these wheels in our CI.

  • coverage 6.1, compiled in an alpine:latest container running our test suite: 215.89s (0:03:35)
  • coverage 6.1.1, using the musllinux_1_1 wheel in the same container: 842.11s (0:14:02)

While the exact timings vary, of course, the magnitude of the slowdown is consistent. Locally I tested this by installing 6.1.1 with --no-binary coverage, and the performance problem went away.

To Reproduce
This should be reproducible with alpine:latest (which has Python 3.9 at this time) and any project with a reasonably large test suite, but you can reproduce it directly with cryptography's own runner if you'd like:

docker pull ghcr.io/pyca/cryptography-runner-alpine
docker run --rm -ti ghcr.io/pyca/cryptography-runner-alpine /bin/sh
<in the container>
git clone --depth=1 https://github.com/pyca/cryptography && cd cryptography && tox -e py39 -vvv
reaperhulk added the bug label Oct 31, 2021
@reaperhulk (Author)

This will no longer reproduce with the above instructions because we landed pyca/cryptography#6511 as a workaround. To see the problem now you'll need to edit tox.ini to remove the install_command line before invoking tox.

@reaperhulk (Author)

I've now tested this by installing the wheel on Alpine 3.12 (which runs musl 1.1 and is what the musllinux image uses as a base), Alpine 3.13 (the first with musl 1.2), and Alpine 3.14 (which has musl 1.2.2, same as 3.13). The performance issue only occurs on Alpine 3.14, which is... baffling (and shouldn't yet be taken as certain; I've run a lot of permutations and could have made a mistake).

Happy to debug this further, although I'll need to figure out a reasonable profiling strategy for timing calls to the tracing func.
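[Editor's note: one possible starting point for that profiling strategy is to install a wrapper trace function via sys.settrace and accumulate the time spent inside it. This is an illustrative sketch of the general approach, not anything coverage itself does.]

import sys
import time

calls = 0
spent = 0.0

def timed_trace(frame, event, arg):
    # Count every trace event and accumulate the time spent handling it.
    global calls, spent
    t0 = time.perf_counter()
    # (real tracing work would happen here)
    spent += time.perf_counter() - t0
    calls += 1
    return timed_trace

sys.settrace(timed_trace)
sum(i * i for i in range(100_000))  # some traced workload
sys.settrace(None)
print(f"{calls} trace events, {spent:.4f}s inside the tracer")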

nedbat commented Nov 1, 2021

I know nothing about the Linux underpinnings here, so I'm not sure how to help diagnose this. If there's something I should be doing differently in building distributions, I'm happy to make a change.

One difference between 6.1 and 6.1.1: the versions of cibuildwheel and packaging changed:

  • 6.1: cibuildwheel==2.1.3, packaging==21.0
  • 6.1.1: cibuildwheel==2.2.2, packaging==21.2

@reaperhulk (Author)

Okay, I've figured out the performance issue. What's happening is that importing the CTracer fails, and this code makes it silently fall back to the pure-Python tracer:

try:
    # Use the C extension code when we can, for speed.
    from coverage.tracer import CTracer, CFileDisposition
except ImportError:
    # Couldn't import the C extension, maybe it isn't built.
    if os.getenv('COVERAGE_TEST_TRACER') == 'c':    # pragma: part covered
        # During testing, we use the COVERAGE_TEST_TRACER environment variable
        # to indicate that we've fiddled with the environment to test this
        # fallback code. If we thought we had a C tracer, but couldn't import
        # it, then exit quickly and clearly instead of dribbling confusing
        # errors. I'm using sys.exit here instead of an exception because an
        # exception here causes all sorts of other noise in unittest.
        sys.stderr.write("*** COVERAGE_TEST_TRACER is 'c' but can't import CTracer!\n")
        sys.exit(1)
    CTracer = None

Digging deeper, the issue appears to be the naming of the tracer .so file. In the musl wheel it is named tracer.cpython-39-x86_64-linux-gnu.so, but when building it myself I see tracer.cpython-39-x86_64-linux-musl.so. Simply installing the musl wheel and renaming the file makes python -c "from coverage.tracer import CTracer" import successfully.

I'm actually not familiar with the naming requirements for traditional CPython extensions (as all my work is done with cffi), but it appears there's some issue occurring in the build process (and auditwheel repair doesn't seem to catch it) such that the bundled shared object isn't named correctly.
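[Editor's note: a quick way to verify this mismatch on any machine is to compare the extension suffix the running interpreter expects against the .so file the wheel actually shipped; this sketch uses only the standard sysconfig module and coverage's install location.]

import pathlib
import sysconfig

import coverage

# The suffix this interpreter searches for when importing C extensions,
# e.g. ".cpython-39-x86_64-linux-musl.so" on a correctly built musl Python.
expected = sysconfig.get_config_var("EXT_SUFFIX")
shipped = [p.name for p in pathlib.Path(coverage.__file__).parent.glob("tracer*.so")]
print("interpreter expects: tracer" + expected)
print("wheel shipped:", shipped)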

nedbat commented Nov 1, 2021

Oh, hmm, 6.1 didn't have musl wheels at all.

@reaperhulk (Author)

Some further info: Python 3.8 on Alpine 3.12 and 3.13 successfully loads the -gnu naming, but Python 3.9 on 3.14 does not.

nedbat commented Nov 1, 2021

Seems like this should be reported upstream somewhere, but where: cibuildwheel? Alpine? Any thoughts?

njsmith commented Nov 1, 2021

The fact that a linux-gnu extension gets loaded on Alpine at all seems like a CPython bug. Though since it's the latest Alpine that's slow, possibly CPython has already fixed it.

It would be good for auditwheel to check for this; that's an easy thing to do and likely to catch other buggy wheels. So it'd be worth opening an issue there.

I'm not sure who's responsible for the wrong name in the first place. If I'm right that CPython fixed a bug here, then it might be that you need to build with a CPython that has the bug fix. (CPython keeps some internal metadata tables about things like how extension modules should be named. So my hypothesis is that old CPython had the wrong metadata on Alpine, which leads to both creating the wrong name at build time and searching for the wrong name at import time.)

nedbat commented Nov 1, 2021

OK, that's two more candidates for upstream bug reports: auditwheel and CPython. I don't know how to untangle this.

njsmith commented Nov 1, 2021

The auditwheel bug report is orthogonal to everything else -- it won't fix the problem, but it should be done in any case to stop the problem from spreading or coming back.
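[Editor's note: roughly the kind of check auditwheel could add, as an illustrative sketch rather than auditwheel's actual code: flag any wheel whose filename says musllinux but whose bundled extension modules carry a -gnu SOABI suffix.]

import sys
import zipfile

def suspicious_members(wheel_path):
    # A musllinux wheel should not contain gnu-tagged extension modules.
    is_musl_wheel = "musllinux" in wheel_path
    with zipfile.ZipFile(wheel_path) as zf:
        for name in zf.namelist():
            if is_musl_wheel and name.endswith(".so") and "linux-gnu" in name:
                yield name

if __name__ == "__main__":
    for member in suspicious_members(sys.argv[1]):
        print("gnu-tagged extension in a musl wheel:", member)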

It looks like I was right about CPython having the wrong metadata. With the official musllinux build image, I get a gnu tag:

$ docker run --rm quay.io/pypa/musllinux_1_1_x86_64 /opt/python/cp39-cp39/bin/python -c "import sysconfig; print(sysconfig.get_config_var('SOABI'))" 
cpython-39-x86_64-linux-gnu

And with alpine 3.13 I get a gnu tag:

$ docker run --rm alpine:3.13 sh -c "apk add python3; python3 -c 'import sysconfig; print(sysconfig.get_config_var(\"SOABI\"))'"
[...]
OK: 53 MiB in 24 packages
cpython-38-x86_64-linux-gnu

But with alpine 3.14 I get a musl tag:

$ docker run --rm alpine:3.14 sh -c "apk add python3; python3 -c 'import sysconfig; print(sysconfig.get_config_var(\"SOABI\"))'"
[...]
OK: 56 MiB in 27 packages
cpython-39-x86_64-linux-musl

The Docker-maintained "official" Python images also have a gnu tag:

$ docker run --rm python:alpine sh -c "python3 -c 'import sysconfig; print(sysconfig.get_config_var(\"SOABI\"))'"
cpython-310-x86_64-linux-gnu

So either CPython fixed something, or else Alpine is hacking their python package to fix it and that hack should go upstream.

And also, probably most musl-based Python docker images need to be revved to get the new tags, including the official musllinux build image, the official docker "python" images that use Alpine, the cryptography build images, ...

And maybe pip should detect when it's running on a python with a broken SOABI tag, and refuse to install musllinux wheels?
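[Editor's note: a sketch of what such a pip-side guard might look like, under the assumption that checking for musl's dynamic loader is an adequate libc probe; the function name here is illustrative, not pip's actual API.]

import glob
import sysconfig

def soabi_looks_broken():
    # On a musl libc system, a SOABI ending in "linux-gnu" is the broken
    # metadata this thread is about; a guard like this could refuse to
    # trust musllinux wheel tags in that situation.
    on_musl = bool(glob.glob("/lib/ld-musl-*.so.1"))
    soabi = sysconfig.get_config_var("SOABI") or ""
    return on_musl and soabi.endswith("linux-gnu")

print("broken SOABI tag:", soabi_looks_broken())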

@reaperhulk (Author)

I've continued the discussion on discuss.python.org, but @nedbat I'd suggest deleting the musllinux wheels for at least 3.9 for the moment. As it is, users on Alpine 3.14 (the current latest) who install the distribution Python will experience this problem. Deleting the wheel will require everyone using 3.9 on Alpine to compile it themselves, but that was the status quo until a day ago anyway 😄

The other versions of the wheel actually work fine, so even though the SOABI naming is wrong, you could leave them up for now if you wanted.

nedbat commented Nov 2, 2021

I appreciate the detailed guidance. I've deleted the three 3.9 musl wheels.

nedbat added this to the Next milestone Nov 4, 2021
nedbat commented Nov 5, 2021

I would like to try to test for these sorts of problems, but it seems impractical. I can add a simple check to the cibuildwheel step, but it wouldn't catch this particular problem unless it was run on Alpine, and on a specific version of Alpine. Any advice?

@reaperhulk (Author)

This is, unfortunately, not the sort of bug I would expect to be caught even by very rigorous CI, since it's the interaction of wheels with distro versions they weren't originally built on, and musllinux/manylinux exist precisely to allow that. To catch this type of bug you'd need to modify your wheel build process to store the artifacts, fetch them, and import them (with COVERAGE_TEST_TRACER=c) on a wide variety of later Linux distributions, and only upload them to PyPI once they all succeed. That doesn't seem like a tenable solution 😞
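[Editor's note: the per-distro smoke test itself would be small. Something like this sketch, run after installing the candidate wheel on each target distro, is the "import them" step; the error handling is illustrative.]

import sys

try:
    # The compiled extension; this is exactly the import that silently
    # failed in the broken musllinux wheels.
    from coverage.tracer import CTracer
except ImportError as exc:
    sys.stderr.write(f"wheel shipped without a loadable CTracer: {exc}\n")
    sys.exit(1)

print("CTracer imported OK:", CTracer)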

nedbat commented Nov 6, 2021

OK, that's what I thought. I guess for now I will simply avoid 3.9 musllinux wheels.

nedbat commented Nov 6, 2021

Commit d538c55 removes the 3.9 musllinux wheels during building. I don't know how I'll know when it's safe to put them back...

@henryiii

This has been fixed: Alpine 3.14 and yesterday's 3.15 now load musllinux wheels, so it should be safe to include them again. This was due to an unsupported patch to Python on Alpine, and was not an issue with musllinux itself. Anyone staying on an Alpine release over a longer period will need to upgrade to the latest patched Python package.

nedbat added the next label Nov 25, 2021
nedbat commented Jan 25, 2022

I've restored those wheels in commit cfe14c2.
