
Always install pre-built wheels #34444

Open
kdmccormick opened this issue Mar 28, 2024 · 3 comments
Labels
code health Proactive technical investment via refactorings, removals, etc.

Comments

@kdmccormick
Member

Originally written up by @jmbowman over here: edx/edx-arch-experiments#177


There are a few reasons we would prefer to always install Python packages as wheels rather than source tarballs:

  • Faster installation, especially for packages with C or Rust extensions, but noticeable even for pure Python packages
  • Enables us to eliminate hundreds of MB of compilation tools from our Docker images
  • More secure, by removing opportunities for arbitrary code execution during installation

However, not all of our dependencies have wheels on PyPI for the versions we use. And of those that do, not all publish wheels for every platform and processor architecture we would like to support. But this can be worked around. Tentative proposal:

  • Document the set of binary formats (OS/architecture combinations) we want to support
  • Make sure that all of the Open edX packages get all of the appropriate wheels pushed to PyPI with each release. This will generally be a single universal wheel unless the package contains native code extensions.
  • Set up at least one package server for 2U to host wheels for its private code and dependencies which don't have all the needed wheels on PyPI. Ideally there would be a second managed by tCRIL to host just the missing dependency wheels for the benefit of the entire Open edX community.
  • Create a repository and Docker image(s) with all of the dependencies needed to build the missing dependency wheels. This could also be used in development to try out new packages with missing wheel variants on PyPI, building them here and then supplying them to the dev environment's devpi instance.
  • Update pip configuration to allow installation from the custom package server(s)
  • Remove native compilation tooling and development header packages from all the other Docker images.
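As a sketch of the pip configuration step, something like the following in pip.conf would both refuse source installs entirely and fall back to a custom package server (the index URL here is a placeholder, not a real server):

```ini
# pip.conf -- illustrative only; the extra index URL is hypothetical
[global]
# Refuse to build anything from source: fail fast if a wheel is missing
only-binary = :all:
# Check a custom wheel server for packages whose wheels aren't on PyPI
extra-index-url = https://wheels.example.openedx.org/simple/
```

With `only-binary = :all:` in place, a missing wheel becomes an immediate, visible install failure rather than a silent fallback to compilation.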

Some of the benefits could be obtained by instead (or also) using a builder pattern for Dockerfiles, so that the compilation toolchain isn't actually present in the images used for development and production. But that still leaves some of the security risk, a longer image build time, and occasional hassles in development when installing a new package that lacks a binary wheel.
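For concreteness, the builder pattern mentioned above looks roughly like this (base images and apt packages are illustrative, not taken from our actual Dockerfiles):

```dockerfile
# --- builder stage: has the compilers, produces wheels ---
FROM python:3.11-slim AS builder
RUN apt-get update && apt-get install -y build-essential libffi-dev
COPY requirements.txt .
# Build (or download) a wheel for every dependency into /wheels
RUN pip wheel --wheel-dir /wheels -r requirements.txt

# --- final stage: no toolchain, installs only pre-built wheels ---
FROM python:3.11-slim
COPY --from=builder /wheels /wheels
COPY requirements.txt .
RUN pip install --no-index --find-links /wheels -r requirements.txt
```

The final image never contains gcc or the dev headers, which is the size win; the build time and security costs of compilation remain in the builder stage, which is the limitation described above.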

Some package server options:

Some tooling that helps build and upload wheels:

@kdmccormick
Member Author

Make sure that all of the Open edX packages get all of the appropriate wheels pushed to PyPI with each release. This will generally be a single universal wheel unless the package contains native code extensions.

For ~95% of Open edX repos, would this be as simple as setting {'bdist_wheel':{'universal':'1'}}?

If so, would that on its own be worth doing, whether or not we go for the rest of the issue?

Create a repository and Docker image(s) with all of the dependencies needed to build the missing dependency wheels. This could also be used in development to try out new packages with missing wheel variants on PyPI, building them here and then supplying them to the dev environment's devpi instance.

I think I would want to look at a list of packages that are missing wheels so we can get an idea of how much build time & image size we'd save. If it'd be significant, then I'm in favor of this.

Some of the benefits could be obtained by instead/also using a builder pattern for Dockerfiles

FWIW, Tutor and most of its plugins use the builder pattern, yet I still want to bring the build time and image size down.

@kdmccormick
Member Author

@jmbowman :

I think we'd want to start with a repo health check that checks PyPI for wheel availability, that would probably go a long way towards answering your questions. I suspect it's only around 2% of our package dependencies that have any native code extensions, which is why it feels silly to me that we're bloating our build process because of those.
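Such a repo health check could start from PyPI's JSON API, which lists each release's files with a `packagetype` field. A minimal sketch (function names are mine, and error handling is omitted):

```python
import json
from urllib.request import urlopen

def wheel_kinds(files):
    """Given the 'urls' list from PyPI's release JSON, return the set of
    wheel compatibility tags (e.g. 'py3-none-any') found in the release.
    An empty set means the release is sdist-only."""
    tags = set()
    for f in files:
        if f.get("packagetype") == "bdist_wheel":
            # Wheel filenames end in {python}-{abi}-{platform}.whl
            parts = f["filename"][:-len(".whl")].split("-")
            tags.add("-".join(parts[-3:]))
    return tags

def check_release(package, version):
    """Fetch release metadata from PyPI and report wheel availability."""
    url = f"https://pypi.org/pypi/{package}/{version}/json"
    with urlopen(url) as resp:
        data = json.load(resp)
    return wheel_kinds(data["urls"])
```

Running `check_release` over a pinned requirements file and flagging anything that returns an empty set (or that lacks tags for a target platform) would give exactly the list discussed above.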

@kdmccormick kdmccormick added the code health Proactive technical investment via refactorings, removals, etc. label Mar 28, 2024
@kdmccormick kdmccormick changed the title Alway install pre-built wheels Always install pre-built wheels Mar 28, 2024
@jmbowman
Contributor

For ~95% of Open edX repos, would this be as simple as setting {'bdist_wheel':{'universal':'1'}}?

We should already be doing this for most of them; we just need to be consistent about it. The need is both for configuration like https://github.com/openedx/django-user-tasks/blob/master/setup.cfg#L1-L2 and for working automated release uploads like https://github.com/openedx/django-user-tasks/blob/master/.github/workflows/pypi-publish.yml . (And for actually doing releases instead of linking to commits on GitHub.)
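For reference, the setup.cfg stanza being linked to is the standard universal-wheel flag, which is the declarative equivalent of the `{'bdist_wheel': {'universal': '1'}}` dict quoted above:

```ini
[bdist_wheel]
universal = 1
```

This only makes sense for pure Python packages that support every target Python version from one wheel; packages with native extensions need per-platform wheels instead.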

If so, would that on its own be worth doing, whether or not we go for the rest of the issue?

Yes, every package made available as a wheel that wasn't before should speed things up just a little bit more. A lot more in the case of the handful of (mostly upstream) packages with nontrivial native code that needs to be compiled.

I think we'd want to start with a repo health check that checks PyPI for wheel availability, that would probably go a long way towards answering your questions. I suspect it's only around 2% of our package dependencies that have any native code extensions, which is why it feels silly to me that we're bloating our build process because of those.

A couple of repo health checks could definitely help us be more consistent about this, especially if combined with edx/edx-arch-experiments#231 . And the main thing that's been forcing us to keep those build dependencies has been the lack of an official process for building those 3rd-party wheels and a public place to store them. I've wanted to set that up for a long time, but it just never rose to the top of 2U's priority list.

Another reason to do this and extend it even to upstream packages that do have wheels on PyPI is protection against the sudden removal of dependencies from PyPI. This actually bit us a few times, and usually resulted in at least a half-day fire drill where a few people had to drop everything, fork the repo, get the fork released on PyPI, and untangle enough dependencies to make everything depend on the fork's name instead. (And deployments are impossible until all that gets sorted out.) That problem goes away if we maintain a package server that nobody else has permission to delete releases from.
