Add details on native packaging requirements exposed by mobile platforms #27

Open · wants to merge 31 commits into base: main
Conversation

@freakboy3742 (Contributor):
As part of our work on the BeeWare project, we've discovered that packaging native Python binaries for mobile platforms (iOS and Android) involves a collection of requirements and oddities that seem to be in scope for what this project is attempting to capture.

If this is part of a larger effort to reconsider and standardize Python packaging of native modules, I'd like to ensure that the needs of mobile platform packaging are captured as part of that effort.

To that end, I've attempted to summarize the things that we've learned through our work that are specific to supporting iOS and Android binaries. We've been able to hack together solutions that work well enough; however, we're interested in any effort to make those hacks part of a more consolidated and standardised effort.

@h-vetinari (Member) left a comment:

Thanks a lot for taking the time to write this up!

I don't make the decisions here, but just from my POV:

  • cross-compilation is very likely worth a key-issue page
  • I'm less sure about multiple architectures; it seems very niche to me (especially if we cover cross-compilation, which solves the problem almost fully as long as you compile for each architecture separately), but maybe I don't know the mobile story well enough. Perhaps you can enlighten me further?

@rgommers (Member) left a comment:

Thanks a lot @freakboy3742! My current impression is:

  • Cross-compiling is worth a separate page indeed; there are quite a few issues with it, and improvements to the current state would be very helpful.
  • I'm not sure what the actual problems are with multi-architecture support, or for Beeware specifically. Being more explicit about that, and perhaps cross-linking some relevant issues on the Beeware issue tracker, would be very helpful.

As a big picture question for Beeware: do you just need wheel tags to make builds work better, or do you need something more? And in general, what's wrong with the "archiving" approach - it seems like the most general and sensible one.

app store.

When an end-user requests the installation of an app, the app store extracts and
delivers only the binary that is appropriate for the end-user's device.
Member:

This seems like a nice solution. After reading this, I'm wondering what the actual problem here is from Beeware's perspective?

Contributor Author:

In the Android case, it's not really a problem per se; it's more of a "consideration to keep in mind".

Let's say I have an Android project that uses a Python package A that has a binary module; that package has a dependency on B (which also has a binary module). When I do the compilation pass for ARM64, I pip install A, which does a dependency resolution pass to determine a compatible version of B. I then do a compilation pass for ARMv7, which does a separate dependency resolution pass to determine a compatible version of B. The use of a separate dependency resolution pass introduces the possibility that my ARMv7 and ARM64 binaries have entirely different versions of package B (and, I guess, potentially package A as well).
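
A minimal sketch of that two-pass flow (platform tags and package names are illustrative; Android tags like these are not accepted on PyPI):

```sh
# Pass 1: resolve + install for ARM64; pip picks some version of B.
pip install --target build/arm64-v8a \
    --platform android_21_arm64_v8a --only-binary=:all: A

# Pass 2: an entirely independent resolve for ARMv7; nothing ties this solve
# to the first, so B (or even A) may come out at a different version.
pip install --target build/armeabi-v7a \
    --platform android_21_armeabi_v7a --only-binary=:all: A
```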

That's not a problem; it could even be argued as "working as designed". It's how BeeWare (strictly, the Chaquopy subsystem that Briefcase uses) is working at present; it just might be a little surprising if you don't have a systematic way of at least flagging that there are different versions installed for each platform. I'll modify the wording to clarify this.

However, for the same project on iOS, this isn't an option. iOS apps perform one compilation pass per ABI (i.e. one for the simulator, and one for the physical device), so the final installed artefact needs to contain a single fat binary with all the architectures. This could be treated as an install-time problem rather than distribution problem, though; I'll add a note about that possible approach.

Member:

Thanks, this is a good and concrete example. It sounds like it's only a problem with pip/PyPI - any other relevant package manager gets all the metadata first and then does a single solve for all dependencies, so you'd get a single version of package B.

Contributor Author:

I'm not sure I follow why it's only a pip/PyPI problem. The issue isn't the solution path; it's how many times you need to run the installer (and thus the solver). If I solve and install for arm64, then solve and install for x86-64, that's 2 independent solutions, and any inconsistency in availability of packages for arm64 and x86-64 will result in different solutions. AFAICT, this will still be an issue for conda, as the issue isn't the availability of metadata for a single platform; it's the availability of the solution arrived at by a completely independent solver pass.

You can only avoid this problem if you do a single resolver pass looking to satisfy both architectures at the same time, and only install once a single solution that works for both architectures is found (or, I guess, you have some ways to pass a specific solution found by pass 1 into subsequent passes in a way that doesn't include package hashes, as you'll be installing a different package with the same name and version, but different ABI/architecture).

Member:

Ah okay, that makes sense, thanks.


as the issue isn't the availability of metadata for a single platform; it's the availability of the solution arrived at by a completely independent solver pass.

conda metadata is available offline, unlike pip where every resolver run is a series of API calls followed by downloading artifacts followed by extracting artifacts followed by yet more API calls. So you actually can guarantee multiple runs of the resolver will produce the same results.

If the metadata is guaranteed to be consistent across architectures at any given moment (e.g. packages of any given version will always exist for all architectures) then you can also get the same resolver result across architectures too. That's a social problem -- it depends on whether the ecosystem allows "partial releases".

or, I guess, you have some ways to pass a specific solution found by pass 1 into subsequent passes in a way that doesn't include package hashes

pip freeze?

Member:

So you actually can guarantee multiple runs of the resolver will produce the same results.

In principle, yes. But I don't think either conda or mamba have this capability, so in practice it'd be pretty hard today.

@mhsmith (Contributor) commented Jan 10, 2023:

Chaquopy actually doesn't run multiple dependency resolution passes; it works like this:

  • Run pip normally for the first architecture: it doesn't matter which one.
  • Use the .dist-info metadata to identify the native packages: both those directly requested by the user, and those installed as indirect requirements by pip.
  • Separate the native packages from the pure-Python packages.
  • Run pip again for each of the remaining architectures. But this time, we install only the native packages, pin them all to the same versions that pip selected for the first architecture, and add the option --no-deps.

So if a package isn't available in the same version for all of the app's architectures, the build will fail. Since we build all our Android wheels ourselves, this hasn't yet been an issue.

The end result is one directory tree for pure-Python packages, and one directory tree per architecture for native packages, all of which are guaranteed to have the same versions. Those directory trees are then packaged into the APK in a way that allows them to be efficiently accessed at runtime.
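
A rough shell sketch of that flow, with hypothetical paths, tags, and versions:

```sh
# Pass 1: normal pip run for the first architecture (full dependency resolve).
pip install --target build/arm64-v8a \
    --platform android_21_arm64_v8a --only-binary=:all: A

# Record the versions pip selected; the .dist-info metadata identifies which
# of these are native packages.
pip freeze --path build/arm64-v8a > pins.txt    # e.g. A==1.2, B==3.4

# Remaining architectures: only the native packages, pinned to the pass-1
# versions, with no re-resolution.
pip install --target build/armeabi-v7a \
    --platform android_21_armeabi_v7a --only-binary=:all: \
    --no-deps A==1.2 B==3.4
```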

@rgommers added the `content` label (PRs and issues related to content for a website) on Jan 10, 2023.
@rgommers (Member) left a comment:

I'd like to pivot this PR/review a bit away from the more abstract discussion and the Linux vs. macOS binary distribution models. The content on this site focuses on very concrete problems that package authors, packagers, and end users have. Not everyone has to agree on a more abstract framing.

@freakboy3742 each key issue page is written according to this template: https://github.com/pypackaging-native/pypackaging-native/blob/main/utils/template_key_issue.md; it would be great if you could use that. The "Current state" is the bulk of the text; "Problems" is more a short summary of actual pain points.

To make this as concrete as possible, cross-linking relevant issues/discussions would be good. Also, please mention Beeware explicitly (perhaps in an Example: ... frame as used on pages like https://pypackaging-native.github.io/key-issues/abi/#current-state?).

The main audience here is folks involved in Python packaging - they may not work with compiled code much or at all, so the more concrete and explicit, the better.

Here is my current understanding (probably incomplete):

  • The most important pain point was/is about building binaries. Until recently, build system support was severely lacking.
    • Now that we have new build backends and Beeware can use either Meson or CMake as well-designed build systems with full support for cross-compilation, it's less clear how much is still missing though.
    • You mentioned wheel tags for multi-arch support. Anything else along that line? Expressing build-time dependencies in a way that works for cross-compiling perhaps?
  • Distribution seems to be less of an issue (right?) - the app stores are the distribution path, and PyPI isn't relevant here (it doesn't allow these kinds of wheels).
  • Dependency resolution the way pip does it, with backtracking, once per architecture may result in suboptimal results (your package A/B example with two versions of B in the final resolver solution).

@freakboy3742 (Contributor Author):

@freakboy3742 each key issue page is written according to this template: https://github.com/pypackaging-native/pypackaging-native/blob/main/utils/template_key_issue.md; it would be great if you could use that. The "Current state" is the bulk of the text; "Problems" is more a short summary of actual pain points.

My apologies. I noticed some similarities between pages, but didn't notice the template page. I'll update my contributions to conform to that structure.

To make this as concrete as possible, cross-linking relevant issues/discussions would be good. Also, please mention Beeware explicitly (perhaps in an Example: ... frame as used on pages like https://pypackaging-native.github.io/key-issues/abi/#current-state?).

Happy to mention BeeWare specifically as an exemplar of how these problems manifest. There won't be many active issues to reference, as we've built workarounds for most of these problems as they've occurred, but I'll link to documentation/discussions where possible.

FWIW, my motivation for documenting this is to (eventually) be able to remove those workarounds in BeeWare's code (to the extent possible).

The main audience here is folks involved in Python packaging - they may not work with compiled code much or at all, so the more concrete and explicit, the better.

Here is my current understanding (probably incomplete):

  • The most important pain point was/is about building binaries. Until recently, build system support was severely lacking.

    • Now that we have new build backends and Beeware can use either Meson or CMake as well-designed build systems with full support for cross-compilation, it's less clear how much is still missing though.

Having build backends that acknowledge the existence of cross-compilation definitely addresses much of the problem (or, at least, turns it into a problem for a specific build system rather than a whole-of-ecosystem issue).
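
As an illustration of what that build-system-level support looks like, a cross build with Meson hinges on a small cross file; a hypothetical one for an Android target (compiler names and API level are made up for the example):

```sh
# A Meson "cross file" describes the target machine and toolchain;
# meson-python can be pointed at a file like this when building wheels.
cat > android-aarch64.ini <<'EOF'
[binaries]
c = 'aarch64-linux-android21-clang'
cpp = 'aarch64-linux-android21-clang++'
strip = 'llvm-strip'

[host_machine]
system = 'android'
cpu_family = 'aarch64'
cpu = 'aarch64'
endian = 'little'
EOF

meson setup build --cross-file android-aarch64.ini
```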

  • You mentioned wheel tags for multi-arch support. Anything else along that line? Expressing build-time dependencies in a way that works for cross-compiling perhaps?

Those are all possibly required - however, the pieces that are missing depend on the ecosystem-level solution. Two possibilities:

  1. Wheels are always single-ABI, single-architecture, and it's up to deployment solutions to perform any multi-arch, multi-ABI integration that may be required; or
  2. Wheels may be multi-ABI or multi-architecture (or both), and build systems/package repositories need to be able to accommodate those artefacts.

The universal2 option used by macOS is effectively (2).
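
In wheel-filename terms, the two options look roughly like this (the iOS tags are purely illustrative; only macOS's `universal2` tag existed at the time):

```sh
# Option 1: one thin wheel per ABI/architecture pair (hypothetical iOS tags):
#   pkg-1.0-cp311-cp311-ios_12_0_arm64_iphoneos.whl
#   pkg-1.0-cp311-cp311-ios_12_0_x86_64_iphonesimulator.whl

# Option 2: one fat wheel covering multiple architectures (macOS today):
#   pkg-1.0-cp311-cp311-macosx_11_0_universal2.whl
```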

I'll elaborate on this topic in the document.

  • Distribution seems to be less of an issue (right?) - the app stores are the distribution path, and PyPI isn't relevant here (it doesn't allow these kinds of wheels).

So - we need to be careful about terminology here.

There are two types of distribution potentially involved here - "library" distribution and "app" distribution. Again, I'll elaborate in the document.

  • Dependency resolution the way pip does it, with backtracking, once per architecture may result in suboptimal results (your package A/B example with two versions of B in the final resolver solution).

Correct.

I don't think there are any other major issues; I'll update my PR draft to reflect the expected structure, and to elaborate on the issues I've flagged above.

…ic references to prior art and existing discussions.
@freakboy3742 (Contributor Author):

@rgommers Updates pushed. Not sure if I'm still too verbose in the "Problems" section, and whether some of those details should be shifted into the "Current state" section. Happy to take another swing at revisions if there's anything that needs more (or less!) detail.

@h-vetinari (Member) left a comment:

Took another pass, had some comments, but looks pretty good overall already! :)

@mhsmith (Contributor) left a comment:

One more typo; otherwise the coverage of Android looks fine now.

freakboy3742 and others added 4 commits January 17, 2023 07:37
Co-authored-by: h-vetinari <h.vetinari@gmx.com>
Co-authored-by: Malcolm Smith <smith@chaquo.com>
Co-authored-by: h-vetinari <h.vetinari@gmx.com>
@leofang (Contributor) commented Jan 18, 2023:

Thank you for the nice write-ups, Russell!

One note I can contribute to the multi-arch section: it is becoming a concern that for aarch64 we can't differentiate SBSA from non-SBSA variants (whatever they are), and so we can't distribute Python wheels for each aarch64 flavor; that is, the tag linux-aarch64 alone is not enough. This is especially bad because the non-SBSA variants often mean embedded/edge computing devices that rely heavily on highly optimized (sometimes shrunk-down) prebuilt binaries.

On conda-forge, this issue can be worked around by the arm-variant mutex, but that is obviously not applicable to wheels. And it is not honored as a first-class citizen by conda.

@freakboy3742 (Contributor Author):

One note I can contribute to the multi-arch section: it is becoming a concern that for aarch64 we can't differentiate SBSA from non-SBSA variants (whatever they are), and so we can't distribute Python wheels for each aarch64 flavor; that is, the tag linux-aarch64 alone is not enough.

My experience with ARM64 is limited to macOS and iOS, so I'm not familiar with the SBSA variants - is the requirement here substantially different from what you get on x86 with SSE et al. instruction set variants? i.e., the "pure" CPU architecture is a necessary, but not sufficient, specifier of the complete CPU architecture? Is the "full" CPU architecture something that can be specified (even if a new nomenclature, like "aarch64.sbsa", needs to be invented for that purpose)?

@freakboy3742 (Contributor Author):

@rgommers A gentle bump on this one - is there anything else you'd like to see by way of revisions to this?

@rgommers (Member) left a comment:

Thanks for the ping @freakboy3742, and apologies for the delay. My two-week holiday has exploded my backlog. I'll try to get this merged very soon. I just re-read the cross-compilation page and it's very close. I'd like to push a bit more content (history, current needs, etc.) about cross-compilation needs for projects like NumPy and systems like Buildroot and Yocto.

Regarding solutions, I think sysconfig is the main thing that needs fixing on the Python side - taking away the need to run Python code indeed. In terms of standardization, a PEP 517-like thing is probably step 2; first-class support in build backends like `meson-python` and `scikit-build-core` is an easier hurdle to take (and mostly present already).
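
To make the sysconfig point concrete, here is a minimal illustration of why cross builds currently need to override interpreter-reported values (the platform tags are examples):

```sh
# sysconfig values come from the *running* interpreter:
python -c "import sysconfig; print(sysconfig.get_platform())"
# -> e.g. macosx-11.0-arm64 (the build machine, not the target)

# CPython honors _PYTHON_HOST_PLATFORM as an escape hatch, so cross builds
# can override the reported platform:
_PYTHON_HOST_PLATFORM=macosx-11.0-x86_64 \
    python -c "import sysconfig; print(sysconfig.get_platform())"
# -> macosx-11.0-x86_64
```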

For the macOS case, I'd like to start with producing thin arm64 wheels, rather than fat binaries - that is much more important. Luckily the macOS cross-compilation thing is very easy compared to doing that for any other platform.

I'll try to do this over the weekend.

@rgommers (Member) left a comment:

@freakboy3742 thanks for your patience! I pushed an update to the cross compilation page with both copy-edits and significant new content. Could you have a look? I'm happy with that page now and am ready to merge it.

I'd like to review the page on multiple architectures separately. It seems there are some review comments left, and in the macOS/universal2 part (the one part I'm quite familiar with) I'd like to make some updates.

@freakboy3742 (Contributor Author):

@rgommers Apologies for the delay - I was at a conference last week. I've just pushed updates to cover most of your review notes; there's one or two points (comments inline) where there's possibly some more discussion required.

@rgommers (Member):

@rgommers Apologies for the delay - I was at a conference last week. I've just pushed updates to cover most of your review notes; there's one or two points (comments inline) where there's possibly some more discussion required.

No worries at all, thanks for the updates Russell! Almost there🤞🏼.

for more on that).
- When a project provides thin wheels (which is a must-do for projects with
native code, because those are the better experience due to smaller
size), you cannot even install a `universal2` wheel with pip from PyPI at
Contributor Author:

Again, subjective, use-case specific language. In my use case, universal2 wheels unequivocally provide a better experience.

"because those are the better experience with pip install into an active environment" perhaps?

Because it is unequivocally better for all use cases where you aren't going to transfer the results to another computer. And virtual environments are not relocatable. And I'd imagine that conda doesn't provide a universal2 interpreter so conda environments might be relocatable, but presumably not across machines.

Member:

And I'd imagine that conda doesn't provide a universal2 interpreter so conda environments might be relocatable, but presumably not across machines.

Conda environments should be more or less relocatable (not that I'd recommend it), but only within the same CPU architecture (more precisely, something closely resembling the target triple). Everything in conda is per-arch, there are no fat binaries or something like universal2.

Member:

"because those are the better experience with pip install into an active environment" perhaps?

That sounds fine to me. Maybe adding "as an end user"? I imagine briefcase and py2app also use venvs under the hood.

I was thinking that those would be inactive environments, but it's a subtle distinction and "as an end user" is probably a better one.

Contributor Author:

FWIW, Briefcase doesn't use literal venvs, but it does do some venv-like tricks to ensure interpreter isolation. I can't speak to the current state of py2app.

Member:

I toned this down from "must do" to "should be done" and added the clarification that it's only better when installing for use on that machine. Please resolve this comment if that looks good.

- It is straightforward to fuse two thin wheels with `delocate-fuse` (a
tool that comes with [delocate](https://pypi.org/project/delocate/)),
it's a one-liner: `delocate-fuse $x86-64_wheel $arm64_wheel -w .`
However, it's worth noting that this requires that any headers or python
Member:

I don't think this edit is right. The exact same bug will still be present in universal2 wheels. Anything that depends on the build machine's architecture is going to be incorrect and must be explicitly handled for universal2 I believe. There may be exceptions, but I think it's true in general (similar issues as when cross-compiling) and it was true for the long double size issue that Isuru linked.

The problem here is the existence of universal2 as a format more than how you produced it.

Member:

Actually, it's a more general problem even for any multi-architecture support. Typical header generation that does things like sizeof(<C type name>) is going to be incorrect.
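
As a concrete instance of that: on macOS, `long double` differs between the two architectures, which is exactly the kind of value a build-time probe bakes into a generated header. A minimal check via clang's predefined macros:

```sh
# x86_64: 80-bit x87 extended precision (64-bit mantissa)
clang -arch x86_64 -dM -E -xc /dev/null | grep __LDBL_MANT_DIG__
# -> #define __LDBL_MANT_DIG__ 64

# arm64: long double is plain IEEE double (53-bit mantissa)
clang -arch arm64 -dM -E -xc /dev/null | grep __LDBL_MANT_DIG__
# -> #define __LDBL_MANT_DIG__ 53
```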

Contributor Author:

It's possible to compile macOS and iOS artefacts supporting 32- and 64-bit architectures in the same binary in a single compiler pass. It definitely requires careful invocation of the compiler; it requires that the full set of headers is available; and it precludes the use of command-line #define flags that turn on specific code features - but it's entirely possible.
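
For illustration, a single-pass fat build might look like this (file names hypothetical):

```sh
# One compiler invocation produces both architecture slices:
clang -arch x86_64 -arch arm64 -shared -o libdemo.dylib demo.c
lipo -archs libdemo.dylib    # -> x86_64 arm64
```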

Contributor Author:

Also - doesn't this support my point? Building a fat wheel isn't a "straightforward" matter of running delocate-fuse.

Member:

Sure, anything is possible, but it requires care.

Building a fat wheel isn't a "straightforward" matter of running delocate-fuse.

I think that in the case of no bugs like architecture-dependent headers, it is straightforward. And if there are such bugs, I'd rather get a loud crash from delocate-fuse than a silent bug.

In the end, I think the most robust process would be to build thin apps based on only thin wheels, and then only fuse the final thin apps together. That way, most/all bugs like this are likely irrelevant.

Contributor Author:

Unfortunately, building thin apps isn't a viable option. It causes a degraded user experience on macOS, and fat apps are a requirement for iOS/Android distribution.

Contributor Author:

(noting that the "fat" interpretation on Android is a little easier to accommodate because of how it handles binaries)

@rgommers (Member) commented Mar 22, 2023:

Unfortunately, building thin apps isn't a viable option. It causes a degraded user experience on macOS, and fat apps are a requirement for iOS/Android distribution.

This is why I said "and then only fuse the final thin apps together". I meant "into fat apps to distribute to users". The end result is the same, a fat app. It's just that the process is better controlled; any intermediate header issues get avoided when you use a stack of thin wheels.

Your edit notes that delocate-fuse might succeed but produce broken artefacts.

But it's not an inherent problem with delocate-fuse, it's a "simple" matter of the project not supporting or testing universal2. If a project supports or tests universal2, it should not matter whether the project personally executes the compiler in multi-pass mode, or personally executes delocate-fuse.

Either both work, or both do not work.

And if/when the delocate-fuse tooling is improved, then it does become a straightforward matter of "attempt to fuse the wheels, if it doesn't work then submit a bug report to the project". No more "the project doesn't test this, so there may be subtle runtime bugs".

Member:

I have rephrased this now to say:

Note though that robustness improvements in delocate-fuse for more complex cases (e.g., generated header files with architecture-dependent content) are needed (see delocate#180). Such cases are likely to be equally problematic for direct universal2 wheel builds (see, e.g., numpy#22805).

I think that addressed it, please resolve if this looks good.
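
For concreteness, the simple fuse-and-verify case might look like the sketch below (wheel and module names hypothetical; `delocate-fuse` writes the fused wheel under the first input's name, so a rename to the `universal2` tag is needed):

```sh
delocate-fuse foo-1.0-cp311-cp311-macosx_11_0_x86_64.whl \
              foo-1.0-cp311-cp311-macosx_11_0_arm64.whl -w fused/
# The output keeps the first wheel's name; rename it to the universal2 tag:
mv fused/foo-1.0-cp311-cp311-macosx_11_0_x86_64.whl \
   fused/foo-1.0-cp311-cp311-macosx_11_0_universal2.whl

# Spot-check an extension module inside the fused wheel (path hypothetical):
unzip -o fused/foo-1.0-cp311-cp311-macosx_11_0_universal2.whl -d unpacked/
lipo -archs unpacked/foo/_ext.cpython-311-darwin.so   # -> x86_64 arm64
```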

@eli-schwartz left a comment:

typo nits

@rgommers (Member):

Signing off for today ... but we're getting there.

@rgommers (Member):

Okay, I think this is converging - I'm pretty happy with how the universal2 foldout looks now, and with the identified solution direction. It provides a way forward that meets the needs of universal app builders with almost no impact on maintainers of projects with native code, or on end users.

@Noface86:

Anyone looking for an apprentice?

@h-vetinari (Member):

Okay, I think this is converging

I lost track of any remaining open threads here, but perhaps we can manage to finalize this PR?

@freakboy3742 (Contributor Author):

I'm not aware of any open issues; I'm obviously keen to wrap this work up, so if there's anything outstanding, let me know.

Labels: content (PRs and issues related to content for a website)
Projects: none
Linked issues: none
9 participants