Add details on native packaging requirements exposed by mobile platforms #27

Open
wants to merge 31 commits into
base: main
Changes from 18 commits
Commits
eb2419a
Add details on native packaging requirements exposed by mobile platfo…
freakboy3742 Jan 9, 2023
8fef63e
Clarified the role/impact of cross-compilation on non-macOS platforms.
freakboy3742 Jan 10, 2023
d16035f
Grammar cleanup.
freakboy3742 Jan 10, 2023
84dbd5f
Add note about Windows platform support
freakboy3742 Jan 10, 2023
2a40f47
Moved a paragraph about the universal2 to current state.
freakboy3742 Jan 10, 2023
2563270
Clarified how Android deals with dependencies.
freakboy3742 Jan 10, 2023
b9b904c
Added an alternative approach for handling iOS multi-arch.
freakboy3742 Jan 10, 2023
45f748f
Modified comments to use common section structure, and include specif…
freakboy3742 Jan 16, 2023
373bb09
Apply suggestions from code review
freakboy3742 Jan 16, 2023
d8a2ca6
More updates stemming from review.
freakboy3742 Jan 16, 2023
f533395
Expand note about Linux support.
freakboy3742 Jan 17, 2023
8475360
Correct an it's typo.
freakboy3742 Jan 17, 2023
2886f2c
Add content to page on cross compilation
rgommers Feb 27, 2023
7556850
Resolve the last cross-compilation comment, on `pip --platform`
rgommers Mar 10, 2023
cb85652
Merge branch 'main' into mobile-details
rgommers Mar 10, 2023
49806e2
Put back link to "multiple architectures" page from cross compile page
rgommers Mar 10, 2023
ea1fb60
Remove the `cross_platform.md` file
rgommers Mar 10, 2023
d249af6
Fix some formatting and typo issues
rgommers Mar 10, 2023
50d8c26
Revisions to multi-architecture notes following review.
freakboy3742 Mar 20, 2023
a9776e0
Add foldout for pros and cons of `universal2` wheels
rgommers Mar 21, 2023
8d46e06
Add the 'for' arguments for universal2.
freakboy3742 Mar 21, 2023
5d06a56
Clarified 'end user' language; added note about merge problems.
freakboy3742 Mar 22, 2023
3e1fc05
Clarify the state of arm64 on github actions.
freakboy3742 Mar 22, 2023
74705d8
Add reference to pip issue about universal2 wheel installation.
freakboy3742 Mar 22, 2023
f46d2b0
Fixed typo.
freakboy3742 Mar 22, 2023
e1c278f
Removed subjective language.
freakboy3742 Mar 22, 2023
1a926eb
Apply textual/typo suggestions
rgommers Mar 22, 2023
7967383
Rephrase universal2 usage frequency/demand phrasing
rgommers Mar 22, 2023
1fb0ffb
Tone down the statement on "must provide thin wheels"
rgommers Mar 22, 2023
b44a322
Rephrase note on needed robustness improvements in delocate-fuse
rgommers Mar 22, 2023
dd93f1f
Add "first-class support for fusing thin wheels" as a potential solution
rgommers Mar 22, 2023
4 changes: 4 additions & 0 deletions docs/glossary.md
@@ -6,6 +6,8 @@
|---|---|---|
| ABI | Application Binary Interface | See [here](./background/binary_interface.md) |
| API | Application Programming Interface | The sum total of available functions, classes, etc. of a given program |
| AAB | Android Application Bundle | A distributable unit containing an Android application |
| APK | Android Application Package | A "binary" unit for Android, installed on a device |
| ARM | Advanced RISC Machines | Family of RISC architectures, second-most widely used processor family after x86 |
| AVX | Advanced Vector eXtensions | Various extensions to the x86 instruction set (AVX, AVX2, AVX512), evolution after SSE |
| BLAS | Basic Linear Algebra Subprograms | Specification resp. implementation for low-level linear algebra routines |
@@ -29,13 +31,15 @@
| LAPACK | Linear Algebra PACKage | Standard software library for numerical linear algebra |
| ISA | Instruction Set Architecture | Specification of an instruction set for a CPU; e.g. x86-64, arm64, ... |
| JIT | Just-in-time Compilation | Compiling code just before execution; used in CUDA, PyTorch, PyPy, Numba etc. |
| JNI | Java Native Interface | The bridge API allowing access of Java runtime objects from native code (and vice versa) |
| LLVM | - | Cross-platform compiler framework, home of Clang, MLIR, BOLT etc. |
| LTO | Link-Time Optimization | See [here](./background/compilation_concepts.md#link-time-optimization-lto)|
| LTS | Long-Term Support | Version of a given software/library/distribution designated for long-term support |
| musl | - | An alternative implementation of the C standard library |
| MPI | Message Passing Interface | Standard for message-passing in parallel computing |
| MLIR | Multi-Level IR | Higher-level IR within LLVM; used i.a. in machine learning frameworks |
| MSVC | Microsoft Visual C++ | Main compiler on Windows |
| NDK | Native Development Kit | The Android toolchain supporting compilation of binary modules |
| NEP | Numpy Enhancement Proposal | See [here](https://numpy.org/neps/) |
| OpenMP | Open Multi Processing | Multi-platform API for enabling multi-processing in C/C++/Fortran |
| OS | Operating System | E.g. Linux, MacOS, Windows |
1 change: 1 addition & 0 deletions docs/index.md
@@ -68,6 +68,7 @@ workarounds for.
5. [Distributing a package containing SIMD code](key-issues/simd_support.md)
6. [Unsuspecting users getting failing from source builds](key-issues/unexpected_fromsource_builds.md)
7. [Cross compilation](key-issues/cross_compilation.md)
8. [Platforms with multiple CPU architectures](key-issues/multiple_architectures.md)


## Contributing
2 changes: 1 addition & 1 deletion docs/key-issues/cross_compilation.md
@@ -23,7 +23,7 @@ compiled for the target platform.

macOS also experiences this as a result of the Apple Silicon transition. Apple
has provided the tools to make cross compilation from x86-64 to arm64 as easy
as possible, as well as to compile fat binaries
as possible, as well as to compile [fat binaries](multiple_architectures.md)
(supporting x86-64 and arm64 at the same time) on both architectures. In the
latter case, the host platform will still be one of the outputs of the
compilation process, and the resulting binary will run on the CI/CD system.
266 changes: 266 additions & 0 deletions docs/key-issues/multiple_architectures.md
@@ -0,0 +1,266 @@
# Platforms with multiple CPU architectures

One important subset of ABI concerns is the CPU architecture for which a binary
artefact has been built. Attempting to run a binary on hardware that doesn't
match the CPU architecture (or architecture variant[^1]) for which the binary
was built will generally lead to crashes, even if the ABI being used is
otherwise compatible.

Member

I think we're back to the key problem being support for iOS and Android (the one other concrete case of aarch64 variants like SBSA may not be an issue after all); so tweaking the intro here would be good I think to state that clearly. We don't have concrete problems for any other platforms, because single-arch wheels are always preferred.

Contributor Author

I disagree. macOS is also firmly in this grouping, at least for my use case.

This isn't something that affects the "Conda/System python" use case, because you're in control of the Python interpreter, and as the person installing a binary package, you're pulling binaries that are specific to your usage requirement, which is a single CPU architecture by definition.

However, if you're building a redistributable macOS app, you must care about all the architectures that your user base may be using. If I build an app on M1, and give it to a user on x86, I still want my app to run. The set of problems here are identical to those on mobile platforms. The Python ecosystem has a solution - universal wheels - but this solution is (a) not inherently useful for mobile platforms without significant changes, and (b) not liked by some segments of the desktop user base.

From a packaging perspective, I distinctly do not prefer single arch wheels, as they complicate the process of building an app - at least, it does given the current state of Python packaging tooling. If the Python ecosystem is going to converge on single architecture wheels as the only/preferred approach, then I'd argue that needs to be combined with a clear understanding of how use cases on platforms that need to support multiple architectures interact with that ecosystem. Being able to do "pip install" specifying multiple architectures and implicitly do a "delocate-fuse" (or equivalent) for each target architecture seems like a bare minimum.

Member

> Being able to do "pip install" specifying multiple architectures and implicitly do a "delocate-fuse" (or equivalent) for each target architecture seems like a bare minimum.

Yes, I think that that is the correct solution here. Given that the concern is very specific to app building, the solution should live in the app building toolchain.

Member

We can expand on this a bit I think, maybe in a note admonition. Because it's not too well understood I'd say, and the toolchains here for py2app & co are falling short of what is needed.

Contributor Author

I guess I'm unclear why this is a py2app (or, in my case, briefcase) issue (beyond the obvious "it's your problem, so fix it" answer). As discussed further up the page, this isn't something that can be trivially resolved by end-user tooling. It's something that requires policies and annotation at the level of the package ecosystem - at the very least to describe how package merging should occur, if not to provide at least a reference implementation of that merging process.

Member

Agreed on the improvements needed discussed in the other thread.

> I guess I'm unclear why this is a py2app (or, in my case, briefcase) issue (beyond the obvious "it's your problem, so fix it" answer).

Beyond the "those who need it tend to have to do the work", it is because from a holistic design perspective it is the correct place to implement this. The only users that need this are py2app, briefcase & co, so let's support the use case there rather than bother every single package author with more work. It will also scale better in practice that way (only 50% of package authors uploading fat wheels won't be enough for your needs here).

Contributor Author

I think we agree here; however, what I'm flagging is that we shouldn't be aiming for/requiring py2app and briefcase to develop independent implementations of the "merge" logic. It should be shared first-class tooling in the Python ecosystem that can be utilised by anyone that needs it.

Member

> It should be shared first-class tooling in the Python ecosystem that can be utilised by anyone that needs it.

I agree, and I like this. Let's state it like this in the "Potential solutions" section?

Ideally what should happen here is that this ends up in auditwheel and that becomes the one-stop shop for wheel distribution/packager needs, rather than having separate per-OS projects with patchy support for this kind of thing.

Member

I've added an entry under Potential solutions for this. Please resolve if that looks good.

[^1]:
E.g., the x86-64 architecture has a range of well-known extensions, such as
SSE, SSE2, SSE3, AVX, AVX2, AVX512, etc.

## Current state

Historically, it could be assumed that an executable or library would be
compiled for a single CPU architecture. On the rare occasion that an operating
system was available for multiple CPU architectures, it became the
responsibility of the user to find (or compile) a binary that was compiled for
their host CPU architecture.

However, we now see operating system platforms where multiple CPU architectures
are supported:

* In the early days of Windows NT, both x86 and DEC Alpha CPUs were supported
* Windows 10 supports x86, x86-64, ARMv7 and ARM64; Windows 11 supports x86-64
and ARM64.
* Due to its open source nature, Linux tends to support all CPU architectures for
which someone is interested enough to author & provide support in the kernel,
see [here](https://en.wikipedia.org/wiki/List_of_Linux-supported_computer_architectures).
* Apple transitioned Mac hardware from PowerPC to Intel (x86-64) CPUs, providing
a forwards compatibility path for binaries
* Apple is currently transitioning Mac hardware from Intel (x86-64) to
Apple Silicon (ARM64) CPUs, again providing a forwards compatibility
path
* Apple supports ARMv6, ARMv7, ARMv7s, ARM64 and ARM64e on iOS
* Android currently supports ARMv7, ARM64, x86, and x86-64; it has historically
also supported ARMv5 and MIPS

CPU architecture compatibility is a necessary, but not sufficient criterion for
determining binary compatibility. Even if two binaries are compiled for the same
CPU architecture, that doesn't guarantee [ABI compatibility](abi.md).

In some respects, CPU architecture compatibility could be considered a superset
of [GPU compatibility](gpus.md). When dealing with multiple CPU architectures,
there may be some overlap with the solutions that can be used to support GPUs in
native binaries.

Three approaches have emerged on operating systems that have a need to manage
multiple CPU architectures:

### Multiple binaries

The minimal solution is to distribute multiple binaries. This is the approach
that is used by Windows and Linux. At time of distribution, an installer or other
downloadable artefact is provided for each supported platform, and it is up to
the user to select and download the correct artefact.
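
As a rough illustration of the selection step that this approach leaves to the user, the sketch below maps a reported machine name onto one of several per-architecture artefacts. The artefact file names and the alias table are hypothetical and exist only for this example; real projects name and publish their downloads in many different ways.

```python
# Hypothetical artefact names, for illustration only.
ARTEFACTS = {
    "x86_64": "example-1.0-x86_64.tar.gz",
    "arm64": "example-1.0-arm64.tar.gz",
}

# Operating systems report the same architecture under different names.
ALIASES = {"amd64": "x86_64", "aarch64": "arm64"}


def select_artefact(machine: str) -> str:
    """Return the artefact built for the given machine name."""
    name = machine.lower()
    name = ALIASES.get(name, name)
    try:
        return ARTEFACTS[name]
    except KeyError:
        raise LookupError(f"no artefact available for {machine!r}") from None
```

Calling `select_artefact(platform.machine())` would pick the artefact for the host; in the "multiple binaries" model, this choice is made by hand or by an installer rather than by the operating system.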

### Archiving

The approach taken by Android is very similar to the multiple binary approach,
with some affordances and tooling to simplify distribution.

By default Android projects use Java/Kotlin, which produces platform independent
code. However, it is possible to use non-Java/Kotlin libraries by using JNI and
the Android NDK (Native Development Kit). If a project contains native code, a
separate compilation pass is performed for each architecture.

If a native binary library is required to compile the Android application, a
version must be provided for each supported CPU architecture. A directory layout
convention exists for providing a binary for each platform, with the same
library name.

The final binary artefact produced for Android distribution uses this same
directory convention. A "binary" on Android is an APK (Android Application
Package) bundle. This is effectively a ZIP file with known metadata and
structure; internally, there are subfolders for each supported CPU architecture.
This APK is bundled into AAB (Android Application Bundle) format for upload to
an app store; at time of installation, a CPU-specific APK is generated and
provided to the end-user for installation.
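
The per-architecture subfolders can be inspected with any ZIP tool. The sketch below assumes the usual `lib/<abi>/<library>.so` layout and builds a small in-memory archive as a stand-in for a real APK; the library name `libexample.so` and the helper names are hypothetical.

```python
import io
import zipfile


def abis_in_apk(apk: zipfile.ZipFile) -> dict[str, list[str]]:
    """Map each ABI directory under lib/ to the native libraries it contains."""
    found: dict[str, list[str]] = {}
    for name in apk.namelist():
        parts = name.split("/")
        # Native libraries are expected at lib/<abi>/<library>.so
        if len(parts) == 3 and parts[0] == "lib" and parts[2].endswith(".so"):
            found.setdefault(parts[1], []).append(parts[2])
    return {abi: sorted(libs) for abi, libs in sorted(found.items())}


def make_example_apk() -> zipfile.ZipFile:
    """Build a tiny in-memory archive standing in for a real APK."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("AndroidManifest.xml", "")
        for abi in ("arm64-v8a", "armeabi-v7a", "x86_64"):
            archive.writestr(f"lib/{abi}/libexample.so", b"")
    buffer.seek(0)
    return zipfile.ZipFile(buffer)
```

Running `abis_in_apk(make_example_apk())` lists the same library name once per ABI directory, which is the convention described above.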

### Fat binaries

Apple has taken the approach of "fat" binaries. A fat binary is a single
executable or library artefact that contains code for multiple CPU
architectures.

Fat binaries can be compiled in two ways:

1. **Single pass** Apple has modified their compiler tooling with flags that
allow the user to specify a single compilation command, and instruct the
compiler to generate multiple output architectures in the output binary.
2. **Multiple pass** A binary is compiled for each architecture; Apple provides
a tool named `lipo` that combines multiple single-architecture binaries into
a single fat binary containing all of those architectures.

At runtime, the operating system loads the binary slice for the current CPU
architecture, and the linker loads the appropriate slice from the fat binary of
any dynamic libraries.
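
To make the idea of "slices" concrete, the sketch below reads the table of contents at the start of a fat binary and lists the architectures it names. It assumes the 32-bit fat header layout (a big-endian magic number and slice count, followed by five 32-bit fields per slice) and recognises only two CPU type values; the helper names and the synthetic header are illustrative rather than taken from any existing tool.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # 32-bit fat header; fields are stored big-endian
CPU_TYPES = {
    0x01000007: "x86_64",
    0x0100000C: "arm64",
}


def fat_architectures(data: bytes) -> list[str]:
    """List the architectures named in a fat binary's header."""
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat binary header")
    names = []
    for index in range(count):
        # Each entry is five 32-bit fields:
        # cputype, cpusubtype, offset, size, align.
        cputype, _sub, _offset, _size, _align = struct.unpack_from(
            ">iiIII", data, 8 + 20 * index
        )
        names.append(CPU_TYPES.get(cputype, f"unknown({cputype:#x})"))
    return names


def make_example_header() -> bytes:
    """Build a synthetic header describing x86_64 and arm64 slices."""
    header = struct.pack(">II", FAT_MAGIC, 2)
    header += struct.pack(">iiIII", 0x01000007, 3, 16384, 1024, 14)
    header += struct.pack(">iiIII", 0x0100000C, 0, 32768, 1024, 14)
    return header
```

On a Mac, `lipo -info <file>` reports the same information for a real binary. The same magic number is also used by Java class files, so a production check would need more validation than this sketch performs.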

On macOS ARM hardware, Apple also provides Rosetta as a support mechanism; if a
user tries to run a binary that doesn't contain an ARM64 slice, but *does*
contain an x86-64 slice, the x86-64 slice will be converted at runtime into an
ARM64 binary. Complications can occur when only *some* of the binary is being
converted (e.g., if the binary being executed is fat, but a dynamic library
isn't).

To support the transition to Apple Silicon/M1 (ARM64), Python has introduced a
`universal2` architecture target. This is effectively a "fat wheel" format; the
`.dylib` files contained in the wheel are fat binaries containing both x86-64
and ARM64 slices.

iOS has an additional complication of requiring support for multiple *ABIs* in
addition to multiple CPU architectures. The ABI for the iOS simulator and
physical iOS devices are different; however, ARM64 is a supported CPU
architecture for both. As a result, it is not possible to produce a single fat
library that supports both the iOS simulator and iOS devices. Apple provides an
additional structure - the `XCFramework` - as a wrapper format for packaging
libraries that need to span multiple ABIs. When developing an application for
iOS, a developer will need to install binaries for both the simulator and
physical devices.
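
The constraint can be sketched as a grouping problem: a fat binary is keyed by architecture alone, so two ARM64 slices with different ABIs cannot share one file. The function below is a simplified model that uses the `iphoneos` and `iphonesimulator` names as ABI labels; it is not how Xcode or any packaging tool represents this.

```python
from collections import defaultdict


def plan_fat_binaries(slices: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (abi, architecture) slices into one fat binary per ABI.

    A fat binary is keyed by architecture alone, so slices that share an
    architecture but differ in ABI have to live in separate binaries.
    """
    plan: dict[str, list[str]] = defaultdict(list)
    for abi, arch in slices:
        if arch in plan[abi]:
            raise ValueError(f"duplicate {arch} slice for {abi}")
        plan[abi].append(arch)
    return dict(plan)
```

Given device ARM64, simulator ARM64 and simulator x86-64 slices, this yields two binaries, one per ABI, which is the pair of libraries that an `XCFramework` would then wrap.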

## Problems

At present, the Python ecosystem almost exclusively uses the "multiple binary"
solution. This serves the needs of Windows and Linux well, as it matches the
way end-users interact with binaries.

The `universal2` "fat wheel" solution also works well for macOS. However, the
definition of `universal2` is a hard-coded accommodation for one specific (albeit
common) multi-architecture configuration, and involves a number of specific
accommodations in the Python ecosystem (e.g., a macOS-specific architecture
lookup scheme).

Supporting iOS requires supporting between 2 and 5 architectures (x86-64 and
ARM64 at the minimum), and at least 2 ABIs - the iOS simulator and iOS device
have different (and incompatible) binary ABIs. At runtime, iOS expects to find a
single "fat" binary for the ABI that is in use. iOS effectively requires an
analog of `universal2` covering the 2 ABIs and multiple architectures. However:

1. The Python ecosystem does not provide an extension mechanism that would allow
platforms to define and utilize multi-architecture build artefacts.

2. The rate of change of CPU architectures in the iOS ecosystem is more rapid
than that seen on desktop platforms; any potential "universal iOS" target
would need to be updated or versioned regularly. A single named target would
also force developers into supporting older devices that they may not want to
support.

Supporting Android also requires the support of between 2 and 4 architectures
(depending on the range of development and end-user configurations the app needs
to support). Android's archiving-based approach can be mapped onto the "multiple
binary" approach, as it is possible to build a single archive from multiple
individual binaries. However, some coordination is required when installing
multiple binaries. If an independent install pass (e.g., call to `pip`) is used
for each architecture, the dependency resolution process for each platform will
also be independent; if there are any discrepancies in the specific versions
available for each architecture (or any ordering instabilities in the dependency
resolution algorithm), it is possible to end up with different versions on each
platform. Some coordination between per-architecture passes is therefore
required.
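
One way to picture the coordination problem is to compare the results of independent per-architecture install passes. The sketch below assumes each pass has been summarised as a `{package: version}` mapping; the function name and the input shape are hypothetical.

```python
def version_mismatches(
    resolved: dict[str, dict[str, str]],
) -> dict[str, dict[str, str]]:
    """Return packages whose resolved version differs between architectures.

    `resolved` maps an architecture name to the {package: version} mapping
    produced by an independent install pass for that architecture.
    """
    packages = set()
    for versions in resolved.values():
        packages.update(versions)

    mismatches = {}
    for package in sorted(packages):
        per_arch = {
            arch: versions.get(package, "<missing>")
            for arch, versions in resolved.items()
        }
        if len(set(per_arch.values())) > 1:
            mismatches[package] = per_arch
    return mismatches
```

Any non-empty result means the per-architecture environments have diverged and cannot safely be combined into a single application archive.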

## History

[The BeeWare Project](https://beeware.org) provides support for building both
iOS and Android binaries. On both platforms, BeeWare provides a custom package
index that contains pre-compiled binaries
([Android](https://chaquo.com/pypi-7.0/);
[iOS](https://anaconda.org/beeware/repo)). These binaries are produced using a
set of tooling
([Android](https://github.com/chaquo/chaquopy/tree/master/server/pypi);
[iOS](https://github.com/freakboy3742/chaquopy/tree/iOS-support/server/pypi))
that is analogous to the tools used by conda-forge to build binary artefacts.
These tools patch the source and build configurations for the most common Python
binary dependencies; on iOS, these tools also manage the process of merging
single-architecture, single ABI wheels into a fat wheel.

BeeWare-supplied iOS binary packages provide a single "iPhone" wheel.
This wheel includes 2 binary libraries (one for the iPhone device ABI, and one
for the iPhone Simulator ABI); the iPhone simulator binary includes x86-64 and
ARM64 slices. This is effectively the "universal-iphone" approach, encoding a
specific combination of ABIs and architectures.

BeeWare's support for Android uses [Chaquopy](https://chaquo.com/chaquopy) as a
base. Chaquopy's binary artefact repository stores a single binary wheel for
each platform; it also contains a wrapper around `pip` to manage the
installation of multiple binaries. When a Python project requests the
installation of a package:

* Pip is run normally for one binary architecture,
* The `.dist-info` metadata is used to identify the native packages - both
those directly requested by the user, and those installed as indirect
requirements by pip,
* The native packages are separated from the pure-Python packages, and pip is
then run again for each of the remaining architectures; this time, only those
specific native packages are installed, pinned to the same versions that pip
selected for the first architecture.
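
The pinned follow-up passes described above could look roughly like the following. This is a sketch only, not Chaquopy's actual wrapper: it builds `pip install` command lines using pip's `--platform`, `--only-binary`, `--no-deps` and `--target` options, and the platform tag strings and target directory layout passed to it are placeholders chosen for the example.

```python
import sys


def pinned_install_commands(
    pins: dict[str, str],
    platforms: list[str],
    target_root: str = "build",
) -> list[list[str]]:
    """Build one pip command per platform, pinned to already-chosen versions.

    `pins` holds the versions selected by the first (normal) pip run.
    """
    requirements = [f"{name}=={version}" for name, version in sorted(pins.items())]
    commands = []
    for platform_tag in platforms:
        commands.append([
            sys.executable, "-m", "pip", "install",
            "--platform", platform_tag,
            "--only-binary=:all:",
            "--no-deps",  # versions are already pinned; skip re-resolution
            "--target", f"{target_root}/{platform_tag}",
            *requirements,
        ])
    return commands
```

Each returned list could be handed to `subprocess.run`. Pinning with `==` and disabling dependency resolution is what keeps the later architectures aligned with the first one.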

[Kivy](https://kivy.org) also provides support for iOS and Android as deployment
platforms. However, Kivy doesn't support the use of binary artefacts like wheels
on those platforms; instead, the broader Kivy platform includes build support
for the libraries that binary modules may require.

## Relevant resources

To date, there haven't been extensive public discussions about the support of
iOS or Android binary packages. However, there were discussions around the
adoption of `universal2` for macOS:

* [The CPython discussion about `universal2`
support](https://discuss.python.org/t/apple-silicon-and-packaging/4516)
* [The addition of `universal2` to
CPython](https://github.com/python/cpython/pull/22855)
* [Support in packaging for
`universal2`](https://github.com/pypa/packaging/pull/319), which declares the
logic around resolving `universal2` to specific platforms.

## Potential solutions or mitigations

There are two approaches that could be used to provide a general solution to
this problem, depending on whether the support of multiple architectures is
viewed as a distribution or integration problem.

### Distribution-based solution

The first approach is to treat the problem as a package distribution issue. In
this approach, artefacts stored in package repositories include all the ABIs and
CPU architectures needed to meaningfully support a given platform. This is the
approach embodied by the `universal2` packaging solution on macOS, and the iOS
solution used by BeeWare.

This approach would require agreement on any new "known" multi-ABI/arch tags, as
well as any resolution schemes that may be needed for those tags.

A more general approach to this problem would be to allow for multi-architecture
and multi-ABI binaries as part of the wheel naming scheme. A wheel can already
declare compatibility with multiple CPython versions (e.g.,
`cp34.cp35.cp36-abi3-manylinux1_x86_64`); it could be possible for a wheel to
declare multiple ABI or architecture inclusions. In such a scheme,
`cp310-abi3-macosx_10_9_universal2` would effectively be equivalent to
`cp310-abi3-macosx_10_9_x86_64.macosx_10_9_arm64`; an iPhone wheel for the same
package might be
`cp310-abi3-iphoneos_12_0_arm64.iphonesimulator_12_0_x86_64.iphonesimulator_12_0_arm64`.

This would allow for more generic logic based on matching name fragments, rather
than specific "known name" targets.
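
The name-fragment matching could be sketched as follows. The first helper expands a dotted tag into individual (interpreter, ABI, platform) triples; the second shows the proposed equivalence for `universal2`. Both function names are hypothetical, and the sketch only splits strings. It does not decide whether a dotted platform component means "compatible with any of these" or "contains all of these", which is the semantic question the proposal would need to settle.

```python
from itertools import product


def expand_tags(tag: str) -> list[tuple[str, str, str]]:
    """Expand a dotted `python-abi-platform` tag into individual triples."""
    interpreters, abis, platforms = tag.split("-")
    return list(
        product(interpreters.split("."), abis.split("."), platforms.split("."))
    )


def expand_universal2(platform: str) -> list[str]:
    """Rewrite a `universal2` platform tag as its two single-arch tags."""
    suffix = "_universal2"
    if platform.endswith(suffix):
        base = platform[: -len(suffix)]
        return [f"{base}_x86_64", f"{base}_arm64"]
    return [platform]
```

With these helpers, `macosx_10_9_universal2` and `macosx_10_9_x86_64.macosx_10_9_arm64` reduce to the same pair of platform fragments, so an installer could match on fragments without knowing about `universal2` by name.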

Regardless of whether "known tags" or a generic naming scheme is used, the
distribution-based approach requires modifications to the process of building
packages, and the process of installing packages.

### Integration-based solution

Alternatively, this could be treated as an install-time problem. This is the
approach taken by BeeWare/Chaquopy on Android.

In this approach, package repositories would continue to store
single-architecture, single-ABI artefacts. However, at time of installation, the
installation tool would allow for the specification of multiple architecture/ABI
combinations. The installer would then download a wheel for each architecture/ABI
requested, and perform any post-processing required, either merging binaries for
multiple architectures into a single fat binary, or archiving those binary
artefacts in an appropriate location.

This approach is less invasive from the perspective of package repositories and
package build tooling; but would require significant modifications to installer
tooling.
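
A sketch of the post-processing decision an installer would face when merging single-architecture wheels is shown below. It assumes each wheel has been unpacked into a `{path: content}` mapping and classifies files by suffix and content hash; the action names and the suffix list are assumptions made for this example, and real merge tooling would need to handle many more cases.

```python
import hashlib

BINARY_SUFFIXES = (".so", ".dylib")


def merge_plan(wheels: dict[str, dict[str, bytes]]) -> dict[str, str]:
    """Decide how each file from several single-architecture wheels is merged.

    `wheels` maps an architecture name to that wheel's {path: content}.
    Returns {path: action}, where action is "copy", "fuse", or "conflict".
    """
    paths = set()
    for files in wheels.values():
        paths.update(files)

    plan = {}
    for path in sorted(paths):
        digests = {
            hashlib.sha256(files[path]).hexdigest()
            for files in wheels.values()
            if path in files
        }
        present_everywhere = all(path in files for files in wheels.values())
        if path.endswith(BINARY_SUFFIXES):
            # Binaries differ per architecture by design; fuse them if every
            # architecture provides one.
            plan[path] = "fuse" if present_everywhere else "conflict"
        elif len(digests) == 1 and present_everywhere:
            plan[path] = "copy"
        else:
            plan[path] = "conflict"
    return plan
```

Files marked "conflict" are the hard cases, such as generated configuration modules that embed architecture-specific values, and are the reason shared, well-specified merge tooling is needed rather than ad hoc per-project logic.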
1 change: 1 addition & 0 deletions mkdocs.yml
Expand Up @@ -43,6 +43,7 @@ nav:
- 'key-issues/simd_support.md'
- 'key-issues/unexpected_fromsource_builds.md'
- 'key-issues/cross_compilation.md'
- 'key-issues/multiple_architectures.md'
- 'other_issues.md'
- 'Background':
- 'background/binary_interface.md'