Add details on native packaging requirements exposed by mobile platforms #27

Open
wants to merge 31 commits into
base: main
eb2419a
Add details on native packaging requirements exposed by mobile platfo…
freakboy3742 Jan 9, 2023
8fef63e
Clarified the role/impact of cross-compilation on non-macOS platforms.
freakboy3742 Jan 10, 2023
d16035f
Grammar cleanup.
freakboy3742 Jan 10, 2023
84dbd5f
Add note about Windows platform support
freakboy3742 Jan 10, 2023
2a40f47
Moved a paragraph about the universal2 to current state.
freakboy3742 Jan 10, 2023
2563270
Clarified how Android deals with dependencies.
freakboy3742 Jan 10, 2023
b9b904c
Added an alternative approach for handling iOS multi-arch.
freakboy3742 Jan 10, 2023
45f748f
Modified comments to use common section structure, and include specif…
freakboy3742 Jan 16, 2023
373bb09
Apply suggestions from code review
freakboy3742 Jan 16, 2023
d8a2ca6
More updates stemming from review.
freakboy3742 Jan 16, 2023
f533395
Expand note about Linux support.
freakboy3742 Jan 17, 2023
8475360
Correct an it's typo.
freakboy3742 Jan 17, 2023
2886f2c
Add content to page on cross compilation
rgommers Feb 27, 2023
7556850
Resolve the last cross-compilation comment, on `pip --platform`
rgommers Mar 10, 2023
cb85652
Merge branch 'main' into mobile-details
rgommers Mar 10, 2023
49806e2
Put back link to "multiple architectures" page from cross compile page
rgommers Mar 10, 2023
ea1fb60
Remove the `cross_platform.md` file
rgommers Mar 10, 2023
d249af6
Fix some formatting and typo issues
rgommers Mar 10, 2023
50d8c26
Revisions to multi-architecture notes following review.
freakboy3742 Mar 20, 2023
a9776e0
Add foldout for pros and cons of `universal2` wheels
rgommers Mar 21, 2023
8d46e06
Add the 'for' arguments for universal2.
freakboy3742 Mar 21, 2023
5d06a56
Clarified 'end user' language; added note about merge problems.
freakboy3742 Mar 22, 2023
3e1fc05
Clarify the state of arm64 on github actions.
freakboy3742 Mar 22, 2023
74705d8
Add reference to pip issue about universal2 wheel installation.
freakboy3742 Mar 22, 2023
f46d2b0
Fixed typo.
freakboy3742 Mar 22, 2023
e1c278f
Removed subjective language.
freakboy3742 Mar 22, 2023
1a926eb
Apply textual/typo suggestions
rgommers Mar 22, 2023
7967383
Rephrase universal2 usage frequency/demand phrasing
rgommers Mar 22, 2023
1fb0ffb
Tone down the statement on "must provide thin wheels"
rgommers Mar 22, 2023
b44a322
Rephrase note on needed robustness improvements in delocate-fuse
rgommers Mar 22, 2023
dd93f1f
Add "first-class support for fusing thin wheels" as a potential solution
rgommers Mar 22, 2023
2 changes: 2 additions & 0 deletions docs/index.md
@@ -67,6 +67,8 @@ workarounds for.
4. [Metadata handling on PyPI](key-issues/pypi_metadata_handling.md)
5. [Distributing a package containing SIMD code](key-issues/simd_support.md)
6. [Unsuspecting users getting failing from source builds](key-issues/unexpected_fromsource_builds.md)
7. [Platforms with multiple CPU architectures](key-issues/multiple_architectures.md)
8. [Cross-platform installation](key-issues/cross_platform.md)


## Contributing
92 changes: 92 additions & 0 deletions docs/key-issues/cross_platform.md
@@ -0,0 +1,92 @@
# Cross-platform installation

The historical assumption of compilation is that the platform where the code is
compiled will be the same as the platform where the final code will be executed
(if not literally the same machine, then at least one that is CPU and ABI
compatible at the operating system level). This is a reasonable assumption for
most desktop platforms; however, for some platforms, this isn't the case.

On mobile platforms, an app is compiled on a desktop platform, and transferred
to the mobile device (or a simulator) for testing. The compiler is not executed
on device. Therefore, it must be possible to build a binary artefact for a CPU
architecture and an ABI that is different from the platform that is running the
compiler.

Cross compilation issues also emerge when dealing with continuous
integration/deployment (CI/CD). CI/CD platforms (such as GitHub Actions)
generally provide the "common" architectures - often only x86-64 - however, a
project may want to produce binaries for other platforms (e.g., ARM support for
Raspberry Pi devices; PowerPC or s390 for mainframe/server devices; or for
mobile platforms). These binaries won't run natively on the host CI/CD system
(without some sort of emulation); but code can be compiled for the target
platform.

macOS also experiences this as a result of the Apple Silicon transition. Apple
has provided the tools to compile [fat binaries](multiple_architectures.md) on
x86_64 hardware; however, in this case, the host platform (macOS on x86_64) will
still be one of the outputs of the compilation process, and the resulting binary
will run on the CI/CD system.

## Current state

Native compiler and build toolchains (e.g., autoconf/automake, CMake) have long
supported cross-compilation; however, these cross-compilation capabilities are
easy to break unless they are exercised regularly.

CPython's build system includes some support for cross-compilation. This support
is largely based on leveraging autoconf's support for cross compilation. This
support wasn't well integrated into distutils and the compilation of the binary
portions of stdlib; however, with the deprecation and removal of distutils in
Python 3.12, this situation has improved.

The specification of PEP517 means cross-platform compilation support has been
largely converted into a concern for individual build systems to manage.

## Problems

There is currently a small gap in communicating target platform details to the
build system. While a build system like autoconf or CMake may support
cross-platform compilation, and a project may be able to cross-compile binary
artefacts, invocation of the PEP517 build interface currently assumes that the
platform running the build will be the platform that ultimately runs the Python
code. As a result, `sys.platform`, or the various attributes of the `platform`
library can't be used as part of the build process.
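
To make the gap concrete, the following minimal sketch collects the platform identifiers that code running inside a build can obtain from the standard library. Every value describes the interpreter executing the build (the host); there is nothing here that can be told about a separate target. The helper name `host_platform_details` is an illustrative assumption, not part of any build interface.

```python
import platform
import sys
import sysconfig


def host_platform_details() -> dict[str, str]:
    """Collect the platform identifiers visible to code running in a build.

    Every value describes the interpreter executing this function (the build
    host); none of these calls accepts a separate target platform.
    """
    return {
        "sys.platform": sys.platform,
        "platform.system": platform.system(),
        "platform.machine": platform.machine(),
        "sysconfig.get_platform": sysconfig.get_platform(),
    }


if __name__ == "__main__":
    for name, value in host_platform_details().items():
        print(f"{name}: {value}")
```

Run on an x86-64 CI machine while building for a mobile target, each of these values would still report the CI machine.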

`pip` provides limited support for installing binaries for a different platform
by specifying the `--platform`, `--implementation` and `--abi` flags; however,
these flags only work for the selection of pre-built binary artefacts.
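
As a rough illustration of that limited support, the sketch below assembles a `pip download` command line that selects pre-built wheels for a platform other than the host. The helper name, the example requirement, and the tag values in the usage note are assumptions chosen for illustration; `--python-version` is an additional pip selection flag that is commonly combined with the three named above.

```python
import sys


def pip_download_command(
    requirement: str,
    platform_tag: str,
    python_version: str,
    implementation: str,
    abi: str,
    dest: str,
) -> list[str]:
    """Build a `pip download` command that selects wheels for another platform.

    pip requires these selection flags to be combined with
    `--only-binary=:all:` (or `--no-deps`), so no source build is attempted.
    """
    return [
        sys.executable, "-m", "pip", "download",
        "--only-binary=:all:",
        "--platform", platform_tag,
        "--python-version", python_version,
        "--implementation", implementation,
        "--abi", abi,
        "--dest", dest,
        requirement,
    ]
```

For example, `pip_download_command("numpy", "manylinux2014_aarch64", "3.11", "cp", "cp311", "wheels")` returns a list that could be passed to `subprocess.run`. If no matching wheel has been published, pip reports an error rather than cross-compiling one.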

## History

Tools like [crossenv](https://github.com/benfogle/crossenv) can be used to trick
Python into performing cross-platform builds. These tools use path hacks and
overrides of known sources of platform-specific details (like `distutils`) to
provide a cross-compilation environment. However, these solutions tend to be
somewhat fragile as they aren't first-class citizens of the Python ecosystem.

[The BeeWare Project](https://beeware.org) also uses a version of these
techniques. On both iOS and Android, BeeWare provides a custom package index that
contains pre-compiled binaries ([Android](https://chaquo.com/pypi-7.0/);
[iOS](https://anaconda.org/beeware/repo)). These binaries are produced using a
forge-like set of tooling
([Android](https://github.com/chaquo/chaquopy/tree/master/server/pypi);
[iOS](https://github.com/freakboy3742/chaquopy/tree/iOS-support/server/pypi)).

## Relevant resources

TODO

## Potential solutions or mitigations

At its core, what is required is a recognition that cross-platform builds are a
use case that the Python ecosystem supports.

In concrete terms, for native modules, this would require either:

1. Extension of the PEP517 interface to allow communicating the desired target
platform as part of a binary build; or

2. Formalization of the "platform identification" interface that can be used by
PEP517 build backends to identify the target platform, so that tools like
`crossenv` can provide a reliable proxied environment for cross-platform
builds.
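
One possible shape for the first option is sketched below, assuming the target is passed through the existing `config_settings` argument of the PEP517 hooks. The key name `target-platform` and the helper `resolve_target_platform` are hypothetical; PEP517 does not define any standard `config_settings` keys, and this is not a proposal text.

```python
import sysconfig
from typing import Mapping, Optional

# Hypothetical key: PEP 517 does not define any standard config_settings keys.
TARGET_PLATFORM_KEY = "target-platform"


def resolve_target_platform(
    config_settings: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the platform tag a backend should build for.

    Prefers an explicitly requested target and falls back to the host platform
    reported by sysconfig, normalised the way wheel platform tags are written.
    """
    requested = (config_settings or {}).get(TARGET_PLATFORM_KEY)
    platform_name = requested if requested else sysconfig.get_platform()
    return platform_name.replace("-", "_").replace(".", "_")
```

A backend's `build_wheel(wheel_directory, config_settings=None, metadata_directory=None)` hook could call this helper instead of reading `sys.platform` directly, so that a frontend which forwards config settings can request a non-host target.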
259 changes: 259 additions & 0 deletions docs/key-issues/multiple_architectures.md
@@ -0,0 +1,259 @@
# Platforms with multiple CPU architectures

In addition to any ABI requirements, a binary is compiled for a CPU
architecture. That CPU architecture defines the CPU instructions that can be
issued by the binary.

## Current state

Historically, it could be assumed that an executable or library would be
compiled for a single CPU architecture. On the rare occasion that an operating
system was available for multiple CPU architectures, it became the
responsibility of the user to find (or compile) a binary that was compiled for
their host CPU architecture.

However, we now see operating system platforms where multiple CPU architectures
are supported:

* In the early days of Windows NT, both x86 and DEC Alpha CPUs were supported
* Windows 10 supports x86, x86-64, ARMv7 and ARM64; Windows 11 supports x86-64
and ARM64.
* Although Linux started as an x86 project, the Linux kernel is now available for a
wide range of other CPU architectures, including ARM64, RISC-V, PowerPC, s390
and more.
* Apple transitioned Mac hardware from PowerPC to Intel (x86-64) CPUs, providing
a forwards compatibility path for binaries
* Apple is currently transitioning Mac hardware from Intel (x86-64) to
Apple Silicon (ARM64) CPUs, again providing a forwards compatibility
path
* Apple supports ARMv6, ARMv7, ARMv7s, ARM64 and ARM64e on iOS
* Android currently supports ARMv7, ARM64, x86, and x86-64; it has historically
also supported ARMv5 and MIPS

CPU architecture compatibility is a necessary, but not sufficient criterion for
determining binary compatibility. Even if two binaries are compiled for the same
CPU architecture, that doesn't guarantee [ABI compatibility](abi.md).

In some respects, CPU architecture compatibility could be considered a superset
of [GPU compatibility](gpus.md). When dealing with multiple CPU architectures,
there may be some overlap with the solutions that can be used to support GPUs in
native binaries.

Three approaches have emerged on operating systems that have a need to manage
multiple CPU architectures:

### Multiple binaries

The minimal solution is to distribute multiple binaries. This is the approach
that is used by Windows and Linux. At time of distribution, an installer or other
downloadable artefact is provided for each supported platform, and it is up to
the user to select and download the correct artefact.

### Archiving

The approach taken by Android is very similar to the multiple binary approach,
with some affordances and tooling to simplify distribution.

By default Android projects use Java/Kotlin, which produces platform independent
code. However, it is possible to use non Java/Kotlin libraries by using JNI and
the Android NDK (Native Development Kit). If a project contains native code, a
separate compilation pass is performed for each architecture.

If a native binary library is required to compile the Android application, a
version must be provided for each supported CPU architecture. A directory layout
convention exists for providing a binary for each platform, with the same
library name.

The final binary artefact produced for Android distribution uses this same
directory convention. A "binary" on Android is an APK (Android Application
Package) bundle; this is effectively a ZIP file with known metadata and
structure; internally, there are subfolders for each supported CPU architecture.
This APK is bundled into AAB (Android Application Bundle) format for upload to
an app store; at time of installation, a CPU-specific APK is generated and
provided to the end-user for installation.
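
The directory convention can be inspected with ordinary ZIP tooling. The sketch below groups the native libraries in an APK by the per-architecture subfolder that holds them, assuming the usual `lib/<abi>/<library>.so` layout; the in-memory archive and the library name `libexample.so` are stand-ins for a real APK.

```python
import io
import zipfile
from collections import defaultdict


def native_libraries_by_abi(apk: zipfile.ZipFile) -> dict[str, list[str]]:
    """Group the native libraries in an APK by the ABI directory holding them."""
    libraries = defaultdict(list)
    for name in apk.namelist():
        parts = name.split("/")
        # Native code lives at lib/<abi>/<library>.so inside the archive.
        if len(parts) == 3 and parts[0] == "lib" and parts[2].endswith(".so"):
            libraries[parts[1]].append(parts[2])
    return dict(libraries)


# Build a tiny stand-in archive in memory instead of reading a real APK.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("AndroidManifest.xml", b"")
    archive.writestr("lib/arm64-v8a/libexample.so", b"")
    archive.writestr("lib/x86_64/libexample.so", b"")

with zipfile.ZipFile(buffer) as apk:
    print(native_libraries_by_abi(apk))
```

The same library name appears once per architecture, which is what allows a CPU-specific APK to be produced by keeping a single subfolder.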

### Fat binaries

Apple has taken the approach of "fat" binaries. A fat binary is a single
executable or library artefact that contains code for multiple CPU
architectures.

Fat binaries can be compiled in two ways:

1. **Single pass** Apple has modified their compiler tooling with flags that
allow the user to specify a single compilation command, and instruct the
compiler to generate multiple output architectures in the output binary.
2. **Multiple pass** After compiling a binary for each architecture, Apple
provides a tool named `lipo` to combine multiple single-architecture binaries
into a single fat binary that contains all architectures.
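
The multiple pass approach can be sketched as a small helper that assembles the `lipo` invocation. The helper name and the file paths in the usage note are illustrative assumptions; only the `-create` and `-output` options of `lipo` are used.

```python
def lipo_create_command(thin_binaries: list[str], fat_binary: str) -> list[str]:
    """Build the `lipo` command that merges thin binaries into one fat binary."""
    return ["lipo", "-create", *thin_binaries, "-output", fat_binary]
```

For example, `lipo_create_command(["build/x86_64/libfoo.dylib", "build/arm64/libfoo.dylib"], "build/libfoo.dylib")` yields a command that could be run with `subprocess.run(command, check=True)` on a macOS machine with Apple's developer tools installed.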

At runtime, the operating system loads the binary slice for the current CPU
architecture, and the linker loads the appropriate slice from the fat binary of
any dynamic libraries.
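
The slices that the loader chooses between are listed in a small header at the start of a fat file. The following sketch reads that header, assuming the 32-bit fat format (big-endian magic `0xCAFEBABE` followed by 20-byte architecture entries); the sample bytes are synthesized for illustration rather than taken from a real binary, and the CPU type table covers only a few common values.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # 32-bit fat header; fields are stored big-endian
CPU_TYPE_NAMES = {
    0x00000007: "i386",
    0x01000007: "x86_64",
    0x0000000C: "arm",
    0x0100000C: "arm64",
}


def fat_architectures(data: bytes) -> list[str]:
    """Return the architecture of each slice listed in a fat Mach-O header."""
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat Mach-O header")
    names = []
    for index in range(count):
        # Each entry holds: cputype, cpusubtype, offset, size, align.
        cputype, _subtype, _offset, _size, _align = struct.unpack_from(
            ">IIIII", data, 8 + index * 20
        )
        names.append(CPU_TYPE_NAMES.get(cputype, hex(cputype)))
    return names


# Synthesized header describing two slices (offsets and sizes are made up).
sample_header = (
    struct.pack(">II", FAT_MAGIC, 2)
    + struct.pack(">IIIII", 0x01000007, 3, 16384, 1000, 14)
    + struct.pack(">IIIII", 0x0100000C, 0, 32768, 1000, 14)
)
print(fat_architectures(sample_header))
```

This sketch ignores the CPU subtype, so it cannot distinguish variants such as ARM64e from ARM64, and it does not handle the 64-bit fat header variant. Java class files share the same magic number, so a real tool would need further checks.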

On macOS ARM hardware, Apple also provides Rosetta as a support mechanism; if a
user tries to run a binary that doesn't contain an ARM64 slice, but *does*
contain an x86-64 slice, the x86-64 slice will be converted at runtime into an
ARM64 binary. Complications can occur when only *some* of the binary is being
converted (e.g., if the binary being executed is fat, but a dynamic library
isn't).

To support the transition to Apple Silicon/M1 (ARM64), Python has introduced a
`universal2` architecture target. This is effectively a "fat wheel"
format; the `.dylib` files contained in the wheel are fat binaries containing
both x86_64 and ARM64 slices.

iOS has an additional complication of requiring support for multiple *ABIs* in
addition to multiple CPU architectures. The ABIs for the iOS simulator and
physical iOS devices are different; however, ARM64 is a supported CPU
architecture for both. As a result, it is not possible to produce a single fat
library that supports both the iOS simulator and iOS devices. Apple provides an
additional structure - the `XCFramework` - as a wrapper format for packaging
libraries that need to span multiple ABIs. When developing an application for
iOS, a developer will need to install binaries for both the simulator and
physical devices.

## Problems

At present, the Python ecosystem almost exclusively uses the "multiple binary"
solution. This serves the needs of Windows and Linux well, as it matches the
way end-users interact with binaries.

The `universal2` "fat wheel" solution also works well for macOS. However, the
definition of `universal2` is a hard-coded accommodation for one specific (albeit
common) multi-architecture configuration, and involves a number of specific
accommodations in the Python ecosystem (e.g., a macOS-specific architecture
lookup scheme).

Supporting iOS requires supporting between 2 and 5 architectures (x86_64 and
ARM64 at the minimum), and at least 2 ABIs - the iOS simulator and iOS device
have different (and incompatible) binary ABIs. At runtime, iOS expects to find a
single "fat" binary for any given ABI. iOS effectively requires an analog of
`universal2` covering the 2 ABIs and multiple architectures. However:

1. The Python ecosystem does not provide an extension mechanism that would allow
platforms to define and utilize multi-architecture build artefacts.

2. The rate of change of CPU architectures in the iOS ecosystem is more rapid
than that seen on desktop platforms; any potential "universal iOS" target
would need to be updated or versioned regularly. A single named target would
also force developers into supporting older devices that they may not want to
support.

Supporting Android also requires the support of between 2 and 4 architectures
(depending on the range of development and end-user configurations the app needs
to support). Android's archiving-based approach can be mapped onto the "multiple
binary" approach, as it is possible to build a single archive from multiple
individual binaries. However, some coordination is required when installing
multiple binaries. If an independent install pass (e.g., call to `pip`) is used
for each architecture, the dependency resolution process for each platform will
also be independent; if there are any discrepancies in the specific versions
available for each architecture (or any ordering instabilities in the dependency
resolution algorithm), it is possible to end up with different versions on each
platform. Some coordination between per-architecture passes is therefore
required.

## History

[The BeeWare Project](https://beeware.org) provides support for building both
iOS and Android binaries. On both platforms, BeeWare provides a custom package
index that contains pre-compiled binaries
([Android](https://chaquo.com/pypi-7.0/);
[iOS](https://anaconda.org/beeware/repo)). These binaries are produced using a
forge-like set of tooling
([Android](https://github.com/chaquo/chaquopy/tree/master/server/pypi);
[iOS](https://github.com/freakboy3742/chaquopy/tree/iOS-support/server/pypi))
that patches the build systems for the most common Python binary dependencies;
and on iOS, manages the process of merging single-architecture, single ABI
wheels into a fat wheel.

On iOS, BeeWare-supplied iOS binary packages provide a single "iPhone" wheel.
This wheel includes 2 binary libraries (one for the iPhone device ABI, and one
for the iPhone Simulator ABI); the iPhone simulator binary includes x86_64 and
ARM64 slices. This is effectively the "universal-iphone" approach, encoding a
specific combination of ABIs and architectures.

BeeWare's support for Android uses [Chaquopy](https://chaquo.com/chaquopy) as a
base. Chaquopy's binary artefact repository stores a single binary wheel for
each platform; it also contains a wrapper around `pip` to manage the
installation of multiple binaries. When a Python project requests the
installation of a package:

* Pip is run normally for one binary architecture
* The `.dist-info` metadata is used to identify the native packages - both
those directly requested by the user, and those installed as indirect
requirements by pip
* The native packages are separated from the pure-Python packages, and pip is
then run again for each of the remaining architectures; this time, only those
specific native packages are installed, pinned to the same versions that pip
selected for the first architecture.
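
The last step above can be sketched as follows. This is not Chaquopy's actual implementation: the helper name, the shape of its arguments, and the platform tag strings in the usage note are assumptions made for illustration. It only shows how pinning the versions chosen in the first pass keeps the later per-architecture passes consistent.

```python
import sys


def pinned_install_commands(
    resolved: dict[str, str],
    native_packages: set[str],
    remaining_platforms: list[str],
    target_root: str,
) -> list[list[str]]:
    """Build one pinned `pip install` command per remaining architecture.

    `resolved` maps package names to the versions chosen by the first pass;
    only the native packages are reinstalled, at exactly those versions.
    """
    pins = sorted(f"{name}=={resolved[name]}" for name in native_packages)
    return [
        [
            sys.executable, "-m", "pip", "install",
            "--only-binary=:all:",
            "--no-deps",  # versions were already resolved by the first pass
            "--platform", platform_tag,
            "--target", f"{target_root}/{platform_tag}",
            *pins,
        ]
        for platform_tag in remaining_platforms
    ]
```

Given `resolved={"numpy": "1.23.3", "requests": "2.28.1"}` and `native_packages={"numpy"}`, each generated command installs only `numpy==1.23.3`, so no second dependency resolution can pick a different version for another architecture.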

[Kivy](https://kivy.org) also provides support for iOS and Android as deployment
platforms. However, Kivy doesn't support the use of binary artefacts like wheels
on those platforms; Kivy's support for binary modules is based on the broader Kivy
platform including build support for libraries that may be required.

## Relevant resources

To date, there haven't been extensive public discussions about the support of
iOS or Android binary packages. However, there were discussions around the
adoption of universal2 for macOS:

* [The CPython discussion about universal2
support](https://discuss.python.org/t/apple-silicon-and-packaging/4516)
* [The addition of universal2 to
CPython](https://github.com/python/cpython/pull/22855)
* [Support in packaging for
universal2](https://github.com/pypa/packaging/pull/319), which declares the
logic around resolving universal2 to specific platforms.

## Potential solutions or mitigations

There are two approaches that could be used to provide a general solution to
this problem, depending on whether the support of multiple architectures is
viewed as a distribution or integration problem.

### Distribution-based solution

The first approach is to treat the problem as a package distribution issue. In
this approach, artefacts stored in package repositories include all the ABIs and
CPU architectures needed to meaningfully support a given platform. This is the
approach embodied by the `universal2` packaging solution on macOS, and the iOS
solution used by BeeWare.

This approach would require agreement on any new "known" multi-ABI/arch tags, as
well as any resolution schemes that may be needed for those tags.

A more general approach to this problem would be to allow for multi-architecture
and multi-ABI binaries as part of the wheel naming scheme. A wheel can already
declare compatibility with multiple CPython versions (e.g.,
`cp34.cp35.cp36-abi3-manylinux1_x86_64`); it could be possible for a wheel to
declare multiple ABI or architecture inclusions. In such a scheme,
`cp310-abi3-macosx_10_9_universal2` would effectively be equivalent to
`cp310-abi3-macosx_10_9_x86_64.macosx_10_9_arm64`; an iPhone wheel for the same
package might be
`cp310-abi3-iphoneos_12_0_arm64.iphonesimulator_12_0_x86_64.iphonesimulator_12_0_arm64`.

This would allow for more generic logic based on matching name fragments, rather
than specific "known name" targets.
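
The "matching name fragments" idea can be sketched by expanding a dotted tag into its individual combinations, in the same way compressed tag sets are already expanded for Python versions. The function name is an illustrative assumption, and the expansion shown is only the name-level logic, not an installer's full compatibility check.

```python
from itertools import product


def expand_compressed_tag(tag: str) -> list[tuple[str, str, str]]:
    """Expand a compressed `python-abi-platform` tag set into individual tags."""
    interpreters, abis, platforms = tag.split("-")
    return list(
        product(interpreters.split("."), abis.split("."), platforms.split("."))
    )


for triple in expand_compressed_tag(
    "cp310-abi3-macosx_10_9_x86_64.macosx_10_9_arm64"
):
    print("-".join(triple))
```

One design point a generic scheme would need to settle: in existing compressed tag sets, a dotted component means the wheel is compatible with any one of the listed values, whereas a fat wheel needs to state that it contains all of them.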

Regardless of whether "known tags" or a generic naming scheme is used, the
distribution-based approach requires modifications to the process of building
packages, and the process of installing packages.

### Integration-based solution

Alternatively, this could be treated as an install-time problem. This is the
approach taken by BeeWare/Chaquopy on Android.

In this approach, package repositories would continue to store
single-architecture, single-ABI artefacts. However, at time of installation, the
installation tool allows for the specification of multiple architectures/ABI
combinations. The installer then downloads a wheel for each architecture/ABI
requested, and performs any post-processing required to merge binaries for
multiple architectures into a single fat binary, or to archive those binary
artefacts in an appropriate location.

This approach is less invasive from the perspective of package repositories and
package build tooling; but would require significant modifications to installer
tooling.
4 changes: 3 additions & 1 deletion mkdocs.yml
@@ -20,7 +20,7 @@ theme:
- scheme: default
primary: blue grey
toggle:
icon: material/brightness-7
icon: material/brightness-7
name: Switch to dark mode

nav:
@@ -42,6 +42,8 @@ nav:
- 'key-issues/pypi_metadata_handling.md'
- 'key-issues/simd_support.md'
- 'key-issues/unexpected_fromsource_builds.md'
- 'key-issues/multiple_architectures.md'
- 'key-issues/cross_platform.md'
- 'other_issues.md'
- 'Background':
- 'background/binary_interface.md'