Add details on native packaging requirements exposed by mobile platforms #27

Open

wants to merge 31 commits into base: main

Commits (31)
- eb2419a: Add details on native packaging requirements exposed by mobile platfo… (freakboy3742, Jan 9, 2023)
- 8fef63e: Clarified the role/impact of cross-compilation on non-macOS platforms. (freakboy3742, Jan 10, 2023)
- d16035f: Grammar cleanup. (freakboy3742, Jan 10, 2023)
- 84dbd5f: Add note about Windows platform support (freakboy3742, Jan 10, 2023)
- 2a40f47: Moved a paragraph about the universal2 to current state. (freakboy3742, Jan 10, 2023)
- 2563270: Clarified how Android deals with dependencies. (freakboy3742, Jan 10, 2023)
- b9b904c: Added an alternative approach for handling iOS multi-arch. (freakboy3742, Jan 10, 2023)
- 45f748f: Modified comments to use common section structure, and include specif… (freakboy3742, Jan 16, 2023)
- 373bb09: Apply suggestions from code review (freakboy3742, Jan 16, 2023)
- d8a2ca6: More updates stemming from review. (freakboy3742, Jan 16, 2023)
- f533395: Expand note about Linux support. (freakboy3742, Jan 17, 2023)
- 8475360: Correct an it's typo. (freakboy3742, Jan 17, 2023)
- 2886f2c: Add content to page on cross compilation (rgommers, Feb 27, 2023)
- 7556850: Resolve the last cross-compilation comment, on `pip --platform` (rgommers, Mar 10, 2023)
- cb85652: Merge branch 'main' into mobile-details (rgommers, Mar 10, 2023)
- 49806e2: Put back link to "multiple architectures" page from cross compile page (rgommers, Mar 10, 2023)
- ea1fb60: Remove the `cross_platform.md` file (rgommers, Mar 10, 2023)
- d249af6: Fix some formatting and typo issues (rgommers, Mar 10, 2023)
- 50d8c26: Revisions to multi-architecture notes following review. (freakboy3742, Mar 20, 2023)
- a9776e0: Add foldout for pros and cons of `universal2` wheels (rgommers, Mar 21, 2023)
- 8d46e06: Add the 'for' arguments for universal2. (freakboy3742, Mar 21, 2023)
- 5d06a56: Clarified 'end user' language; added note about merge problems. (freakboy3742, Mar 22, 2023)
- 3e1fc05: Clarify the state of arm64 on github actions. (freakboy3742, Mar 22, 2023)
- 74705d8: Add reference to pip issue about universal2 wheel installation. (freakboy3742, Mar 22, 2023)
- f46d2b0: Fixed typo. (freakboy3742, Mar 22, 2023)
- e1c278f: Removed subjective language. (freakboy3742, Mar 22, 2023)
- 1a926eb: Apply textual/typo suggestions (rgommers, Mar 22, 2023)
- 7967383: Rephrase universal2 usage frequency/demand phrasing (rgommers, Mar 22, 2023)
- 1fb0ffb: Tone down the statement on "must provide thin wheels" (rgommers, Mar 22, 2023)
- b44a322: Rephrase note on needed robustness improvements in delocate-fuse (rgommers, Mar 22, 2023)
- dd93f1f: Add "first-class support for fusing thin wheels" as a potential solution (rgommers, Mar 22, 2023)
70 changes: 61 additions & 9 deletions docs/key-issues/cross_platform.md
@@ -4,7 +4,7 @@ The historical assumption of compilation is that the platform where the code is
compiled will be the same as the platform where the final code will be executed
(if not literally the same machine, then at least one that is CPU and ABI
compatible at the operating system level). This is a reasonable assumption for
most desktop projects; However, for mobile platforms, this isn't the case.
most desktop platforms; however, for some platforms, this isn't the case.

On mobile platforms, an app is compiled on a desktop platform, and transferred
to the mobile device (or a simulator) for testing. The compiler is not executed
@@ -27,14 +27,66 @@ x86_64 hardware; however, in this case, the host platform (macOS on x86_64) will
still be one of the outputs of the compilation process, and the resulting binary
will run on the CI/CD system.

## Current state

Native compiler and build toolchains (e.g., autoconf/automake, CMake) have long
supported cross-compilation; however, these cross-compilation capabilities are
easy to break unless they are exercised regularly.

CPython's build system includes some support for cross-compilation, largely
based on leveraging autoconf's support for cross-compilation. That support
wasn't well integrated into distutils and the compilation of the binary
portions of the stdlib; however, with the deprecation and removal of distutils
in Python 3.12, this situation has improved.

The specification of PEP517 means that cross-platform compilation support has
largely become a concern for individual build systems to manage.

## Problems

There is currently a small gap in communicating target platform details to the
build system. While a build system like autoconf or CMake may support
cross-platform compilation, and a project may be able to cross-compile binary
artefacts, invocation of the PEP517 build interface currently assumes that the
platform running the build will be the platform that ultimately runs the Python
code. As a result, `sys.platform` and the various attributes of the `platform`
library can't be used as part of the build process.
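
As an illustration, the following minimal sketch (output values are indicative
only) shows the standard-library attributes a build backend might consult; all
of them describe the machine running the build, not the device being targeted:

```python
# Minimal sketch: these standard-library values describe the interpreter that
# is *running* the build, not the platform the build is targeting.
import platform
import sys
import sysconfig

print(sys.platform)              # e.g. "darwin" on a macOS build machine
print(platform.machine())        # e.g. "arm64" (the build host's CPU)
print(sysconfig.get_platform())  # e.g. "macosx-13-arm64" (feeds wheel tags)

# A build backend that branches on these values while cross-compiling for iOS
# or Android will make decisions based on the host rather than the target.
```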

`pip` provides limited support for installing binaries for a different platform
by specifying the `--platform`, `--implementation` and `--abi` flags; however,
these flags only work for the selection of pre-built binary artefacts.
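
For example, the sketch below (the package name and tag values are
illustrative) uses those flags to fetch pre-built wheels for a platform other
than the one running the command; pip generally requires `--only-binary=:all:`
(or `--no-deps`) in this mode because it cannot build from source for a
foreign platform:

```python
# Sketch: download pre-built wheels for a target platform that differs from
# the machine running pip. The package name and tag values are illustrative.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "pip", "download", "numpy",
        "--dest", "wheels/",
        "--only-binary=:all:",
        "--platform", "macosx_11_0_arm64",
        "--implementation", "cp",
        "--abi", "cp311",
        "--python-version", "3.11",
    ],
    check=True,
)
```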

## History

Tools like [crossenv](https://github.com/benfogle/crossenv) can be used to trick
Python into performing cross-platform builds. These tools use path hacks and
overrides of known sources of platform-specific details (like `distutils`) to
provide a cross-compilation environment. However, these solutions tend to be
somewhat fragile as they aren't first-class citizens of the Python ecosystem.

[The BeeWare Project](https://beeware.org) also uses a version of these
techniques. On both Android and iOS, BeeWare provides a custom package index that
contains pre-compiled binaries ([Android](https://chaquo.com/pypi-7.0/);
[iOS](https://anaconda.org/beeware/repo)). These binaries are produced using a
forge-like set of tooling
([Android](https://github.com/chaquo/chaquopy/tree/master/server/pypi);
[iOS](https://github.com/freakboy3742/chaquopy/tree/iOS-support/server/pypi)).

## Relevant resources

TODO

## Potential solutions or mitigations

At its core, what is required is a recognition of cross-platform builds as a
use case that the Python ecosystem supports.

In concrete terms, for native modules, this would require either:

1. Extension of the PEP517 interface to allow communicating the desired target
platform as part of a binary build (see the sketch after this list); or

2. Formalization of the "platform identification" interface that can be used by
PEP517 build backends to identify the target platform, so that tools like
`crossenv` can provide a reliable proxied environment for cross-platform
builds.
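
As an illustration of what option 1 could look like, the sketch below passes a
target platform through the existing `config_settings` channel of the PEP517
hooks, using the PyPA `build` frontend as a generic caller. The
`target-platform` key and the tag value are hypothetical; no such standard
currently exists.

```python
# Hypothetical sketch of option 1: communicate the desired target platform to
# a PEP517 backend via config_settings. The "target-platform" key and the tag
# value are invented for illustration; they are not an agreed standard.
import build  # the PyPA "build" frontend, used here as a generic PEP517 caller

builder = build.ProjectBuilder("path/to/source-tree")
wheel_path = builder.build(
    "wheel",
    output_directory="dist/",
    config_settings={"target-platform": "iphoneos_12_0_arm64"},  # hypothetical
)
print(wheel_path)
```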
242 changes: 172 additions & 70 deletions docs/key-issues/multiple_architectures.md
@@ -4,14 +4,16 @@ In addition to any ABI requirements, a binary is compiled for a CPU
architecture. That CPU architecture defines the CPU instructions that can be
issued by the binary.

## Current state

Historically, it could be assumed that an executable or library would be
compiled for a single CPU architecture. On the rare occasion that an operating
system was available for multiple CPU architectures, it became the
responsibility of the user to find (or compile) a binary that was compiled for
their host CPU architecture.

However, on occasion, we see an operating system platform where multiple CPU
architectures are supported:
However, we now see operating system platforms where multiple CPU architectures
are supported:

* In the early days of Windows NT, both x86 and DEC Alpha CPUs were supported
* Windows 10 supports x86, x86-64, ARMv7 and ARM64; Windows 11 supports x86-64
@@ -37,47 +39,38 @@ of [GPU compatibility](gpus.md). When dealing with multiple CPU architectures,
there may be some overlap with the solutions that can be used to support GPUs in
native binaries.

## Platform approaches for dealing with multiple architectures

Three approaches have emerged for handling multiple CPU architectures.
Three approaches have emerged on operating systems that have a need to manage
multiple CPU architectures:

### Multiple binaries

The minimal solution is to distribute multiple binaries. This is the approach
that was used by Windows NT, and is currently supported by Linux. At time of
distribution, an installer or other downloadable artefact is provided for each
supported platform, and it is up to the user to select and download the correct
artefact.
that is used by Windows and Linux. At time of distribution, an installer or other
downloadable artefact is provided for each supported platform, and it is up to
the user to select and download the correct artefact.

### Archiving

The approach taken by Android is very similar to the multiple binary approach,
with some affordances and tooling to simplify distribution.

When building an Android project, each target architecture is compiled
independently. If a native binary library is required to compile the Android
application, a version must be provided for each supported CPU architecture. A
directory layout convention exists for providing a binary for each platform,
with the same library name. This yields an independent final binary (APK) for
each CPU architecture. When running locally, a CPU-specific APK will be
uploaded to the simulator or test device.

This approach can be supported with a conventional "single platform wheel"
approach. A library developer can package a wheel for each Android CPU
architecture they wish to support; the Android project will install a
CPU-architecture appropriate wheel when the compiler pass for that architecture
is performed. The only complication is that the process of installing wheels will
involve a dependency resolution pass on each supported platform; this could
potentially lead to a situation where a single application has different
versions of a Python library on different architectures.

To simplify the process of distributing the application, at time of publication,
a single Android App Bundle (AAB) is generated from the multiple CPU-specific
APKs. This AAB contains binaries for all platforms that can be uploaded to an
app store.

When an end-user requests the installation of an app, the app store strips out the
binary that is appropriate for the end-user's device.
By default, Android projects use Java/Kotlin, which produces platform-independent
code. However, it is possible to use non-Java/Kotlin libraries by using JNI and
the Android NDK (Native Development Kit). If a project contains native code, a
separate compilation pass is performed for each architecture.

If a native binary library is required to compile the Android application, a
version must be provided for each supported CPU architecture. A directory layout
convention exists for providing a binary for each platform, with the same
library name.

The final binary artefact produced for Android distribution uses this same
directory convention. A "binary" on Android is an APK (Android Application
Package) bundle; this is effectively a ZIP file with known metadata and
structure; internally, there are subfolders for each supported CPU architecture.
This APK is bundled into AAB (Android Application Bundle) format for upload to
an app store; at time of installation, a CPU-specific APK is generated and
delivered to the end-user's device.
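
A minimal sketch of the per-ABI layout convention follows; the ABI directory
names are Android's standard identifiers, while the library name is
illustrative:

```python
# Sketch: the same library name appears once under each supported ABI
# directory inside an APK. "libexample.so" is an illustrative name.
ANDROID_ABIS = ["armeabi-v7a", "arm64-v8a", "x86", "x86_64"]

for abi in ANDROID_ABIS:
    print(f"lib/{abi}/libexample.so")
```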

### Fat binaries

@@ -105,6 +98,11 @@ ARM64 binary. Complications can occur when only *some* of the binary is being
converted (e.g., if the binary being executed is fat, but a dynamic library
isn't).

To support the transition to Apple Silicon/M1 (ARM64), Python has introduced a
`universal2` architecture target. This is effectively a "fat wheel" format; the
`.dylib` files contained in the wheel are fat binaries containing both x86_64
and ARM64 slices.
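
On disk, a fat binary can be inspected or assembled with Apple's `lipo` tool;
the sketch below (file paths are illustrative, and `lipo` is only available on
macOS) shows what "containing both x86_64 and ARM64 slices" means in practice:

```python
# Sketch: inspect and create a macOS fat (universal) dylib with Apple's lipo.
import subprocess

# List the architecture slices present in an existing library;
# a universal2-style binary reports both "x86_64" and "arm64".
subprocess.run(["lipo", "-archs", "libexample.dylib"], check=True)

# Merge two single-architecture builds into one fat binary.
subprocess.run(
    [
        "lipo", "-create",
        "build/x86_64/libexample.dylib",
        "build/arm64/libexample.dylib",
        "-output", "build/universal2/libexample.dylib",
    ],
    check=True,
)
```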

iOS has an additional complication of requiring support for multiple *ABIs* in
addition to multiple CPU architectures. The ABIs for the iOS simulator and
physical iOS devices are different; however, ARM64 is a supported CPU
@@ -115,43 +113,147 @@ libraries that need to span multiple ABIs. When developing an application for
iOS, a developer will need to install binaries for both the simulator and
physical devices.

Python currently provides `universal2` wheels to support x86_64 and ARM64 in a
single wheel. This is effectively a "fat wheel" format; the `.dylib` files
contained in the wheel are fat binaries containing both x86_64 and ARM64 slices.
## Problems

At present, the Python ecosystem almost exclusively uses the "multiple binary"
solution. This serves the needs of Windows and Linux well, as it matches the
way end-users interact with binaries.

The `universal2` "fat wheel" solution also works well for macOS. The definition
of `universal2` is a hard-coded accommodation for one specific (albeit common)
multi-architecture configuration, and involves a number of specific
accommodations in the Python ecosystem (e.g., a macOS-specific architecture
lookup scheme).

Supporting iOS requires supporting between 2 and 5 architectures (x86_64 and
ARM64 at the minimum), and at least 2 ABIs - the iOS simulator and iOS device
have different (and incompatible) binary ABIs. At runtime, iOS expects to find a
single "fat" binary for any given ABI. iOS effectively requires an analog of
`universal2` covering the 2 ABIs and multiple architectures. However:

1. The Python ecosystem does not provide an extension mechanism that would allow
platforms to define and utilize multi-architecture build artefacts.

2. The rate of change of CPU architectures in the iOS ecosystem is more rapid
than that seen on desktop platforms; any potential "universal iOS" target
would need to be updated or versioned regularly. A single named target would
also force developers into supporting older devices that they may not want to
support.

Supporting Android also requires the support of between 2 and 4 architectures
(depending on the range of development and end-user configurations the app needs
to support). Android's archiving-based approach can be mapped onto the "multiple
binary" approach, as it is possible to build a single archive from multiple
individual binaries. However, some coordination is required when installing
multiple binaries. If an independent install pass (e.g., call to `pip`) is used
for each architecture, the dependency resolution process for each platform will
also be independent; if there are any discrepancies in the specific versions
available for each architecture (or any ordering instabilities in the dependency
resolution algorithm), it is possible to end up with different versions on each
platform. Some coordination between per-architecture passes is therefore
required.

## History

[The BeeWare Project](https://beeware.org) provides support for building both
iOS and Android binaries. On both platforms, BeeWare provides a custom package
index that contains pre-compiled binaries
([Android](https://chaquo.com/pypi-7.0/);
[iOS](https://anaconda.org/beeware/repo)). These binaries are produced using a
forge-like set of tooling
([Android](https://github.com/chaquo/chaquopy/tree/master/server/pypi);
[iOS](https://github.com/freakboy3742/chaquopy/tree/iOS-support/server/pypi))
that patches the build systems for the most common Python binary dependencies;
and on iOS, manages the process of merging single-architecture, single ABI
wheels into a fat wheel.

On iOS, BeeWare-supplied binary packages provide a single "iPhone" wheel.
This wheel includes 2 binary libraries (one for the iPhone device ABI, and one
for the iPhone Simulator ABI); the iPhone simulator binary includes x86_64 and
ARM64 slices. This is effectively the "universal-iphone" approach, encoding a
specific combination of ABIs and architectures.

BeeWare's support for Android uses [Chaquopy](https://chaquo.com/chaquopy) as a
base. Chaquopy's binary artefact repository stores a single binary wheel for
each platform; it also contains a wrapper around `pip` to manage the
installation of multiple binaries. When a Python project requests the
installation of a package (a rough sketch follows the list):

* Pip is run normally for one binary architecture
* The `.dist-info` metadata is used to identify the native packages - both
those directly requested by the user, and those installed as indirect
requirements by pip
* The native packages are separated from the pure-Python packages, and pip is
then run again for each of the remaining architectures; this time, only those
specific native packages are installed, pinned to the same versions that pip
selected for the first architecture.
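
The following is a rough sketch of that flow, not Chaquopy's actual
implementation: the platform tags, directory names, and the use of
`Root-Is-Purelib` as a heuristic for spotting native packages are all
assumptions made for illustration.

```python
# Rough sketch of a multi-architecture install flow; not Chaquopy's real code.
# Platform tags, paths, and the Root-Is-Purelib heuristic are assumptions.
import subprocess
import sys
from importlib.metadata import distributions
from pathlib import Path

TARGETS = {  # hypothetical ABI -> platform tag mapping
    "arm64-v8a": "android_21_arm64_v8a",
    "x86_64": "android_21_x86_64",
}
REQUIREMENTS = ["example-app-requirement"]  # illustrative


def pip_install(dest: Path, packages: list[str], platform_tag: str) -> None:
    dest.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            sys.executable, "-m", "pip", "install", *packages,
            "--target", str(dest),
            "--only-binary=:all:", "--platform", platform_tag,
        ],
        check=True,
    )


def pinned_native_packages(dest: Path) -> list[str]:
    """Read installed .dist-info metadata; pin any package whose wheel was not
    pure-Python (an assumed heuristic for "native package")."""
    pins = []
    for dist in distributions(path=[str(dest)]):
        wheel_metadata = dist.read_text("WHEEL") or ""
        if "Root-Is-Purelib: false" in wheel_metadata:
            pins.append(f"{dist.metadata['Name']}=={dist.version}")
    return pins


abis = list(TARGETS)

# 1. Run pip normally for the first architecture.
first_dir = Path("build") / abis[0]
pip_install(first_dir, REQUIREMENTS, TARGETS[abis[0]])

# 2. Use the .dist-info metadata to identify native packages and the versions
#    pip resolved for them.
native_pins = pinned_native_packages(first_dir)

# 3. Re-run pip for each remaining architecture, installing only the native
#    packages, pinned to the versions chosen in the first pass.
for abi in abis[1:]:
    pip_install(Path("build") / abi, native_pins, TARGETS[abi])
```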

[Kivy](https://kivy.org) also provides support for iOS and Android as deployment
platforms. However, Kivy doesn't support the use of binary artefacts like wheels
on those platforms; Kivy's support for binary modules is based on the broader Kivy
platform, which includes build support for libraries that may be required.

## Relevant resources

To date, there haven't been extensive public discussions about the support of
iOS or Android binary packages. However, there were discussions around the
adoption of universal2 for macOS:

* [The CPython discussion about universal2
support](https://discuss.python.org/t/apple-silicon-and-packaging/4516)
* [The addition of universal2 to
CPython](https://github.com/python/cpython/pull/22855)
* [Support in packaging for
universal2](https://github.com/pypa/packaging/pull/319), which declares the
logic around resolving universal2 to specific platforms.

## Potential solutions or mitigations

"Universal2" is a macOS-specific definition that encompasses the scope
of the specific "Apple Silicon" transition ("Universal" wheels also existed
historically for the PowerPC to Intel transition). Even inside the Apple
ecosystem, iOS, tvOS, and watchOS all have different combinations of supported
CPU architectures.

A more general solution for naming multi-architecture binaries, similar to how a
wheel can declare compatibility with multiple CPython versions (e.g.,
`cp34.cp35.cp36-abi3-manylinux1_x86_64`) may be called for. In such a scheme,
`cp310-abi3-macosx_10_9_universal2` would be equivalent to
`cp310-abi3-macosx_10_9_x86_64.arm64`.

Alternatively, this could be solved as an install-time problem. In this
approach, package repositories would continue to store single-architecture,
single-ABI artefacts; however, at time of installation, the installation tool
would allow for the specification of multiple architectures/ABI combinations.
The installer would download a wheel for each architecture/ABI requested, and as
a post-processing step, merge the binaries for multiple architectures into a
single fat binary for each ABI. This would simplify the story from the package
archive's perspective, but would require significant modifications to installer
tooling (some of which would require callouts to platform-specific build
tooling).

Supporting Android's archiving approach requires no particular modifications to
the "single architecture" solutions in use today. However, there may be a
benefit to the developer experience if it is possible to ensure consistency
in the dependency resolution solutions that are found for each architecture.
This could come in the form of:

1. Allowing for the installation of multiple wheel architectures in a single
installation pass.
2. Sharing dependency resolution solutions between installation passes.
3. Tools to identify when two different install passes have generated different
dependency solutions.
4. A "multi-architecture" Android wheel.
There are two approaches that could be used to provide a general solution to
this problem, depending on whether the support of multiple architectures is
viewed as a distribution or integration problem.

### Distribution-based solution

The first approach is to treat the problem as a package distribution issue. In
this approach, artefacts stored in package repositories include all the ABIs and
CPU architectures needed to meaningfully support a given platform. This is the
approach embodied by the `universal2` packaging solution on macOS, and the iOS
solution used by BeeWare.

This approach would require agreement on any new "known" multi-ABI/arch tags, as
well as any resolution schemes that may be needed for those tags.

A more general approach to this problem would be to allow for multi-architecture
and multi-ABI binaries as part of the wheel naming scheme. A wheel can already
declare compatibility with multiple CPython versions (e.g.,
`cp34.cp35.cp36-abi3-manylinux1_x86_64`); it could be possible for a wheel to
declare multiple ABI or architecture inclusions. In such a scheme,
`cp310-abi3-macosx_10_9_universal2` would effectively be equivalent to
`cp310-abi3-macosx_10_9_x86_64.macosx_10_9_arm64`; an iPhone wheel for the same
package might be
`cp310-abi3-iphoneos_12_0_arm64.iphonesimulator_12_0_x86_64.iphonesimulator_12_0_arm64`.

This would allow for more generic logic based on matching name fragments, rather
than specific "known name" targets.
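
The compressed tag-set mechanics already exist in the `packaging` library; the
sketch below parses the existing multi-interpreter example and a hypothetical
multi-ABI iPhone filename (the iPhone platform tags are invented and not
currently recognised by any installer):

```python
# Sketch: wheel filenames already support dot-compressed tag sets, and
# packaging can expand them. The iPhone platform tags are hypothetical.
from packaging.utils import parse_wheel_filename

name, version, build, tags = parse_wheel_filename(
    "example-1.0-cp34.cp35.cp36-abi3-manylinux1_x86_64.whl"
)
print(sorted(str(tag) for tag in tags))
# ['cp34-abi3-manylinux1_x86_64', 'cp35-abi3-manylinux1_x86_64',
#  'cp36-abi3-manylinux1_x86_64']

# The same expansion applied to a hypothetical multi-ABI iPhone wheel name:
_, _, _, ios_tags = parse_wheel_filename(
    "example-1.0-cp310-abi3-"
    "iphoneos_12_0_arm64.iphonesimulator_12_0_x86_64.iphonesimulator_12_0_arm64.whl"
)
print(len(ios_tags))  # 3 expanded (interpreter, abi, platform) combinations
```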

Regardless of whether "known tags" or a generic naming scheme is used, the
distribution-based approach requires modifications to the process of building
packages, and the process of installing packages.

### Integration-based solution

Alternatively, this could be treated as an install-time problem. This is the
approach taken by BeeWare/Chaquopy on Android.

In this approach, package repositories would continue to store
single-architecture, single-ABI artefacts. However, at time of installation, the
installation tool allows for the specification of multiple architectures/ABI
combinations. The installer then downloads a wheel for each architecture/ABI
requested, and performs any post-processing required to merge binaries for
multiple architectures into a single fat binary, or archiving those binary
artefacts in an appropriate location.

This approach is less invasive from the perspective of package repositories and
package build tooling, but would require significant modifications to installer
tooling.
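
As a rough sketch of what such an install-time step could look like for macOS,
the snippet below downloads one thin wheel per architecture and fuses the pair
with `delocate-fuse` (the CLI shipped by older releases of the `delocate`
project; the package name, wheel filenames, and tags are illustrative):

```python
# Rough sketch of an install-time "fuse thin wheels" step for macOS.
# Package name, wheel filenames, and tags are illustrative.
import subprocess
import sys


def download(platform_tag: str, dest: str) -> None:
    subprocess.run(
        [
            sys.executable, "-m", "pip", "download", "example-package",
            "--dest", dest, "--only-binary=:all:",
            "--platform", platform_tag,
            "--implementation", "cp", "--abi", "cp311",
        ],
        check=True,
    )


# 1. Fetch one thin wheel per architecture.
download("macosx_11_0_x86_64", "thin/x86_64")
download("macosx_11_0_arm64", "thin/arm64")

# 2. Fuse the pair into a single fat wheel, written to fused/.
subprocess.run(
    [
        "delocate-fuse",
        "thin/x86_64/example_package-1.0-cp311-cp311-macosx_11_0_x86_64.whl",
        "thin/arm64/example_package-1.0-cp311-cp311-macosx_11_0_arm64.whl",
        "-w", "fused",
    ],
    check=True,
)
```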