
Support for macOS universal2 builds for ARM-based Macs #473

Closed
Czaki opened this issue Dec 3, 2020 · 35 comments · Fixed by #484

Comments

@Czaki
Contributor

Czaki commented Dec 3, 2020

There is a plan to change the installer from x86_64 to universal. Please see:
pypa/wheel#387 (comment)

@henryiii
Contributor

henryiii commented Dec 13, 2020

Looks like a new version of wheel with pypa/wheel#390 will be out soon. Once that's out, if we include that version and use Python 3.9.1, then we should be able to build a universal2 wheel (though I'm not sure how to enable it yet).

We will need to download the Universal2 installer instead of the regular one. It's got a 11.0 in the name, but it is supposed to work with 10.9+.

@Czaki
Contributor Author

Czaki commented Dec 13, 2020

We will need to download the Universal2 installer instead of the regular one. It's got a 11.0 in the name, but it is supposed to work with 10.9+.

If I understand correctly, it may also require macOS 11.0 (or maybe 10.15) as the base system. If I remember correctly, there should be an Xcode version check before deciding which installer version to use.

@henryiii
Contributor

Xcode 12, so yes.

@ronaldoussoren

The universal2 support in CPython currently requires building on macOS 11. That is not a hard system requirement though, the actual requirement is using Xcode 12 (or the command line tools for Xcode 12).

The current code looks at the macOS version because that was easier to get going. I'll definitely look into replacing that code with something that tests for the compiler version instead of the macOS version. That said, I wouldn't mind if someone provided a PR for that ;-)
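A compiler-based check could look something like the sketch below. This is an illustrative heuristic, not CPython's actual code; it assumes that Apple clang 12 (shipped with Xcode 12) is the relevant threshold for arm64 support:

```python
import re
import subprocess

def supports_arm64(clang_version_output: str) -> bool:
    # Apple clang 12.0+ (Xcode 12) is the first toolchain that can target arm64 Macs.
    m = re.search(r"Apple clang version (\d+)\.(\d+)", clang_version_output)
    return bool(m) and (int(m.group(1)), int(m.group(2))) >= (12, 0)

def host_compiler_supports_arm64() -> bool:
    # Query the active toolchain; only meaningful on macOS.
    out = subprocess.run(["clang", "--version"],
                         capture_output=True, text=True).stdout
    return supports_arm64(out)
```

This keys off the compiler's own version string rather than the OS version, so it works on macOS 10.15 with Xcode 12 installed.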

@joerick joerick changed the title Universal wheel for macos Support for macOS universal2 builds for ARM-based Macs Dec 16, 2020
@henryiii
Contributor

Being worked on in #484.

@mayeut
Member

mayeut commented Dec 17, 2020

Just my 2 cents about building universal2 wheels. From past experience with the intel wheels, it might not always be a great idea.

From a packager point of view (so, here, user of cibuildwheel):

  1. While the build might seem easy for simple enough packages, it can get more complicated for packages that need to build other native dependencies, especially if those don't support universal2 builds out of the box, requiring some (well-known?) tricks to combine two single-arch builds into one universal2 build.
  2. "Yeah, my build and tests are ok with cibuildwheel, let's publish on PyPI"
    Well, we should remember that those wheels are never tested with the arm64 slice kicking in (at least for now, until Apple Silicon is available in CI, e.g. Support for VMs on Apple M1 actions/runner-images#2187). This means some testing must be done manually. In Run 32-bit tests on macOS #202, we saw that some wheels built and tested fine with the intel tag but could not even be imported on i686, which implies that cibuildwheel (and probably other tools, as mentioned in #202) was not doing its job properly (IMHO).

From an end-user point of view (I'm a Mac end-user if that matters):

  1. Because of point 2 on the packager side, I expect reports that package some-package just doesn't work.
  2. "Do I really need universal2 ?"
    The simple answer to this question is probably "no, you just need what matches your hardware"
  3. "Why is downloading wheels slower? / Why did disk usage grow so much?"
    A simple example will illustrate. It's basic and flawed, but gives the gist of things.
Matt$ python3.8 -m venv cp38
Matt$ source cp38/bin/activate
Matt$ pip install scikit-learn scikit-image
Matt$ du -hs ./cp38/lib/python3.8/site-packages
349M	./cp38/lib/python3.8/site-packages
Matt$ find ./cp38/lib/python3.8/site-packages -type f -a \( -name '*.so' -o -name '*.dylib' \) | xargs du -ch | sort -h
...
...
202M	total

This means that if all wheels were packaged as universal2 wheels, the same install would be roughly 551 MB, almost a 60% increase.
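The arithmetic behind that estimate, using the numbers from the transcript above (rough, since a universal2 wheel duplicates only the native-code portion):

```python
site_packages_mb = 349  # total site-packages size of the thin x86_64 install
native_code_mb = 202    # .so/.dylib portion that a second arch slice would duplicate

universal2_mb = site_packages_mb + native_code_mb
increase_pct = round(100 * native_code_mb / site_packages_mb)

print(universal2_mb)   # 551
print(increase_pct)    # 58
```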

To sum up: am I against universal2 wheels? No. It might make sense for projects like websockets (a small C extension as part of a bigger project): simpler for the packager, and it doesn't change the package size much for the end-user. That being said, I think there must be an option to target x86_64, arm64 or universal2 (and maybe build universal2 and then split it into two wheels).
This will not help with testing arm64/universal2 for now, which should be clearly documented to avoid any misunderstanding that running a test step in cibuildwheel covers these cases.

Reading material that might be of interest:
https://conda-forge.org/blog/posts/2020-10-29-macos-arm64/

@joerick
Contributor

joerick commented Dec 17, 2020

Thank you @mayeut. You know, your argument is making a lot of sense to me. cibuildwheel offers to 'build and test your wheels on all the platforms', but when making a universal2 wheel on x86_64, the arm64 part is completely untested.

Adding onto that the fact that Apple have said Apple Silicon is the future of the mac, IMO it is only a matter of time (probably a few months) before we see CI services offering macOS-arm64 runners. Once we live in that world, it's clear that the best way to run cibuildwheel will be to run one x86_64 runner to build & test x86_64 wheels, and run another arm64 runner for the arm64 wheels.

So I believe we should treat this early universal2 support as a stop-gap - something that users might choose to opt-in to, but we should save our ultimate API design for a world where Mac arm64 CI exists.

@Czaki
Contributor Author

Czaki commented Dec 17, 2020

@mayeut I agree with you. But the problem is that Python provides a universal installer and does not provide an ARM-only installer, and wheel takes the platform tag from the Python installation. As I remember from the discussion when switching from intel to x86_64, it is possible to force the architecture using compiler flags.

I think that now is a good time to go back to #317. Then it would be possible to test the universal wheel on both kinds of machines.

I think that macOS x86_64 will be supported for a long time. I love the idea of having a choice between building a universal wheel or a separate wheel per architecture.

I still think that part of this job should be done in delocate, which could check whether the wheel is really universal.

@ronaldoussoren

Assuming any M1 runners have a full install of macOS, they can run both x86_64 and arm64 code. I've been using this feature to test my own projects for a long time; the same mechanism works with the older fat binaries.

steps:

  • build
  • arch -arm64 python test
  • arch -x86_64 python test
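On a hypothetical GitHub Actions arm64 macOS runner, those steps might look like the sketch below. This is illustrative only: no such runner image exists yet, and the `macos-arm64` label and job layout are made up.

```yaml
jobs:
  test-universal2:
    runs-on: macos-arm64            # hypothetical arm64 runner label
    steps:
      - uses: actions/checkout@v2
      - run: pip wheel . -w wheelhouse       # build the universal2 wheel
      - run: arch -arm64  python -m pytest   # test the native arm64 slice
      - run: arch -x86_64 python -m pytest   # test the x86_64 slice via Rosetta 2
```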

@Czaki
Contributor Author

Czaki commented Dec 17, 2020

So if a package provides only x86_64 or arm64 wheels, there is no good way to choose which version to install?

Are there plans to provide a pure-arm64 Python installer?

@ronaldoussoren Do you know how pip from a universal interpreter will behave when only x86_64 and arm64 wheels are available?

@henryiii
Contributor

Did you have to make fat extension modules with the fat binaries before? (Python 3.5 is a fat binary, FYI.) So can't you make x86_64, arm64, and universal2 extensions from a universal2 Python install? I don't think there's much point in providing separate Python installers; Python just isn't that big. But extensions can be huge. And if users provide an x86_64 wheel anyway, why not make the other one ARM-only?

The only benefit of a universal wheel is that you can download it once and run on both architectures. But how often is your disk connected to two computers with different architectures? Usually you have a separate package directory for each runner, even in HPC/cloud. For personal computers it might be useful if you share a folder via the cloud, but you shouldn't do that with environments in general. The one use case could be making zip apps, but those don't really get used for binary code anyway due to OS differences (honestly, I don't see them used much at all).

Assuming any M1 runners

The problem is that it will be a while before we have M1 runners on CI. I don't think Apple provides an arm64 emulator for Intel...

Did you know how will behave pip from universal interpreter

Pip currently selects the most specific wheel. So you can put a pure-python universal wheel and a set of binary ones, and if the platform matches, you get the binary one, otherwise you get the universal one. So I would assume arm would match before universal2.
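A minimal sketch of that selection logic (not pip's actual code; pip builds an ordered list of compatible tags, most specific first, and installs the first available wheel that matches):

```python
def pick_wheel(compatible_tags, available_wheels):
    """Return the first (most specific) compatible tag that has a wheel available."""
    for tag in compatible_tags:
        if tag in available_wheels:
            return tag
    return None  # no binary match; pip would fall back to sdist

# Hypothetical tag ordering on an Apple Silicon machine: native arch first.
tags = ["macosx_11_0_arm64", "macosx_11_0_universal2", "any"]

print(pick_wheel(tags, {"macosx_11_0_universal2", "any"}))
# -> macosx_11_0_universal2 (no arm64 wheel published)
print(pick_wheel(tags, {"macosx_11_0_arm64", "macosx_11_0_universal2"}))
# -> macosx_11_0_arm64 (native wheel wins over universal2)
```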

@joerick
Contributor

joerick commented Dec 17, 2020

Assuming any M1 runners have a full install of macOS they can run both x86_64 and arm64 code.

That's interesting. So once we have M1 runners, we can do both x86_64 and arm64 tests on the same machine.

Still, I think there's an argument that separate x86_64 and arm64 build/test runs might be preferable (rather than building in two test steps for universal2 wheels), see @mayeut's other reasons above.

Assuming any M1 runners

The problem is that it will be a while before we have M1 runners on CI. I don't think Apple provides a arm64 emulator for Intel...

No, they do not. So we do need a universal2 solution, at least in the short term. But I'm happy for it to be a little imperfect (we probably won't be able to test arm64 portions, initially), so long as we know where we want to end up once Apple Silicon CI runners are available.

Did you know how will behave pip from universal interpreter

Pip currently selects the most specific wheel. So you can put a pure-python universal wheel and a set of binary ones, and if the platform matches, you get the binary one, otherwise you get the universal one. So I would assume arm would match before universal2.

This matches my understanding, too. I think (somebody please correct me if I'm wrong) the wheels are chosen in the order of this list. Note that the native arch arm64 is added at the start of the list.

@henryiii
Contributor

As far as I can tell, universal2 was added to that list only 23 days ago. So that means any copy of pip more than 23 days old will not be able to download and use a universal2 wheel on an Intel machine? Actually less than that, since it had to be updated via vendoring and released. If that's the case, we really cannot stop making x86_64 wheels any time soon, so it really would be nice to be able to make an ARM wheel.

I don't see technically why you couldn't compile an ARM-only wheel on an Intel machine if you can compile a Universal2 one.

@mayeut
Member

mayeut commented Dec 17, 2020

Assuming any M1 runners have a full install of macOS they can run both x86_64 and arm64 code.

That's interesting. So once we have M1 runners, we can do both x86_64 and arm64 tests on the same machine.

Still, I think there's an argument that separate x86_64 and arm64 build/test runs might be preferable (rather than building in two test steps for universal2 wheels), see @mayeut's other reasons above.

In #202 (dealing with tests for the intel tags), I proposed roughly the same thing as @ronaldoussoren. The situation seemed even easier there, since the hardware was able to run both x86_64 and i386 natively.
Running x86_64 on arm64 comes with its own caveats. It runs using Rosetta 2, which means there's a time penalty for JIT translation. It also means you're not running your own binary, and you might hit a Rosetta 2 bug, however unlikely.
Also, if you have multiple SIMD implementations of a function, you will not be able to test all of them, or you might see them running too slowly for acceptable test times. (I really don't know what happens if you have some AVX/AVX2/AVX512 in your code, as it is stated not to be supported. Does it fail to translate, even if not run? I would be curious to see whether all tests of pybase64 run or some are skipped, if it runs at all. Would you be willing to do that test for me, @ronaldoussoren? I can provide all the details to get them running.)
It still remains an option I wouldn't exclude, as it can lead to a simpler setup for packagers, even if it means providing 2 wheels instead of just the universal2 one. Anyway, M1 runners are not here yet, but this is good to know for when they are.

Did you know how will behave pip from universal interpreter

Pip currently selects the most specific wheel. So you can put a pure-python universal wheel and a set of binary ones, and if the platform matches, you get the binary one, otherwise you get the universal one. So I would assume arm would match before universal2.

As far as I can tell, universal was added 23 days ago to that list. So that means that any copy of pip more than 23 days old will not be able to download and use a universal2 wheel on an Intel machine? Actually less than that, since it had to be updated via the vendoring and released. If that's the case, we really cannot stop making x86 wheels any time soon - so it really would be nice to be able to make an ARM wheel.

You can run pip debug --verbose to get that list.
@henryiii, you have another valid point in favor of keeping things separate (or at least giving this option): indeed, pip only got support for universal2 in 20.3.
Even with a fresh install from the universal2 Python 3.9.1 installer, there's no universal2 support:

Matt$ python3.9 -m pip -V
pip 20.2.3 from /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pip (python 3.9)
Matt$ python3.9 -m pip debug -v | grep -A 10 'Compatible tags'
WARNING: This command is only meant for debugging. Do not use this with automation for parsing and getting these details, since the output and options of this command may change without notice.
Compatible tags: 222
  cp39-cp39-macosx_11_1_x86_64
  cp39-cp39-macosx_11_1_intel
  cp39-cp39-macosx_11_1_fat64
  cp39-cp39-macosx_11_1_fat32
  cp39-cp39-macosx_11_1_universal
  cp39-cp39-macosx_11_0_x86_64
  cp39-cp39-macosx_11_0_intel
  cp39-cp39-macosx_11_0_fat64
  cp39-cp39-macosx_11_0_fat32
  cp39-cp39-macosx_11_0_universal

Matt$ python3.9 -m pip install -U pip
Matt$ python3.9 -m pip -V
pip 20.3.3 from /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pip (python 3.9)
Matt$ python3.9 -m pip debug -v | grep -A 10 'Compatible tags'
WARNING: This command is only meant for debugging. Do not use this with automation for parsing and getting these details, since the output and options of this command may change without notice.
Compatible tags: 1776
  cp39-cp39-macosx_11_0_x86_64
  cp39-cp39-macosx_11_0_intel
  cp39-cp39-macosx_11_0_fat64
  cp39-cp39-macosx_11_0_fat32
  cp39-cp39-macosx_11_0_universal2
  cp39-cp39-macosx_11_0_universal
  cp39-cp39-macosx_10_16_x86_64
  cp39-cp39-macosx_10_16_intel
  cp39-cp39-macosx_10_16_fat64
  cp39-cp39-macosx_10_16_fat32

I don't see technically why you couldn't compile an ARM-only wheel on an Intel machine if you can compile a Univeral2 one.

I agree; it might require tricks like the one done for Python 3.5 (mentioned in #484 (comment)).

@henryiii
Contributor

henryiii commented Dec 17, 2020

So, building on top of #482:

  • cibuildwheel gains an --arch option, set to native by default, building 32- and 64-bit where appropriate but not adding cross-compiles, plus a matching CIBW_ARCH variable. Maybe CIBW_ARCH_MACOS and such? (see note 1)
  • Python 3.5 and PyPy are disabled on macOS 11 runners for now.
  • If you include universal2 or arm64 in the arch list, then cibuildwheel downloads the new Python 3.9 (and eventually 3.8, I think?) instead of the old one. This could also toggle based on OS version or compiler version, but I think toggling based on command-line options is better.
  • If the new Python is downloaded, then the trick linked previously is used to force x86_64 or arm64 as specified.
  • Testing is disabled if the macOS arch does not match the target (and is not universal2 or arm64).

When we start getting Apple Silicon runners, we can make sure the arch -<arch> prefix is prepended for testing when the target arch != the dist arch. This doesn't hurt for normal running, does it? We could always add it on macOS.

Note 1: If you wanted to build native on Windows & Linux but universal2 on macOS, it might be hard to express this nicely without specific environment variables like CIBW_ARCH_MACOS. Actually, trying to write an example for dual-arch macOS, it's hard not to write it in such a way that it also triggers emulated ARM on Linux. However, that's what BUILD/SKIP are for, and these should be pretty easy to select on. And most users will need to break up the CI at least a bit, especially for emulated builds, since they are slow. So I think not adding specific variables is fine.
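The "trick" to force a single architecture from a universal2 Python is, roughly, setting the build environment before invoking the build. This is a sketch of the general mechanism only; the exact variables cibuildwheel ends up using may differ:

```python
import os

def cross_build_env(arch: str) -> dict:
    """Environment overrides to target one macOS arch from a universal2 Python.

    ARCHFLAGS steers distutils' compiler invocations to a single -arch;
    _PYTHON_HOST_PLATFORM overrides the platform tag used in the wheel name.
    The deployment target here (11.0) is an assumption for illustration.
    """
    env = dict(os.environ)
    env["ARCHFLAGS"] = f"-arch {arch}"
    env["_PYTHON_HOST_PLATFORM"] = f"macosx-11.0-{arch}"
    return env

env = cross_build_env("arm64")
print(env["ARCHFLAGS"])  # -arch arm64
```

The same helper with `"x86_64"` would produce an Intel-only build from the same universal2 interpreter.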

@joerick
Contributor

joerick commented Dec 17, 2020

Yeah, I think that's pretty much bang on, there @henryiii. I think I agree with all of it.

I'll just add, we have to consider what we do with build identifiers as well. I think once we have the --archs filtering, we can add build identifiers for cp39-macosx_universal2 and cp39-macosx_arm64. The arch filtering will ignore universal2 by default, but users can opt in by doing e.g. cibuildwheel --archs "x86_64,universal2" (or CIBW_ARCHS_MACOS="x86_64 universal2"). In which case, we'd build both an x86_64 and a universal2 wheel on Python 3.9. Or the user could choose to set --archs "x86_64,arm64". The non-native arch wouldn't be tested on an x86_64 machine, so we would raise a warning about that.

It's worth mentioning that people might get confused about how build identifiers and BUILD/SKIP differ from the --archs option. --archs should only be used when a user wants to build something other than the machine's native architecture. So on Linux it can be used to build through emulation; on macOS it's used to cross-compile. Essentially, BUILD/SKIP should remain the normal user interface; --archs is for when you're doing something a little unusual. We can make this clear through the documentation and examples.

@henryiii
Contributor

Yes, the one somewhat open point was that these mix. Say I don't want to do emulation (it's slow), but I do want to build Apple Silicon wheels. If I just do --archs=native,arm64, that will add the ARM builds, both for macOS and for Linux (assuming this is in a matrix; obviously everything is a bit simpler if it is not). I now have to filter with my BUILD/SKIP to make sure I don't enable the emulated Linux one. But I think that's okay: the work really should be part of BUILD/SKIP, and --archs should be about enabling "special" things (though due to the construction, it happens to be a way to limit yourself to 32/64-bit builds too, if you use x86_64 or <whatever 32 is called> instead of native).

@ronaldoussoren

ronaldoussoren commented Dec 18, 2020

So if the package provides only x86_64 or arm wheel then there is no good selection of what version install?

I'd like to see that packages start providing "universal2" wheels instead of architecture specific ones.

There are plans to provide pure arm python installer?

Not at this time. The current plan is to drop the x86_64 installer for Python 3.10 and only have a universal2 installer for now.

@ronaldoussoren Did you know how will behave pip from universal interpreter when only x86_64 and arm wheel are available?

I don't know for sure, but looking at the packaging code I'd say that pip will install a native package when available.

@henryiii
Contributor

henryiii commented Dec 18, 2020

I'd like to see that packages start providing "universal2" wheels instead of architecture specific ones.

For a while, almost all libraries will have to produce both x86_64 and either universal2 or arm64. It makes more sense, and saves bandwidth and disk space, to produce x86_64 and arm64. Once pip 20.3 is very common, they could be combined, though since pip downloads the correct file automatically, I don't really see any advantage to universal2 unless you are building by hand; something like cibuildwheel is just as happy making both sets of wheels. Or possibly if you are including a .so that is itself universal, so you don't get any space savings by splitting the wheels.

The built-in Python 3.8 and Homebrew's Python 3.9 on macOS 11 do not include pip 20.3 yet. This is really bad, actually, since macOS 11 even on Intel requires pip 20.3 to download even regular wheels; everything breaks immediately on trying to build NumPy. But at least I asked for it by updating to 11.0. :) And for 10.15 and before, there are a lot of older pips around, and having things like NumPy fall back to building from source because there's only a universal2 wheel would be a disaster. So libraries have to provide two wheels for now.

The current plan is to drop the x86_64 installer for Python 3.10 and only have a universal2 installer for now

Since we can build all three from this one installer, maybe we should just switch to using this installer for 3.9 (like we already download only the non-fat installers for all Pythons that support them), and then use our workaround to build x86_64 unless asked to do otherwise. Unless for some reason it can't build x86_64 against an old 10.14 or un-updated 10.15 Xcode. I was thinking we could have it toggle on based on some criteria, but maybe that's not needed.

@ronaldoussoren

The advantage of a universal2 wheel is that there's only a single wheel for macOS, not multiple. As I wrote earlier, it is possible to build and test both architectures in a single go when using an M1 builder. Building universal2 wheels is pretty trivial: it works out of the box when you don't have to build C libraries, and most libraries I've wrapped myself build as universal out of the box as well (one exception to the rule is libraries that compile different sets of files for different architectures, such as OpenSSL).

The users that really need this are those that redistribute wheels, in particular users of py2app, pyinstaller and the like. With universal2 wheels it is possible to build an application bundle that's a Universal Application. With per-architecture wheels this is close to impossible, because those tools use the packages installed on the system.

So, please provide an easy way to build "universal2" wheels.

@ronaldoussoren

Note that most software will have the same test results for both architectures. In the past the exception to this were low-level packages using architecture-specific code (for example by using libffi).

This time there are some system-level changes as well, although the only ones I know of are (1) all arm64 code must be signed (the compiler will automatically add ad-hoc signatures for this), and (2) the low-level Mach timer APIs have a different resolution.

To make testing fun: I've seen some reports that the Rosetta emulation software does not implement some vector instructions. That could affect testing of some numeric code when optimising for the x86_64 CPUs in Apple hardware.

@Czaki
Contributor Author

Czaki commented Dec 19, 2020

Note that most software will have the same test results for both architectures. In the past the exception to this were low-level packages using architecture-specific code (for example by using libffi).

One of the biggest things to check in the tests is verifying that all needed libraries are available. As mentioned above, the biggest problem with the intel wheels was the lack of some dependencies.

@ronaldoussoren

Assuming any M1 runners have a full install of macOS they can run both x86_64 and arm64 code.

That's interesting. So once we have M1 runners, we can do both x86_64 and arm64 tests on the same machine.
Still, I think there's an argument that separate x86_64 and arm64 build/test runs might be preferable (rather than building in two test steps for universal2 wheels), see @mayeut's other reasons above.

In #202 (dealing with tests for intel tags), I proposed roughly the same things as @ronaldoussoren. The situation seemed even easier since the hardware was able to run both x86_64 and i386 natively.
Running x86_64 on arm64 comes with its own caveats. It runs using Rosetta 2, which means there's a time penalty for JIT translation.

The time penalty for translation shouldn't be an issue for CI; even for interactive use, the overhead of the initial translation isn't too bad (and that's on my DTK; M1 systems should be significantly faster). The primary issue with Rosetta 2 is that it is optional software, which means future M1 CI runners in the various public CI systems might not have it installed.

This also means you're not running you own binary and that you might hit a Rosetta 2 bug, even if unlikely.
Also, if you have multiple SIMD implementations for a function, you will not be able to test all of them or you might be seeing them running too slow for acceptable test times (I really don't know what happens if you have some AVX/AVX2/AVX512 in your code as it is stated not to be supported, does it fail to translate ?

From what I've read (no references, sorry), using unsupported instructions will crash at runtime; testing for them at runtime (IIRC using CPUID) should work. That requires explicit support in the software and likely isn't done (especially because clang and Mach-O don't support the GCC function attribute 'target_clones', which allows compiling a function for a number of CPUs with dynamic selection of the best variant).

BTW. Isn't "not being able to test all SIMD variants" an issue in general unless you can arrange to run tests on a system that supports all those variants?

even if not run ? would be curious to see if all tests of pybase64 run or if some are skipped, if it runs at all. Would you be willing to do that test for me @ronaldoussoren ? I can provide all details to get them running).

I can test, with the caveat that I only have access to a DTK system and not an M1 system. I'm not sure if that affects Rosetta 2 emulation. I have ordered an M1 laptop though, and that should arrive this year.

It still remains an option I wouldn't exclude as it can lead for a simpler setup for packagers, even if providing 2 wheels instead of just the universal2 one. Anyway, M1 runners are not here yet but good to know once they do.

Did you know how will behave pip from universal interpreter

Pip currently selects the most specific wheel. So you can put a pure-python universal wheel and a set of binary ones, and if the platform matches, you get the binary one, otherwise you get the universal one. So I would assume arm would match before universal2.

The code suggests as much. That's something I don't like at all; I'd prefer to get a universal2 wheel when running a universal2 Python.

As far as I can tell, universal was added 23 days ago to that list. So that means that any copy of pip more than 23 days old will not be able to download and use a universal2 wheel on an Intel machine? Actually less than that, since it had to be updated via the vendoring and released. If that's the case, we really cannot stop making x86 wheels any time soon - so it really would be nice to be able to make an ARM wheel.

You can run pip debug --verbose to get that list.
@henryiii, you have another valid point in favor of keeping things separate (or at least giving this option), indeed, pip got support for universal2 in 20.3.
Even a fresh install of the universal2 python 3.9.1 installer, there's no support for universal2

[pip -V and pip debug -v transcripts quoted from the previous comment omitted]

I don't see technically why you couldn't compile an ARM-only wheel on an Intel machine if you can compile a Univeral2 one.

I agree, it might require tricks like the one done for python 3.5 (and mentioned in #484 (comment))

As mentioned elsewhere, I'd prefer to see universal2 wheels everywhere and no architecture-specific wheels. I guess it is not possible to avoid building x86_64 wheels for now, because you need a pretty recent copy of pip for universal2 support, but other than that, architecture-specific wheels have no clear advantages and do have a disadvantage: building a Universal Application using py2app or pyinstaller requires universal2 wheels.

@mayeut
Member

mayeut commented Dec 19, 2020

The time penalty for translation shouldn't be an issue for CI, even for interactive use the overhead of initial translation isn't too bad (and that's on my DTK, M1 systems should be significantly faster). The primary issue with Rosetta 2 is that this is optional software, which means future M1 CI runnings in the various public CI systems might not have it installed.

Thanks for that feedback. Do you have estimates of translation times? Say, for a first translation of numpy (or even interpreter + numpy, though I guess the interpreter is already translated and there might not be a way to partially clear the Rosetta cache)?

BTW. Isn't "not being able to test all SIMD variants" an issue in general unless you can arrange to run tests on a system that supports all those variants?

Yes, you're right. Even though I doubt we have, or will ever have, any way to test AVX512 on the macOS systems available in CI (maybe I'm wrong here; I did not check for AVX512), AVX and AVX2 are supported by most, if not all, CI providers.

I can test, with the caveat that I only have access to a DTK system and not an M1 system. I'm not sure if that effects Rosetta 2 emulation. I have ordered an M1 laptop though, and that should arrive this year.

Thanks, here are the steps (I'm not sure arch -x86_64 is required inside the venv):

arch -x86_64 python -m venv pybase64_test
source pybase64_test/bin/activate
arch -x86_64 python -m pip install pybase64 pytest
arch -x86_64 python -m pytest pybase64_test/lib/python3.8/site-packages/pybase64/tests

Looking forward to seeing if it works and how many tests are skipped.

As mentioned elsewhere I'd prefer to see "universal2" wheels everywhere and no architecture specific wheels. I guess it is not possible to avoid building x86_64 wheels for now because you need a pretty recent copy of pip for "universal2" support, but other than that architecture specific wheels have no clear advantages and do have a disadvantage: building a Universal Application using py2app and pyinstaller requires using "universal2" wheels.

Let me try to answer all those points.
First, thanks for bringing-up a use-case that clearly benefits from universal2 wheels.
I do not agree (cf. previous comments) that architecture-specific wheels have no clear advantages. The use-case you bring up is, IMHO, not an "end-user" use-case. Here I define "end-user" as the person who will run the software.
For things like a final packaged app, I do understand the value (depending on your target audience) of having a universal2 installer:

  • End-users do not have to choose between two installer flavors they might not even understand
  • Portable Apps
  • Binaries might only be a small fraction of the installer size so that does not matter for download speed
  • Less burden for packager

As an end-user on macOS, I don't want my (costly) SSD running out of space because of things that are not needed. That means I would expect the installer to strip the unneeded arch at installation time (not applicable for "portable" apps or more advanced usage, depending on your target audience), and that when I do pip install somepackage, I don't pay an overhead related to universal2.
As I see it, universal2 wheels only have disadvantages for x86_64 end-users, and I think this is also the case for arm64 end-users. One might argue otherwise during this transition period, where you still might need to run things under Rosetta 2 translation, but in that case, as a wheel end-user, you're already an advanced user who knows you can run arch -x86_64 python, and you'll know that if you're running things this way, your Python platform libs are probably a mix of universal2, arm64 and x86_64, so you're better off using virtual environments.
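Stripping the unneeded slice at install time, as described above, is mechanically simple with Apple's lipo; a sketch (the extension-module path is hypothetical, and the real work only runs on macOS):

```shell
# Hypothetical extension module inside an unpacked universal2 wheel:
SO="pybase64/_pybase64.cpython-39-darwin.so"
echo "target: $SO"
if command -v lipo >/dev/null 2>&1 && [ -f "$SO" ]; then
  lipo -archs "$SO"                            # e.g. prints "x86_64 arm64"
  lipo "$SO" -thin x86_64 -output "$SO.thin"   # keep only the x86_64 slice
  mv "$SO.thin" "$SO"
else
  echo "skipping: needs macOS and an unpacked wheel"
fi
```

The wheel-level part (retagging the filename, regenerating RECORD) is the piece a tool like delocate would have to own.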

Given the use-case you propose, I would probably be in favor of having the 3 wheels built as a default setting (i.e. not choose between architecture specific on one hand and universal2 on the other). I can see 2 workflows to that end:

  • flow for "simple" wheels: build universal2, split wheel to produce x86_64, arm64, test what we can on all 3.
  • flow for "complex" wheels: build x86_64 and arm64, merge wheels to produce universal2, test what we can on all 3.

This would probably require some changes to occur in delocate to allow splitting/merging.
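For the merge direction of the flows above, the binary-level half is a one-liner with lipo -create; the wheel-level half (tag renaming, RECORD regeneration) is what would need delocate support. Paths here are hypothetical and the merge only executes where both thin builds exist:

```shell
# Hypothetical thin builds of the same extension module for each arch:
X86="build-x86_64/_ext.so"
ARM="build-arm64/_ext.so"
if command -v lipo >/dev/null 2>&1 && [ -f "$X86" ] && [ -f "$ARM" ]; then
  lipo -create "$X86" "$ARM" -output _ext_universal2.so
  lipo -archs _ext_universal2.so   # should print both architectures
else
  echo "skipping: needs macOS plus both thin builds"
fi
```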

End-users would get smaller wheels while "universal2 packagers" would still be able to get universal2 wheels.

This would also probably require some changes in packaging/pip to provide an option to only retrieve universal2 wheels for those wanting 100% universal2 platform wheels.

Regarding pip not being able to install universal2 on x86_64: while we can't make a direct comparison, we can get a feeling from reading the following comment in manylinux. It suggests that 99.92% of the Linux systems downloading manylinux wheels are manylinux2010-compliant, but only 76.9% of them are able to download those wheels because of an outdated pip (manylinux2010 was introduced almost 2 years ago). For manylinux2014, it's 99.4%/60.2% (after 1 year). Obviously, this will change for Python 3.10 but probably not 3.8/3.9.
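Whether a given pip would even consider a universal2 wheel can be checked from its compatible-tag list; a small diagnostic sketch (`pip debug` output is explicitly marked unstable, so treat this as a local check only):

```shell
# Print the wheel tags this pip/interpreter pair accepts; on macOS with a
# new-enough pip the list includes *_universal2 tags.
python3 -m pip debug --verbose | grep universal2 \
  || echo "no universal2 tags: this pip will not select universal2 wheels"
```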

I think this issue raises points that go far beyond the scope of cibuildwheel and might be better discussed on https://discuss.python.org/c/packaging/14 in order to reach a wider audience and get feedback. Any thoughts?

@Czaki
Contributor Author

Czaki commented Dec 19, 2020

I think this issue raises points that go far beyond the scope of cibuildwheel and might be better discussed on discuss.python.org/c/packaging/14 in order to reach a wider audience and get feedback. Any thoughts?

Good idea.

@YannickJadoul
Member

Chiming in on the way this gets added to cibuildwheel: @joerick and @henryiii, you both seem to be making implicit arguments against --arch.

It's worth mentioning, people might be getting confused about how build identifiers and BUILD/SKIP differ from the --archs option. --archs should only be used when a user wants to build something other than the native architecture of the machine.

Say I don't want to do emulation (it's slow), but I do want to build Apple Silicon wheels. If I just do --archs=native,arm64, that will add the ARM builds - both for macOS and Linux (assuming this is in a matrix - obviously everything is a bit simpler if it's not).

So I'm hesitant to mix the two meanings of --arch here, as you're already indicating there will be confusion and inelegant workarounds in CI configuration (I still need to catch up on the qemu PRs, though; sorry for the delay, there). Is there a reason we cannot do this as part of the build identifiers? I'm not convinced that that's amazing either, btw, but... well, the identifiers already contain the architecture, and there's nothing to be "enabled" (and checked) like for qemu?

But maybe this is a discussion that's more appropriate in #484, as these are for cibuildwheel's implementation of options, etc?

@YannickJadoul
Member

Given the use-case you propose, I would probably be in favor of having the 3 wheels built as a default setting (i.e. not choose between architecture specific on one hand and universal2 on the other). I can see 2 workflows to that end:

  • flow for "simple" wheels: build universal2, split wheel to produce x86_64, arm64, test what we can on all 3.

  • flow for "complex" wheels: build x86_64 and arm64, merge wheels to produce universal2, test what we can on all 3.

I quite like that idea, if that's possible! One caveat is that I would expect/prefer to see exactly one flow that's advised; it's going to be only confusing if there's multiple ways of doing things without a good technical reason to do so?
Also, what about these external libraries & dependencies? Can you just split/merge these as well?

At any rate, the universal2 wheel would almost be "for free", this way? So we could just add an option CIBW_MACOS_UNIVERSAL_WHEEL (by default on, I'd say) that could disable this if you really, really don't want to produce it, but otherwise always produce one?

@henryiii
Contributor

While it depends on how this shapes up, I'm strongly in favor of --arch at least for Linux emulation. It should never be the default to emulate an architecture when building - as a rough ballpark, imagine that emulated architectures take 10x longer to build. And test too, technically, though generally testing is fast. This means it needs to be opt-in; personally I think it maybe even should be a command-line-only argument, since it controls whether cibuildwheel tries to emulate or not. In fact, it works really elegantly there - autodetect the arch(s) available, or manually specify them - exactly symmetric with --platform (honestly, it seems to be even more useful than --platform, though I'm guessing you can specify linux on any docker-enabled host?).

Cross-compilation (still referring to Linux here) is harder. It's extremely useful - it lets you save most of that 10x speed penalty mentioned above. But it's hard - you have to set things up to target something you are not running on, and things like setuptools even seem to hard-code incorrect shebang lines when they make scripts if you are cross-compiling - the ingrained assumption that the running Python is the target Python is hard to wrap one's head around.

macOS is special - though it's much closer to cross-compiling than to emulation (at least, building AS on Intel is). Because Apple controls the toolchain, cross-compiling - and especially "universal" compiling, where both slices are compiled - is pretty easy and commonly supported. Many programs (for now) ship only in universal form - like CMake (which is currently causing me pain due to the filename change in several places). And this is the direction that Python is moving in.

However, there's a big difference for Python packages - those are almost always downloaded by your package manager (pip), not the user. Other package managers universally (no pun intended) are not using Universal downloads - Conda, Homebrew, etc. They (including pip) already have multiple downloads for different situations, and adding x86_64 and ARM64 is not an issue at all; and the space and download savings is 2x! A large Python environment with something like PyTorch or Tensorflow can be over a GB when you factor in dependencies; if those packages only shipped universal wheels, both arch's would have venvs that would double in size. PyPI's downloads for macOS would double, etc. Now if Pip for example could strip a universal wheel when it unpacks it, the storage space issue would be solved - I have no idea what's possible for merging and splitting universal binaries/wheels. There are a few reasons to like universal wheels, yes, but most users creating an environment with pip will be adversely affected by being forced to download universal wheels when their package manager knows exactly what arch it's on. Imagine if we had universal wheels for linux, that packed i686, x86_64, ARM64, PowerPC, etc? That would be a mess, and I don't see why only universal wheels for macOS is much better. You only have a few copies of Python, so having that universal is fine - if it's a few MB, it's not a big deal. (And I'd always get Python from homebrew anyway - I don't think I have ever downloaded the one from Python.org to one of my Macs).

Now, for cibuildwheel users, there are several possibilities:

  1. Small package, few downloads: Providing an x86_64 wheel and a Universal wheel should be just fine. Eventually, maybe even just a universal2 wheel.
  2. Large package, few downloads: Just x86_64 and arm64 would probably be fine.
  3. Any package with a lot of downloads (scikit-learn, matplotlib, NumPy (though that's not a cibuildwheel project)): I think they should provide all three. That way, you can get a universal wheel if you need to make a "zipapp"-like bundle, but most users don't have to download both arch's.

Anyway, getting back to the topic at hand, armed with the points above: Selecting the cross compilation arch and the emulation arch conceptually are a bit irritating when mixed: If you have to add --arch arm64 to get Apple Silicon builds, that does not automatically mean that you want arm64 emulation on Linux. If your build takes 30 minutes, that will blow up the Linux one but not the macOS one, because they are conceptually different. But, we do have selectors - and the main reason to add --archs is to not add a bunch of surprising selectors that will kill the job's compilation time - Linux emulation should be opt-in. Maybe we could have --xarchs, which sets up cross compiling arch's, and macOS currently would always be specified with --xarchs universal2,arm64 (or just one). So something like this:

```
--xarchs universal2,arm64  # Enables the universal2 and arm64 selectors on Mac - eventually may enable arm64 selectors on Linux if we add cross-compilation support for Linux some day
--archs auto,arm64         # Enables the arm64 selectors on Linux - eventually might enable arm64 emulation on Mac if that were to show up some day
```

For each selector, it only is enabled if supported. If both xarch and arch contain a selector, and they both are supported on that platform, then cross compiling is used for the compile, and emulation is used for testing.

--xarchs is all about enabling, not selecting - that's done by CIBW_BUILD and CIBW_SKIP. If you don't want x86 macOS builds, you filter it out. --archs is too, though it does happen to have "auto" as a default, which you can replace with a single arch, causing it to sort-of "filter". But that's just for consistency with --platform.

Sadly, there is some platform overlap - if you want to only emulate-build arm64 linux, but also want arm64 macOS builds, and we've added cross-compile support, you'd need separate runs of cibuildwheel to support that. But it's pretty minimal.

Once we support building on Apple Silicon hosts, then --arch x86_64 would enable emulated testing of x86_64 and Universal2 builds.

@henryiii
Contributor

(Note, this is the design I'm thinking of, not averse to others, but haven't thought them through as much)

@Czaki
Contributor Author

Czaki commented Dec 20, 2020

While we're talking about macOS problems: does the new clang version support OpenMP? Or, if code needs it, is gcc still mandatory? I've come across packages compiled with gcc, which may not support universal2 compilation.

@mayeut
Member

mayeut commented Dec 20, 2020

I've not thought that much in terms of implementation details or even actual user facing options yet (might have time to do that the week after next).

Given the use-case you propose, I would probably be in favor of having the 3 wheels built as a default setting (i.e. not choose between architecture specific on one hand and universal2 on the other). I can see 2 workflows to that end:

  • flow for "simple" wheels: build universal2, split wheel to produce x86_64, arm64, test what we can on all 3.
  • flow for "complex" wheels: build x86_64 and arm64, merge wheels to produce universal2, test what we can on all 3.

I quite like that idea, if that's possible! One caveat is that I would expect/prefer to see exactly one flow that's advised; it's going to be only confusing if there's multiple ways of doing things without a good technical reason to do so?

Well, if we want only 1 workflow, the "complex" one will always work. I can only see 2 issues with this flow:

  • It's not "easily" reproducible locally (i.e. without taking appropriate steps, the default with an official universal2 installer is to build a universal2 wheel).
  • You need 2 builds instead of 1 which might take a bit more time.

It will require support, probably in delocate, to merge those wheels into a universal2 one (at this point, I think any option would require delocate to be updated in order to get "optimal" wheels)

In the meantime, the universal2 wheel can also be built on its own (so a 3rd build) by default, but there must be a way to disable this build (extensions having arch-specific optimizations, or depending on libraries that do not support universal2, might not be able to build as universal2; I certainly am in this case with pybase64, but as it is probably an edge case, I think opting out is probably the way to go here).

Also, what about these external libraries & dependencies? Can you just split/merge these as well?

Yes, all the tools exist for this. I think the best place to integrate this would be in delocate.
However, everything gets a bit more complicated when talking about external libraries & dependencies.

I can think of 3 cases for dependencies if building using universal2:

  • It supports universal2 out-of-the box: Nothing to worry about except stripping when building specific arch wheels (and that's why it should probably be handled by delocate)
  • It does not support universal2 and has a configure step that does not generate arch-specific headers used by its API: in this case the user of cibuildwheel will have to merge the binaries to make the dependency a universal2 one.
  • It does not support universal2 and has a configure step that generates arch-specific headers used by its API: in this case just merging the binaries is not enough; special care must be taken with the headers...
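For that last case, one common workaround is to keep both generated headers and dispatch between them at preprocessor level; a minimal sketch (file names are hypothetical):

```shell
# Suppose the dependency's configure step produced one config header per arch.
# A wrapper header lets a single universal2 compile see the right one per slice.
cat > config.h <<'EOF'
#if defined(__arm64__) || defined(__aarch64__)
#  include "config-arm64.h"
#else
#  include "config-x86_64.h"
#endif
EOF
grep -c include config.h   # sanity check: prints 2
```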

IMHO, given those 3 cases, I think the build twice / merge once flow is the only option that works everywhere, and that building (rather than merging) universal2 should only be a stop-gap that can be disabled while waiting for a delocate that can handle merging binaries (and, if the wheel itself provides arch-specific headers, which may exist, maybe keep that opt-out at that point just to disable universal2 altogether).

This flow might create some concerns once Apple Silicon runners are available in CI.
If builds are done on separate VMs, where shall the merge happen?
One option would be to always build on Apple Silicon runners, test all 3 wheels there, but also test x86_64/universal2 on an x86_64 runner (even if #317 does not land, I think it can be easy enough, depending on your CI, to add a specific test job running on x86_64).

@henryiii
Contributor

henryiii commented Dec 20, 2020

I feel we (cibuildwheel) probably shouldn't try too hard to force a particular workflow - exactly the same thing will not work across projects, especially at this early stage. I do think the simplest thing for most projects, and the path officially supported by Apple for applications (as I mentioned, I think packages in a package-managed system are slightly different) is Universal binaries - it's quite possible the "simplest" workflow would be to build universal binaries, and then split off at least an x86_64 wheel. Only if a package cannot build universal should we have the workaround path of building separately (and, in the future, this path may end up becoming the main one if Apple Silicon runners become common, with a merge instead of a split).

Building packages could be quite tricky without Universal wheels for all dependencies. What happens if I build a package and it relies on libX via pyproject.toml - if setup.py imports it, I need the x86_64 or Universal wheel, then if I build against it, I need the arm64 or Universal wheel. Without Universal wheels, pip has to know about what I'm trying to build to get the right package. Pip either has to be smart about what I'm doing or I have to have a way to force universal wheels even when there's a better match (which is exactly the right behavior most of the time).

One argument against universal2 identifiers suddenly appearing with no opt-in is that most core libraries (like NumPy) don't have universal wheels, so you aren't likely to be able to depend on a binary wheel (though in many cases you don't make static links, so maybe that's often okay?). Obviously, there's also the situation with adding arbitrary libraries, too.

Does the new clang version support OpenMP?

Clang has supported OpenMP for at least three years, maybe more? I know I first wrote about it for High Sierra. Apple doesn't build libomp for you and therefore it's not as simple as -fopenmp, but other than that, it's supported. I assume that will never change.
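To make that concrete, the usual incantation on macOS uses Homebrew's libomp (a sketch; assumes `brew install libomp`, and the compile step only runs where brew and clang are available):

```shell
# Tiny OpenMP smoke test: proves the runtime links and answers.
cat > omp_smoke.c <<'EOF'
#include <omp.h>
#include <stdio.h>
int main(void) {
    printf("%d threads max\n", omp_get_max_threads());
    return 0;
}
EOF
if command -v brew >/dev/null 2>&1 && brew --prefix libomp >/dev/null 2>&1; then
  OMP="$(brew --prefix libomp)"
  # Apple clang understands the pragmas but ships no runtime, hence -lomp:
  clang -Xpreprocessor -fopenmp -I"$OMP/include" -L"$OMP/lib" -lomp omp_smoke.c -o omp_smoke
  ./omp_smoke
else
  echo "skipping: brew/libomp not available"
fi
```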

@joerick
Contributor

joerick commented Dec 20, 2020

Phew. Great discussion. Lots of decisions to be made here.

It seems that we've agreed that while universal2 is something of an inconvenience for cibuildwheel (both from a user interface and perhaps a packagers' point of view), it's also something that we need to support, because some users need it, and it will be good for the ecosystem.

To move forward, we'll need an API design that works today (when M1 CI runners don't exist), but will grow to work when they arrive, and will fit into the --arch option being worked on over in #482. The "build universal2 wheel and then strip archs" approach has great merit, but it's perhaps more of an optimisation that could be applied once we have something less efficient working.

So to think about the interface, I like @henryiii 's line of thinking as a starting point.

Selecting the cross compilation arch and the emulation arch conceptually are a bit irritating when mixed: If you have to add --arch arm64 to get Apple Silicon builds, that does not automatically mean that you want arm64 emulation on Linux.

This problem goes away if we have an CIBW_ARCHS_MACOS option, or it could also be handled in a CI config matrix. Also, the Linux name for that arch is aarch64, not arm64, so that might even be an invalid configuration on linux.

But, we do have selectors - and the main reason to add --archs is to not add a bunch of surprising selectors that will kill the job's compilation time - Linux emulation should be opt-in. Maybe we could have --xarchs, which sets up cross compiling arch's, and macOS currently would always be specified with --xarchs universal2,arm64 (or just one). So something like this:

```
--xarchs universal2,arm64  # Enables the universal2 and arm64 selectors on Mac - eventually may enable arm64 selectors on Linux if we add cross-compilation support for Linux some day
--archs auto,arm64         # Enables the arm64 selectors on Linux - eventually might enable arm64 emulation on Mac if that were to show up some day
```

For each selector, it only is enabled if supported. If both xarch and arch contain a selector, and they both are supported on that platform, then cross compiling is used for the compile, and emulation is used for testing.

I quite like this approach. xarchs is a little obtuse, but --cross-compile-archs would work. We can also do this using env vars, so something like CIBW_CROSS_COMPILE_ARCHS (and CIBW_CROSS_COMPILE_ARCHS_MACOS etc.) makes this very expressive.

This is certainly expressive enough to cover the possibility space and give the user enough control. I do wonder, though, to @YannickJadoul's point, if we might be over-complicating this a little.

That said, if we can design good enough defaults, maybe most users wouldn't need to touch it.

So, for defaults, let's try a scenario:

Option 1
  • on x86_64:
    • CIBW_ARCHS: auto, meaning x86_64 only.
    • CIBW_CROSS_COMPILE_ARCHS: universal2
  • on arm64:
    • CIBW_ARCHS: auto, meaning arm64,universal2 (arm64 can cross-compile x86_64, and emulate it very efficiently through Rosetta 2 for testing)
    • CIBW_CROSS_COMPILE_ARCHS: universal2

So in this case, the x86_64 runner builds an x86_64 wheel, and universal2 is built there but not tested. The arm64 runner builds and tests arm64 and universal2 through emulation.

This kinda stinks, because the user has to manually configure somewhere in the CI which runner to take the universal2 wheel from, and the arm64 one is actually somehow 'better' because it's been tested. It's also overspecified: having an arch in both CROSS_COMPILE_ARCHS and ARCHS means cross-compile and emulate to test, but then what does having universal2 in just ARCHS mean? To emulate both build and test? Why would we bother supporting that?

So maybe rather than CIBW_CROSS_COMPILE_ARCHS, we have CIBW_ARCHS_BUILD_ONLY? Then maybe defaults look like:

Option 2
  • on x86_64:
    • CIBW_ARCHS: auto, meaning x86_64 only.
    • CIBW_ARCHS_BUILD_ONLY: universal2
  • on arm64:
    • CIBW_ARCHS: auto, meaning arm64,universal2
    • CIBW_ARCHS_BUILD_ONLY:

Now, setting CIBW_ARCHS: universal2 in the x86_64 runner is an obvious misconfiguration, because we can't test that. But then it's not a misconfiguration if the user didn't set CIBW_TEST_COMMAND. So that's confusing. Hang on, do we really need this new option? How about:

Option 3
  • on x86_64:
    • CIBW_ARCHS: auto, meaning x86_64 only.
  • on arm64:
    • CIBW_ARCHS: auto, meaning arm64,universal2

Then in the interim, before arm64 runners are available, we document that to get a universal2 wheel, set CIBW_ARCHS_MACOS=auto,universal2. We make sure to document that while this works, it's not testing the arm64 portions of the wheel. We can also add a runtime warning to that effect.

Option 4

Or, even the simplest option of all, would be to not even use CIBW_ARCHS for this. Just BUILD/SKIP and the build selectors. By default, x86_64 builds all wheels x86_64,universal2, arm64, but only tests the x86_64 wheel and x86_64 portions of the universal2 wheel. We raise runtime warnings and documentation to this effect. When arm64 runners arrive, users must use their CI matrix features to configure some kind of split between x86_64 and arm64 runners. Or maybe just use arm64 runners and do all the x86_64 wheel through crosscomp/rosetta2.


Apologies for the long post. But it's been useful to run a few scenarios. I'm leaning more towards option 3, myself, because once arm64 runners arrive, it will provide the simplest config whose defaults do the 'most right' thing, and it doesn't seem crazy hard to understand. But option 4 also has some merits, in that it's simpler, less to understand, and since x86_64 will likely phase out in the long-term, will probably be where we all end up. But it might take us 2-3 years to get to that point!

Curious to hear opinions.

@henryiii
Contributor

henryiii commented Dec 20, 2020

The reasons for having variables in the first place come down to these two points:

  1. We don't want emulated or cross-compiled builds to show up by default in existing workflows (or ever, really). Running cibuildwheel out-of-the-box should not run emulated architectures - in fact, it's nearly perfectly symmetric with --platform this way, auto-detected but overridable. Your output wheels should not depend on whether macos-latest is 10.15 or 11, so that should also be opt-in (though not quite as critical). Fine-grained opt-in is, however, not important - we have CIBW_BUILD/SKIP for that.
  2. There are choices to be made, maybe not now, but eventually if both cross-compiling and emulation are supported: which do you choose? There is no right answer for all projects. For a complex compile, the effort required to make cross-compilation work might well be worth it - the performance difference is huge. But that's not something you can control with selectors. Both forms produce the same output wheel and have the same selector.

Universal vs. native is not in the list above - that's a bit of a special case. The issue that might come up is that x86_64 is shared between OSes, so it will be a little trickier to filter out macOS x86_64 someday. But it's not that hard with CIBW_SKIP. arm64 vs aarch64 is easy, at least. :)

How about CIBW_MACOS_FORMAT=universal2,x86_64,arm64? That is, it's only a macOS variable, as it's not really related to the arch's on Linux. Then eventually we could grow some system cross compiling on Linux if we can support it that might or might not look similar. But there's no case where you'd really want to merge these, the options are different, so they should stay separate (that is, no CIBW_FORMAT, which is why I didn't put MACOS at the end). We could default to CIBW_MACOS_FORMAT=auto (native only).

PS: Note that "universal2" is not really an arch, and it's already macOS specific. So adding it to ARCHS seems odd - the above avoids that.

In regards to an earlier point: I think Pip 20.3+ will be common on macOS well before it becomes common on Linux (CentOS 7 has Pip 9), but Homebrew and Apple's command line tools still both provide < 20.3. In fact, cibuildwheel also does until the current update PRs go in! :)

@joerick
Contributor

joerick commented Jan 5, 2021

Checking back in here after some work has gone into #484.

Currently, the strategy I'm working on is that we'll use the existing CIBW_ARCHS variable to control this. On x86_64 the default is x86_64. On arm64, the default is arm64 universal2. This gives the following properties:

  • Once Apple Silicon CI runners are available, running cibuildwheel as part of a matrix with both x86_64 and arm64 runners without specifying any options will do the 'right' thing - building and testing as much natively as possible, and building universal2 wheels on arm64 where the x86_64 part can be tested via Rosetta2.
  • The user still has full control - they can cross-compile in any direction by setting CIBW_ARCHS. You can build universal2 and arm64 on x86_64 runners. cibuildwheel will raise a warning that parts of these wheels are not tested, if TEST_COMMAND is set.
  • Before Apple Silicon CI is available, we'll recommend users set CIBW_ARCHS_MACOS="x86_64 universal2 arm64" so packagers can start to build for Apple Silicon.
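In config terms the interim recommendation boils down to one environment variable (a sketch; in a real workflow the variable would be set in the CI config rather than an interactive shell):

```shell
# Build x86_64, universal2 and arm64 wheels from an Intel macOS runner;
# the arm64 parts will be built but cannot be tested there.
export CIBW_ARCHS_MACOS="x86_64 universal2 arm64"
if command -v cibuildwheel >/dev/null 2>&1; then
  cibuildwheel --platform macos
else
  echo "would run: cibuildwheel --platform macos (CIBW_ARCHS_MACOS=$CIBW_ARCHS_MACOS)"
fi
```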

I realise this isn't a perfect solution - universal2 isn't really an arch, and there's no distinction between cross-compiling and emulation in our API. But I think those concerns are mostly theoretical, and this provides a pragmatic way forward without increasing the API surface area too much.


An aside: while working on this, I had some vague thoughts about our build selectors (#516)
