Build for Apple M1 silicon #49

Closed
mattip opened this issue Jan 10, 2021 · 20 comments

@mattip
Collaborator

mattip commented Jan 10, 2021

I am not sure how we can convince multibuild/CMake to build OpenBLAS for Apple M1 silicon. Can we directly use the ARM64 artifact?

@ogrisel
Contributor

ogrisel commented Jan 10, 2021

I think conda-forge is using a cross-compiling setup to build the macos/arm64 version of its packages from a macos/x86_64 host (e.g. on github actions or azure pipelines):

https://conda-forge.org/blog/posts/2020-10-29-macos-arm64/

Cross-compilation is apparently a feature of conda-build and uses the MacOSX11.0.sdk already installed on Azure Pipelines. I don't know how to do the same with CMake directly, but from reading the blog post above it looks like a significant amount of work. Furthermore, it's not possible to run the tests of the resulting build on the x86_64 host.

Note (for openblas): the conda-forge macos/arm64 build recipe sets the target to VORTEX: https://github.com/conda-forge/openblas-feedstock/blob/master/recipe/build.sh .

@isuruf
Contributor

isuruf commented Jan 11, 2021

I don't know how to do the same with CMake directly but by reading the blog post above it looks like a significant amount of work.

It's not hard if you are using the compilers from Apple and are not using conda. You already do a kind of cross-compiling when you build universal binaries. It's the same for universal2 binaries.

@ogrisel
Contributor

ogrisel commented Jan 11, 2021

But to build openblas, one needs a gfortran setup that is ARM64 compatible (including the libgfortran runtime that has to be vendored in the wheels with delocate). At the moment this wheel building setup relies on gfortran-4.9.0-Mavericks.dmg ( https://github.com/MacPython/gfortran-install/tree/master/archives ), which I assume has no chance of being able to generate macos/arm64 binaries.

I assume that in this case one would have to upgrade to something like the following experimental releases:

https://github.com/fxcoudert/gfortran-for-macOS

when building for the macos/arm64 target, while keeping the Mavericks version when targeting 10.9.

However, this ARM64 version is not meant to run on an x86_64 host AFAIK.

@rgommers
Collaborator

In scipy/scipy#13347 we'd like to update the minimum GCC to >= 5.5; for that to happen we'd need to upgrade gfortran as well.

Re this issue: last I checked Apple didn't really care about Fortran and there was no solution for Fortran on ARM64 yet. Some Julia folks were loudly complaining on Twitter. Not sure if anything has changed since then?

@ogrisel
Contributor

ogrisel commented Jan 11, 2021

Apparently the conda-forge team (@isuruf and others) managed to build a working gfortran that is enough to build a working scipy stack (based on openblas) from source on Apple M1.

I asked @fxcoudert whether the gfortran-for-macOS package that is currently used to build the OpenBLAS libs vendored in the numpy and scipy wheels can be used for cross-compiling and the answer is no, but the author provides a link to a doc on how to build such a cross-compiler toolchain, see: fxcoudert/gfortran-for-macOS#15

@fxcoudert

Hello 👋

Current status of gfortran:

As part of Homebrew (where I am a maintainer):

  • we ship the GCC 10 backport (and provide binaries)
  • we use it to build openblas, openmpi, scipy, and a lot of scientific libraries
  • we have had no trouble with it so far

Regarding cross-compilation:

I don't have time right now to build one and host it on my page, but I can help you if you have questions or issues.

@mattip
Collaborator Author

mattip commented Jan 11, 2021

If we are going the route of releasing cross-compiled binaries with no ability to test that the package works (something I am not in favor of), it would make sense to host the compiler chain somewhere public so that each project would not have to build its own.

@isuruf
Contributor

isuruf commented Jan 11, 2021

We have an x86_64-to-arm64 darwin cross-compiler built from iains fork above in conda-forge, but it's the 11.0.0 dev branch and not the 10.x branch that @fxcoudert mentioned. It works really well and we have built a lot of the pydata stack, including scipy, with these compilers.

Here are some things we can do:

  1. Build the 10.x branch on conda-forge (this is trivial as we already have the infrastructure to build the 11.x branch).
  2. Remove the bits that are conda specific and host the toolchain somewhere.

Building fat binaries is not possible, but a shell script wrapper that uses the two compilers and fuses their outputs using lipo might not be too hard.
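
As a rough illustration of the fusing step mentioned above, here is a minimal Python sketch that assembles a `lipo -create ... -output ...` command from two thin builds. The helper names (`lipo_create_command`, `fuse`) and the file paths are hypothetical, and this is not the wrapper that was eventually written; it only shows the shape of the call.

```python
import shutil
import subprocess


def lipo_create_command(thin_libs, fat_lib):
    """Return the argv for `lipo -create <inputs...> -output <fat_lib>`."""
    thin_libs = [str(p) for p in thin_libs]
    if len(thin_libs) < 2:
        raise ValueError("need at least two thin binaries to fuse")
    return ["lipo", "-create", *thin_libs, "-output", str(fat_lib)]


def fuse(thin_libs, fat_lib):
    """Run lipo; only works where the tool is installed (macOS)."""
    if shutil.which("lipo") is None:
        raise RuntimeError("lipo not found; this step only works on macOS")
    subprocess.run(lipo_create_command(thin_libs, fat_lib), check=True)


# Hypothetical paths, for illustration only.
cmd = lipo_create_command(
    ["build-x86_64/libexample.dylib", "build-arm64/libexample.dylib"],
    "build-universal2/libexample.dylib",
)
print(" ".join(cmd))
```

Keeping command construction separate from execution means the argument list can be inspected on a non-macOS machine, where `lipo` itself is unavailable.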

@ogrisel
Contributor

ogrisel commented Jan 11, 2021

If we are going the route of releasing cross-compiled binaries with no ability to test that the package works (something I am not in favor of), it would make sense to host the compiler chain somewhere public so that each project would not have to build its own.

The alternative would be to wait for public Continuous Integration services to add Apple M1-based builders but as far as I know there is no ETA for that. So cross-compiling on public CI workers + local manual testing from time to time seems to be the only option for the time being.

@isuruf
Contributor

isuruf commented Jan 12, 2021

@fxcoudert, it looks like config.sub in your gcc fork is outdated and doesn't recognize arm64-apple-darwin20.1.0. Can you please update it? I can also send a PR if you prefer.

@fxcoudert

fxcoudert commented Jan 12, 2021

@isuruf Thanks, I'll update. aarch64 is recommended over arm64, I think.

@ogrisel
Contributor

ogrisel commented Jan 13, 2021

Building fat binaries is not possible, but a shell script wrapper that uses the two compilers and fuses their outputs using lipo might not be too hard.

I am not even sure that fat binaries are needed. What people really want is pip install numpy / scipy to work out of the box and they don't care if this is the same wheel as for x86_64 on PyPI or two different wheel files.

@rgommers
Collaborator

The problem may be that the packaging people chose fat wheels and a fat python.org Python installer by default. It's a really weird choice of course, but if that's all universal2 then I'm not quite sure that pip is going to choose correctly when different packages only have two thin wheels?

If that does work fine, then I'm all for no fat wheels. The extra space and bandwidth that takes is useless (we're also not stuffing 32/64-bit Windows together, or x86/arm64 Linux).
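
To make the thin versus fat distinction concrete, here is a small Python sketch that reads the platform tag from a wheel filename and reports which macOS architectures it covers. The helper `macos_archs` and the example filenames are hypothetical; this does not reproduce pip's actual selection logic, it only shows what the tags encode.

```python
# universal2 wheels carry both architectures in one file.
UNIVERSAL2 = frozenset({"x86_64", "arm64"})


def macos_archs(wheel_filename):
    """Return the set of macOS architectures named by a wheel's platform tag."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % wheel_filename)
    # The platform tag is the last dash-separated field before ".whl".
    platform_tag = wheel_filename[: -len(".whl")].split("-")[-1]
    archs = set()
    # A compressed tag set joins several platform tags with ".".
    for tag in platform_tag.split("."):
        if not tag.startswith("macosx_"):
            continue
        # macosx_<major>_<minor>_<arch>; the arch itself may contain "_".
        _, _major, _minor, arch = tag.split("_", 3)
        archs |= UNIVERSAL2 if arch == "universal2" else {arch}
    return archs


# Hypothetical filenames, for illustration only.
for name in (
    "example-1.0-cp39-cp39-macosx_10_9_x86_64.whl",
    "example-1.0-cp39-cp39-macosx_11_0_arm64.whl",
    "example-1.0-cp39-cp39-macosx_11_0_universal2.whl",
):
    print(name, sorted(macos_archs(name)))
```

Two thin wheels together cover the same architectures as one universal2 wheel; the open question in this thread is only whether the installer picks the right one.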

@ogrisel
Contributor

ogrisel commented Jan 13, 2021

I would vote for: let's start with trying to get thin wheels to work. If pip does not like them (e.g. by trying with test.pypi.org), then we can try to hack around to fuse them as suggested by @isuruf in #49 (comment). The short term blocker is to get a maintainable gfortran cross-compiling tool chain. I think it's good to mutualise this maintenance effort with homebrew and conda-forge developers.

Out of curiosity, @fxcoudert how do you build the arm64 compatible binaries for openblas in homebrew at the moment? Do you have access to some public CI service with M1 workers? Or do you host your own farm of M1-based mac minis somewhere?

@matthew-brett
Contributor

My guess would be that two thin wheels won't work, because the Python.org universal2 build will require a universal2 wheel.

How does the Python.org binary work? Does it start in M1 mode by default, or x86_64 mode by default? If it starts in M1 mode by default, and it will be rare to start in x86_64 mode, then we could consider a thin M1 wheel, with a universal2 filename. But that's a bit messy.

Fusing is fiddly but not difficult. There were a few recipes for Multibuild that used to do that to make i386 / x86_64 fat wheels.

@ogrisel
Contributor

ogrisel commented Jan 13, 2021

I installed Python 3.9 from python.org and it does get executed in ARM64 mode by default:

ogrisel@mba scikit-learn % which python3.9
/usr/local/bin/python3.9
ogrisel@mba scikit-learn % ls -l /usr/local/bin/python3.9
lrwxr-xr-x  1 root  wheel  71 Jan 11 14:56 /usr/local/bin/python3.9 -> ../../../Library/Frameworks/Python.framework/Versions/3.9/bin/python3.9
ogrisel@mba scikit-learn % python3.9 -c "import platform; print(platform.platform())"
macOS-11.1-arm64-arm-64bit
ogrisel@mba scikit-learn % arch -x86_64 python3.9 -c "import platform; print(platform.platform())"
macOS-11.1-x86_64-i386-64bit
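
The same check can be done from inside a script rather than on the command line. The following is a minimal sketch using the standard `platform` module; the helper name `describe_arch` and its labels are assumptions made for illustration.

```python
import platform


def describe_arch(machine):
    """Map a platform.machine() value to the wheel flavor discussed here."""
    if machine in ("arm64", "aarch64"):
        return "arm64"
    if machine in ("x86_64", "AMD64"):
        return "x86_64"
    return "other"


# Reports the architecture the current interpreter process is running as,
# which for a universal2 build depends on how it was launched.
machine = platform.machine()
print(machine, "->", describe_arch(machine))
```

Running the script through `arch -x86_64 python3.9` on an M1 machine should select the x86_64 slice, as in the session above.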

@isuruf
Contributor

isuruf commented Jan 13, 2021

Here's a cross-compiling toolchain I scribbled together from the conda toolchain, with the conda-specific parts removed:
gfortran-darwin-arm64.tar.gz
You'll need to run it on Big Sur (Intel or M1) and do export SDKROOT=$(xcrun -show-sdk-path) beforehand.
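
For build scripts driven from Python, the SDKROOT step above could be sketched as follows. The helper names `query_sdk_path` and `sdk_env` are hypothetical, the SDK path in the example is a placeholder, and the sketch uses the double-dash spelling `xcrun --show-sdk-path`.

```python
import os
import subprocess


def query_sdk_path():
    """Ask xcrun for the active SDK path (macOS only)."""
    result = subprocess.run(
        ["xcrun", "--show-sdk-path"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def sdk_env(sdk_path, base_env=None):
    """Return a copy of the environment with SDKROOT set to sdk_path."""
    env = dict(os.environ if base_env is None else base_env)
    env["SDKROOT"] = sdk_path
    return env


# Placeholder path; on a real macOS host use sdk_env(query_sdk_path()).
env = sdk_env("/path/to/MacOSX.sdk", {"PATH": "/usr/bin"})
print(env["SDKROOT"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when invoking the cross-compiler, so the setting does not leak into the parent process.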

@fxcoudert

@ogrisel Homebrew hosts its own Apple Silicon machines

@isuruf
Contributor

isuruf commented Jan 19, 2021

See https://github.com/matthew-brett/multibuild/pull/383 and MacPython/gfortran-install#4

@rgommers
Collaborator

Updated link from comment above: multi-build/multibuild#383

There are arm64 builds for this repo now:
