Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MacOS wheels are not actually universal2 builds #1793

Open
ianhbell opened this issue Mar 15, 2024 · 15 comments
Open

MacOS wheels are not actually universal2 builds #1793

ianhbell opened this issue Mar 15, 2024 · 15 comments

Comments

@ianhbell
Copy link

Description

I have universal2 wheels that are were formerly working with C++17 standard specified. I then upgraded to the use of C++20 features (especially concepts). Everything seems ok. The build appears externally to be universal2. But when I try to load on my arm64 mac, I am told that it cannot load the package. When I do a file on the compiled .so extracted from the wheel, I see:

(base) 647ibmac:teqp ihb$ file teqp.cpython-311-darwin.so 
teqp.cpython-311-darwin.so: Mach-O 64-bit bundle x86_64

but on the working one from a prior version, it shows both arch.

Anyone have any idea? Or is it related to my MACOSX_DEPLOYMENT_TARGET=10.15? Shouldn't that cause an error though if the deployment target is too old?

Build log

https://github.com/usnistgov/teqp/actions/runs/8282966941

CI config

https://github.com/usnistgov/teqp/blob/main/.github/workflows/build_cibuildwheel.yml

@ianhbell
Copy link
Author

I also tried bumping the deployment target to 11.0, same issue

@mayeut
Copy link
Member

mayeut commented Mar 16, 2024

You're using a custom script (based on a pybind sample) which calls cmake with default option ending with an x86_64 build only.
you'd probably be better off using scikit-build-core
maybe @henryiii has more insight being also involved in both pybind11 & scikit-build-core.

@ianhbell
Copy link
Author

But it was working very recently, and I think the only meaningful change was the switch to C++20 code.

@henryiii
Copy link
Contributor

You are missing https://github.com/pybind/cmake_example/blob/62fe6fcc27824ba6fc4492a624a502aa3d82ed4e/setup.py#L99-L103. It won’t work if you don’t tell CMake what you are trying to do. I’d highly recommend scikit-build-core over trying to hack setuptools, which is fragile.

@henryiii
Copy link
Contributor

I’m surprised MACOSX_DEPLOYMENT_TARGET=10.15 supports C++20, as 10.12-10.14 (depending on what features you used) is required to support C++17. But maybe it supports the subset you are using?

@ianhbell
Copy link
Author

It looks like the arm64 wheels ignore the stated deployment target when you have c++20 code and it reverts instead to 12.0, while the x86_64 wheels seem to be ok with 10.15, and perhaps that is why I see the weird behavior I do, which seems like a bug in cibuildwheel maybe?

@ianhbell
Copy link
Author

This is what I get when I build x86_64 and arm64 wheels with the deployment target set to 10.15:
image

@ianhbell
Copy link
Author

@mayeut I tried to enable scikit-build-core, seemed very promising, but the wheel building killed the github actions, I guess because I was consuming too much resources in my builds, although the action didn't say why it had been killed. The builds are using highly templated C++ code and it is not unusual to need a few GB of compiler memory

@henryiii
Copy link
Contributor

Scikit-build-core does not use extra memory. Your error is:

ModuleNotFoundError: No module named 'scikit_build_core'

So you forgot to update the build-system.requires.

Did you check the binary in the “arm” wheel that was being produced before to make sure it was arm? I expect it was Intel. Setuptools knows you are targeting arm but you didn’t tell CMake. Your current patch or scikit-build-core both would tell CMake. (Scikit-build-code would also include Windows ARM, etc, too)

@ianhbell
Copy link
Author

Thank you @henryiii , your fix to the setup.py did the trick (it's super weird that it was working before without this change). Anyhow, now I get the right arch in the compiled library:

(base) xxxx:teqp xxx$ file teqp.cpython-39-darwin.so 
teqp.cpython-39-darwin.so: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit bundle x86_64] [arm64]
teqp.cpython-39-darwin.so (for architecture x86_64):	Mach-O 64-bit bundle x86_64
teqp.cpython-39-darwin.so (for architecture arm64):	Mach-O 64-bit bundle arm64

@ianhbell
Copy link
Author

@henryiii I get panics like this one: https://github.com/usnistgov/teqp/actions/runs/8307869615 . I think is is related to consuming too much memory, but the message is not helpful.

I think I did have the necessaries in my branch: usnistgov/teqp@main...scikit-build-core . It was at least compiling locally and I could tell I was using the new workflow.

Unrelated: I was unable to single-source the version from my code with scikit-build-core; I had tried to follow the method shown in the docs. That's also a non-starter for me. I l

@henryiii
Copy link
Contributor

[tool.setuptools.dynamic]
version = {attr = "teqp.__version__"}

Scikit-build-core is not setuptools. This won’t do anything. Scikit-build-core has a built-in regex plugin that does exactly what you want. https://scikit-build-core.readthedocs.io/en/latest/configuration.html#dynamic-metadata (click the Regex tab)

Can you enable verbosity when building in cibuildwheel?

@henryiii
Copy link
Contributor

henryiii commented Mar 17, 2024

You should set:

[tool.cibuildwheel]
build-verbosity = 1

or CIBW_BUILD_VERBOSITY=1 (You've set it only for Windows). The default build backend (pip) hides all backend output by default. That way you can see what's happening.

Two possible causes: Scikit-build-core uses Ninja by default, which will try to use all your machine cores. On GHA, that's probably 2 or so, but that still might be enough to cause it to run out of memory if your code is large. You can set the standard CMake environment variable CMAKE_BUILD_PARALLEL_LEVEL=1 to control this. It's probably best to set it in cibuildwheel's environment or GitHub Actions's config, as most users would probably like the multihreaded builds, but you can also set it in scikit-build's config for everyone.

The other is that scikit-build-core defaults to Release mode, which will apply -O2 or -O3 optimizations (forgot which), which take more effort to compile. You can change this, though, if you want, via build-type.

@henryiii
Copy link
Contributor

Cibuildwheel builds Linux wheels in a docker container, which means they do not have access to the host's environment variables. You have to either specify the variables you want to pass through with tool.cibuildwheel.linux.environment-pass/CIBW_ENVIRONMENT_PASS_LINUX or set the value using tool.cibuildwheel.environment/CIBW_ENVIRONMENT.

@ianhbell
Copy link
Author

Ultimately I gave up with scikit-build-core. I have a working version now with setuptools hacking (thanks to fixing the archs) and the benefits offered by scikit-build-core (faster builds with ninja) are not compelling in my case to justify more hours of debugging. Perhaps I will revisit later on, but bigger fish to fry.

Thank you @henryiii for your constant help on this topic, it is much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants