Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build hdf5 from source when building wheels #930

Merged
merged 2 commits into from Dec 26, 2021

Conversation

xmatthias
Copy link
Contributor

@xmatthias xmatthias commented Dec 18, 2021

This PR will add building hdf5 from source (therefore upgrading hdf5 to 1.12.1 - as requested in #912).
It'll also use the builtin blosc library for x86 wheels (this is optional and could be removed / reverted).

Wheels from my last run can be found here in the artifact download.

unfortunately, it does also include #929 - as i'm unable to build wheels from source on arm macos (and they didn't work anyway).
I could make CI more complicated to keep this part separate, but i'm not sure that will make sense.

it'll also increase runtime for the wheel CI quite some - as the build on aarch64 happnes as cross-compile, which can only utilize one core. with 2.5 hours it's however still well within range of the 6h github action runtime limit.


Building blosc from the vendored source on ARM64 is not possible at the moment.
The failure on this can be found in this run.

final errors for this are the following:

    /project/c-blosc/blosc/shuffle.c:153:3: error: impossible constraint in ‘asm’
      153 |   __asm__ __volatile__ (
          |   ^~~~~~~
    /project/c-blosc/blosc/shuffle.c:153:3: error: impossible constraint in ‘asm’
      153 |   __asm__ __volatile__ (
          |   ^~~~~~~
    /project/c-blosc/blosc/shuffle.c:185:3: error: impossible constraint in ‘asm’
      185 |   __asm__ __volatile__ (
          |   ^~~~~~~
    error: command 'gcc' failed with exit status 1

Therefore, aarch64 wheels do still include blosc-devel from the yum source (as they do now).

I suspect this has to do with the blosc version that is vendored, which might need an update to properly support compiling on arm64 - but didn't investigate further.


Important before merging:

On linux x86_64, all extension files become rather huge (16MB).
This does not happen on arm64, nor on macos - but i have no idea what causes it - as the building itself runs as expected (nothing that stands out to me in the logs).

I'm out of ideas for the moment, therefore i'm posting this here so maybe someone can jump in and give me a hint, or even provide a fix for this.
Maybe it's because of the updated hdf5 version - but i doubt that can cause such a huge spike.

Because of this, I think it might make sense to hold off merging this until we have found the root for the huge .so files. Maybe it's normal / expected - but it seems quite strange as it's only happening on one environment - where the macOS build is not significantly different (other than being another system, obviously).

@avalentino avalentino added this to the 3.6.2 milestone Dec 19, 2021
@avalentino
Copy link
Member

Thanks @xmatthias

Because of this, I think it might make sense to hold off merging this until we have found the root for the huge .so files. Maybe it's normal / expected - but it seems quite strange as it's only happening on one environment - where the macOS build is not significantly different (other than being another system, obviously).

do you think it could make sense to mark this PR as WIP?

@xmatthias xmatthias changed the title Build hdf5 from source when building wheels [WIP] Build hdf5 from source when building wheels Dec 19, 2021
@xmatthias
Copy link
Contributor Author

do you think it could make sense to mark this PR as WIP?

I've done so now - but with the hint that i'll need some input to improve this (so maybe you could also add the "help wanted" label?).

@FrancescAlted
Copy link
Member

Building blosc from the vendored source on ARM64 is not possible at the moment.

Yes, C-Blosc 1.x series does not support ARM64 (and probably will never do). C-Blosc2 do have native support for it though. However, C-Blosc2 is only backward compatible with C-Blosc1, not forward compatible, so switching to 2 means an important change in the format, and should be done in a new major version of PyTables. In case you are interested in doing this change, tell me and I can try to provide a PR for that during Xmas vacation.

@avalentino
Copy link
Member

Yes, C-Blosc 1.x series does not support ARM64 (and probably will never do).

Sorry @FrancescAlted, this also include Apple Silicon?

@FrancescAlted
Copy link
Member

Yes, C-Blosc 1.x series does not support ARM64 (and probably will never do).

Sorry @FrancescAlted, this also include Apple Silicon?

MacOS comes with an emulation layer for x86 on top of Apple Silicon, so C-Blosc1 should work well there. I was referring to native aarch64 on Apple Silicon.

@xmatthias
Copy link
Contributor Author

MacOS comes with an emulation layer for x86 on top of Apple Silicon, so C-Blosc1 should work well there. I was referring to native aarch64 on Apple Silicon.

i think this is only true if python is running via rosetta (the emulation layer).
For newer versions of python (python 3.9 and up IIRC) - it's also possible to run python natively, where i don't think the emulation layer will apply (i don't think you can have emulated libraries, while the original binary is not emulated).

@FrancescAlted
Copy link
Member

For newer versions of python (python 3.9 and up IIRC) - it's also possible to run python natively, where i don't think the emulation layer will apply (i don't think you can have emulated libraries, while the original binary is not emulated).

Fair point. No, probably you cannot mix native binaries with emulated libraries (not a good idea anyways).

@avalentino avalentino linked an issue Dec 22, 2021 that may be closed by this pull request
@avalentino
Copy link
Member

@FrancescAlted @xmatthias it seems that both c-blosc and PyTables build properly on all the platforms supported by debian, including different arm flavors [1,2].
So, if I understand correctly, we should be able to fix the build on Linux ARM living out only native builds on apple silicon, correct?

[1] https://buildd.debian.org/status/package.php?p=c-blosc
[2] https://buildd.debian.org/status/package.php?p=pytables

@xmatthias
Copy link
Contributor Author

xmatthias commented Dec 23, 2021

Debian build seems to suggest this, yes - but yet i'm unable to build pytables without preinstalled c-blosc from the sources contained in this repository by simply running setup.py.

While the resulting c-blosc version seems to be 1.21.1 in both cases - i'm not sure if it's 100% the same sourcecode (maybe debian patches something to have builds succeed?)

It should also be noted that while the github actions runner is ubuntu (debian based) - the actual build happens in docker containers with manylinux (e.g. quay.io/pypa/manylinux2014_aarch64) - which are actually centOS based, and in case of aarch64, also emulated via qemu (this should not cause a problem other than reduced performance).

To be fair - the solution could be as easy as using a different build-flag when building blosc - but i'm not that deep into how blosc is built (or should be built) that i'm confident in taking the right approach there.

@avalentino
Copy link
Member

While the resulting c-blosc version seems to be 1.21.1 in both cases - i'm not sure if it's 100% the same sourcecode (maybe debian patches something to have builds succeed?)

The only debian patch applied should be this one that only impacts the installation path AFAIK.

c-blosc version currently in the PyTables repo is 1.21.0, I'm currently working to upgrade it to v1.21.1.

Maybe we could test again this PR after the c-blosc update to see if something changes.
I doubt it anyway.

@FrancescAlted
Copy link
Member

Hey, it looks like updating C-Blosc to v1.21.1 did the trick? Interesting!

@avalentino
Copy link
Member

Very good, thanks @FrancescAlted
#931 is currently building. @xmatthias could you please rebase this PR on top of it when #931 is done?

@FrancescAlted
Copy link
Member

Yes, C-Blosc 1.x series does not support ARM64 (and probably will never do). C-Blosc2 do have native support for it though.

I had to be more precise here. C-Blosc 1.x series can be compiled for ARM, but it won't benefit from the native SIMD instructions for ARM (aka NEON). C-Blosc2 does have support for those native NEON SIMD instructions, and hence it has much better performance on ARM (that includes Apple Silicon CPUs).

More info: https://www.blosc.org/posts/arm-is-becoming-a-first-class-citizen-for-blosc/ and https://www.blosc.org/posts/arm-memory-walls-followup/.

@Czaki
Copy link

Czaki commented Dec 23, 2021

On linux x86_64, all extension files become rather huge (16MB).

Base on my experience you may check if binaries does not contain debug symbols. I do not see here artifacts in which I could check this.

@xmatthias
Copy link
Contributor Author

@Czaki the artifacts can be downloaded from this github actions run - since wheels are not built for pull requests.

.github/workflows/wheels.yml Outdated Show resolved Hide resolved
ci/github/get_hdf5_if_needed.sh Show resolved Hide resolved
@Czaki
Copy link

Czaki commented Dec 23, 2021

@Czaki the artifacts can be downloaded from this github actions run - since wheels are not built for pull requests.

it looks like both are compiled with debug symbols:

╭─czaki@grzesiek-komputer ~/Pobrane/tmp/test_wheel 
╰─$ file hdf5extension.cpython-36m-aarch64-linux-gnu.so 
hdf5extension.cpython-36m-aarch64-linux-gnu.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, BuildID[sha1]=5c5a31b6a1602dafbe07732afba1c7ec54553872, with debug_info, not stripped                     /0,1s
╭─czaki@grzesiek-komputer ~/Pobrane/tmp/test_wheel 
╰─$ file hdf5extension.cpython-36m-x86_64-linux-gnu.so 
hdf5extension.cpython-36m-x86_64-linux-gnu.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=9d7f9051d0fb6f8f4a296102156ac3a7f844e3ea, with debug_info, not stripped    

There is a discussion in cibuildwheel about strip debug symbols pypa/cibuildwheel#331

If I have more time I will try to dive deeper if no one solve it.

@xmatthias
Copy link
Contributor Author

@Czaki thanks for this

based on https://github.com/h5py/h5py/blob/1d569e69fb8d2b1abea196792bb1f8c948180534/azure-pipelines.yml#L32-L33 - it might be possible to simply provide different CFLAGS to prevent this

I'll investigate this ...

@xmatthias
Copy link
Contributor Author

the aarch64 build also fails with the updated blosc sources (ci run).

Investigating the problematic lines (shuffle.cL153 and L185) makes me suspect that it's a problem with the build environment.
CI wheels for aarch64 runs in a qemu emulated docker container on a 64bit box.

I can build blosc just fine (both before and after the upgrade) on a native aarch64 box, so in general, building it should be possible on this architecture - just not in CI with emulation (there's so far no aarch64 github runners).

While i'm still investigating this in a separate branch - i have little hope that it's "easily fixable" - unless we can determine a compiler flag which helps with this "somehow".


That said - compiling with lower debug level (-g1) did work - and wheels are now back to "Normal" size (can be downloaded from here.

thanks @Czaki for the great pointer!

@xmatthias xmatthias changed the title [WIP] Build hdf5 from source when building wheels Build hdf5 from source when building wheels Dec 25, 2021
@xmatthias
Copy link
Contributor Author

xmatthias commented Dec 25, 2021

The good news is - I've been able to build AARCH64 wheels from source (including the builtin blosc) here - i've tested the wheel build on CI just now on a aarch64 system and it works fine (tests pass in ~164s).

I've had to completely disable sse2 for aarch64 systems.
This is necessary for CI (i assume the emulation does not support sse2) - but would work on native aarch64 systems (i was able to build the "master" branch just fine manually - but sse2 does not seem to be enabled there) - so i'm not sure if this might cause problems / slowdowns for people building "from source" - or if it's just the "SSE2 detection" in CI that's the problem here.

It also makes CI take ~4:30 - which is a bit concerning as it's "pretty" close to the 6h job execution limit - which would make CI fail simply due to timeout (in a "not really" fixable way).


I do have a few ideas on how to approach this:

  • Add a new environment variable to disable SSE2 support selectively "where needed".
  • run all aarch64 builds in parallel instead of in sequence
    this would reduce the job duration significantly (to about 1.5h per sub-task) - but would run the "compile hdf5 from source" step multiple times (which itself takes ~1 hour). From an overall perspective, this is therefore less efficient than the way it's done now.

As an alternative to the second option - we could take the risk "as is" for the moment - and eventually spread the jobs out in the future (should the need arise / ci start to fail because of duration).


@avalentino let me know how you'd like to approach this - and also if you'd like me to add the aarch64 changes to this PR - or if you'd prefer to keep these separate (that would mean this PR is complete).

for now it's just minor changes - but i think the Environment variable for SSE2 is not optional to ensure good support for people compiling from source.

@avalentino
Copy link
Member

@xmatthias very good news indeed!
Probably it is better to have the fix for disabling sse2 in aarch64 in a separate PR.
Indeed AFAIK aarch64 does not have sse2, I think it is an incorrect detection related to the emulation.

@xmatthias
Copy link
Contributor Author

In that case i think this PR is complete.

@avalentino
Copy link
Member

Thanks @xmatthias I will merge this PR after the fix to sse2 detection.

@xmatthias
Copy link
Contributor Author

I will merge this PR after the fix to sse2 detection.

This kind of contradicts to the above statement of

fix for disabling sse2 in aarch64 in a separate PR.

It's no problem for me to combine both in this PR - just need to know what you expect.

@Czaki
Copy link

Czaki commented Dec 25, 2021

hm. Maybe split aarch build on two jobs to reduce to 2h?
Or maybe add builds on CircleCI https://circleci.com/execution-environments/arm/ which offer native arm64 build machines?
Some example (maybe not best one) could be found here: https://github.com/Czaki/imagecodecs_build/blob/master/.circleci/config.yml

That said - compiling with lower debug level (-g1) did work - and wheels are now back to "Normal" size (can be downloaded from here.

thanks @Czaki for the great pointer!

it looks like binaries for x86_64 are still 3 times bigger (after decompress), but it is nice that size reduction is so big.

@xmatthias
Copy link
Contributor Author

xmatthias commented Dec 25, 2021

it looks like binaries for x86_64 are still 3 times bigger (after decompress), but it is nice that size reduction is so big.

what are you comparing this to?

Wheels from the last release (https://pypi.org/project/tables/3.6.1/#files) are ~14Mb for all linux x86_64 releases.
Wheels build in CI from this PR linked above are about 7 MB - which is actually a ~50% size reduction to currently available wheels.
i assume another level of reduction could be archived by using -g0 - but i'm not 100% sure that'll make sense.

@Czaki
Copy link

Czaki commented Dec 25, 2021

what are you comparing this to?

I compare so file size to so file size is archives:

Zrzut ekranu z 2021-12-25 14-46-37

4.88MB vs 1.3MB

@avalentino avalentino self-requested a review December 25, 2021 21:25
.github/workflows/wheels.yml Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
.github/workflows/wheels.yml Outdated Show resolved Hide resolved
.github/workflows/wheels.yml Outdated Show resolved Hide resolved
@avalentino
Copy link
Member

@xmatthias recent changes in setup.py caused merge conflicts. Do you mind to rebase on the current master and resolve conflicts?

@avalentino avalentino merged commit 08b421e into PyTables:master Dec 26, 2021
@xmatthias xmatthias deleted the ci/hdf5_from_source branch December 26, 2021 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants