
Feature request: out-of-tree Pyodide builds in CI for h5py (and libhdf5) #2397

Open
agriyakhetarpal opened this issue Mar 21, 2024 · 4 comments

@agriyakhetarpal

agriyakhetarpal commented Mar 21, 2024

Description

Hi there! I am opening this feature request to propose out-of-tree Pyodide builds, i.e., wasm32 wheels built via the Emscripten toolchain for h5py and for libhdf5, which h5py relies on. In my most recent work assignment (more below), I am working on improving the interoperability of the Scientific Python ecosystem of packages with Pyodide and with each other. This will culminate in efforts towards interactive documentation for these packages, where they can be run in JupyterLite notebooks through nightly builds and wheels pushed to PyPI-like indices on Anaconda during a later phase of the project (those can be picked up in follow-up PRs).

The build procedure in question is admittedly daunting, and not something I have implemented before, but there have been earlier conversations about these WASM wheels (#2338), and emscripten-forge/recipes#685 is still in progress to get h5py merged (hdf5 is already present). Some patches are available upstream in the Pyodide repository that I can draw on, though they may be somewhat outdated by now. Some WASM builds of hdf5 do exist otherwise, but they have yet to be bumped to the Emscripten toolchain version that Pyodide currently uses (3.1.46): usnistgov/libhdf5-wasm#3.

Proposed solution

  1. A CI pipeline on GitHub Actions that builds libhdf5, then builds h5py wheels against it, and finally runs the test suite inside an activated Pyodide virtual environment, which provides a Python distribution/interpreter compiled with Emscripten.
  2. Fixing or skipping any failing tests as necessary, based on current Pyodide limitations, and ensuring that all relevant test cases pass.
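As a rough sketch of what step 1 above might look like, the following GitHub Actions fragment builds libhdf5 with the Emscripten toolchain and then an h5py wheel with pyodide-build. This is an untested illustration, not a working configuration: the emsdk setup action, the HDF5 version, and the CMake flags shown are all assumptions, and the Emscripten version must match the one the target Pyodide release was built with (currently 3.1.46).

```yaml
jobs:
  build-wasm32-wheel:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Hypothetical emsdk setup; the pinned version must match Pyodide's toolchain.
      - uses: mymindstorm/setup-emsdk@v14
        with:
          version: "3.1.46"
      - name: Build libhdf5 for wasm32
        run: |
          # Flags are illustrative; further options to disable tools, tests,
          # and compression filters will likely be needed under Emscripten.
          emcmake cmake -S hdf5 -B build-hdf5 \
            -DBUILD_SHARED_LIBS=OFF \
            -DCMAKE_INSTALL_PREFIX="$GITHUB_WORKSPACE/hdf5-wasm"
          cmake --build build-hdf5
          cmake --install build-hdf5
      - name: Build the h5py wheel out-of-tree
        run: |
          pip install pyodide-build
          HDF5_DIR="$GITHUB_WORKSPACE/hdf5-wasm" pyodide build
```

The general shape (cross-compiled C dependency installed to a prefix, then `pyodide build` picking it up via an environment variable) mirrors what other Scientific Python packages have done for their out-of-tree Pyodide pipelines.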

Some more context

This project is being tracked at Quansight-Labs/czi-scientific-python-mgmt#18 and Quansight-Labs/czi-scientific-python-mgmt#19. It is being implemented for various repositories at the time of writing, and has been completed for NumPy (numpy/numpy#25894) and PyWavelets (PyWavelets/pywt#701), thanks to @rgommers – and I believe that h5py will benefit from this push in the ecosystem too, as will other PyData packages.

@tacaswell
Member

Happy to review a PR doing so.

@aragilar
Member

Point of clarification: who is on the hook for fixing/maintaining said pipeline/wheels? Generally, platform/distribution-specific wheels/builds/packages are handled by platform-specific maintainers. (The exceptions are the wheels for the three main desktop systems, but even then we've previously had issues on Windows, and I can foresee similar issues on macOS if it diverges further from its Unix heritage.) I'm wary of adding a platform that the h5py maintainers are not familiar with but are expected to maintain (c.f. conda-forge/emscripten-forge, where we help out but are not the only maintainers).

@agriyakhetarpal
Author

Thanks for the comment, @aragilar. I would be happy to maintain these as and when I am available, and I am okay with being pinged on PRs or issues where breakages are experienced. The maintainability of these pipelines is a totally valid concern – the workflow can also be disabled temporarily when putting out wheels for a release or other pressing maintenance takes priority (for example, this happened for NumPy when it switched from setuptools/numpy.distutils to meson-python, and the workflow was restored just last month via the PR mentioned above).

That said, I haven't got very far with compiling libhdf5 and h5py yet, and I would appreciate some pointers and clarifications before I dive in, so as to avoid duplicating effort: what should the ideal workflow look like? Should I compile libhdf5 and then the h5py wheels in the same job, and keep the patches for them in this repository? In any case, I will be working on this alongside a few other tasks and repositories, so I won't get to it immediately – but I will revisit from time to time to see what is needed to reach a successful build.
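For the testing half, I'm imagining something along these lines, using pyodide-build's virtual-environment support. This is a sketch under assumptions: `pyodide venv` is provided by the pyodide-build package, but the wheel path, the pytest invocation, and which tests need deselecting are placeholders.

```shell
# Create a virtual environment backed by the Emscripten-compiled interpreter.
pip install pyodide-build
pyodide venv .venv-pyodide
source .venv-pyodide/bin/activate

# Install the freshly built wheel and run the test suite, skipping tests
# that hit known Pyodide limitations (threads, subprocesses, mmap, ...).
pip install dist/h5py-*.whl pytest
python -m pytest --pyargs h5py
```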

@tacaswell
Member

At least for h5py, I would much rather we merge any needed changes upstream than carry patches in the build pipeline; similarly, you should try to push any libhdf5 changes upstream.
