Scipy in Pyodide status report (advice about LLVM fortran compilers?) #15290

Open
hoodmane opened this issue Dec 27, 2021 · 31 comments
Labels
Build issues: issues with building from source, including different choices of architecture, compilers and OS

Comments

@hoodmane
Contributor

hoodmane commented Dec 27, 2021

I got Scipy 1.7.3 working in Pyodide. This was quite difficult. I wanted to give a summary of some of our difficulties in case people are interested. Basically all of our issues are Fortran related. My two main questions for people here are:

  1. are any patches upstreamable?
  2. does anyone have advice about flang or other LLVM-based Fortran compilers?

Some patches may be interesting

  1. BLAS detection fails because we have BLAS for Wasm installed but not native BLAS. setup.py detects that native BLAS is missing because it doesn't realize it is being cross compiled. It would be useful to have a flag for setup.py to tell it to skip this detection for cross compilation. We currently just disable it.
  2. I believe this patch was written by @rth. I don't know what it does, why we need it, or whether it could be appropriate to upstream.
  3. Some issues with const are fixed in this patch. May be related to cython_blas does not use const in signatures #14262.
  4. We have a clash in the definition of sasum. For some reason our sasum returns double. Maybe has to do with our use of f2c. Patched here.
  5. The make int return values patch, more about this in next section.

We also have 5 patches which are related to f2c issues and one patch due to a problem with Pyodide's packaging system. These are not suitable to upstream.

int vs void return types

Wasm is very picky about function signatures. If a function is defined with return value int and then imported with return value void, this causes trouble for us. I believe Fortran ABI returns integers to indicate which custom return was used, but people mostly ignore them. Anyway, we use sed and manual patching to make all of these functions return int. I am not sure if this stuff could be appropriate to upstream.
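As a rough illustration of the kind of rewrite involved (this is not the actual Pyodide patch or sed script), the Python sketch below promotes `void`-returning prototypes to `int`. The helper name `make_int_return` and the regular expression are hypothetical, and the pattern assumes simple declarations whose symbol names end in an underscore.

```python
import re

# Hypothetical pattern: a line that starts (optionally with "extern") with a
# "void" return type followed by a Fortran-style symbol name ending in "_".
_VOID_DECL = re.compile(r"^(\s*(?:extern\s+)?)void(\s+\w+_\s*\()", re.MULTILINE)


def make_int_return(c_source: str) -> str:
    """Rewrite 'void name_(' declarations to 'int name_('."""
    return _VOID_DECL.sub(r"\1int\2", c_source)


example = (
    "extern void dscal_(int *n, double *a, double *x, int *incx);\n"
    "void helper(void);\n"  # no trailing underscore, so it is left alone
)
print(make_int_return(example))
```

A function definition changed this way would also need a `return` statement added to its body, which this sketch does not attempt.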

The compiler

Our biggest difficulty is the compiler. We are currently using f2c, which only works with Fortran 77. It is a bit of a miracle that this works. gfortran and other mature Fortran compilers don't have a wasm backend. Flang classic seems promising and I have had luck producing wasm binaries with it, but the version that is distributed on apt is based on LLVM 7. Because wasm has no standard for dynamic linking, we need to use the Emscripten linker to create dynamic libraries, and our Emscripten toolchain is based on LLVM 13. LLVM 7 and LLVM 13 do not use the same object file format, so objects produced by flang classic cannot be linked with Emscripten. The most recent version of flang classic apparently works with LLVM 10. I haven't yet checked whether it's possible to generate wasm object files with LLVM 10 and link them with LLVM 13, but if it is, this could be an approach.

We really just need to fix the compiler, but here are some issues caused by f2c:

We are stuck on CLAPACK 3.2

LAPACK 3.3 introduces some dynamically sized arrays and other features which aren't compatible with Fortran 77, so it can't be f2c'd anymore. Trying to build SciPy using LAPACK 3.2, we end up with 36 missing symbols. Conveniently, each LAPACK function is defined in a distinct file so we can just copy the missing ones into SciPy. The four functions cuncsd, dorcsd, sorcsd, and zuncsd use dynamically sized arrays so I have to delete them.
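Since each routine lives in its own file, the copy step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `copy_missing_routines`, the `<name>.f` file naming, and the routine names `dfoo` and `dbar` are placeholders, not the real list of 36 symbols or the actual Pyodide build script.

```python
import shutil
import tempfile
from pathlib import Path


def copy_missing_routines(missing, lapack_src: Path, dest: Path, skip=frozenset()):
    """Copy one source file per missing routine; return the names copied."""
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in sorted(missing):
        src = lapack_src / f"{name}.f"  # assumed layout: one routine per file
        if name in skip or not src.exists():
            continue
        shutil.copy(src, dest / src.name)
        copied.append(name)
    return copied


# Demonstration on a throwaway directory with placeholder routine names.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "lapack_src"
    src_dir.mkdir()
    for routine in ("dfoo", "dorcsd"):
        (src_dir / f"{routine}.f").write_text("      END\n")
    copied = copy_missing_routines(
        {"dfoo", "dorcsd", "dbar"},  # "dbar" has no source file here
        src_dir,
        Path(tmp) / "out",
        skip={"dorcsd"},  # routines that cannot be translated are skipped
    )
    print(copied)
```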

mvnun and mvnun_weighted also don't work

Again, dynamically sized arrays are the culprit. So we delete them.

f2c output requires patching to fix implicit cast function arguments

A lot of this can be done automatically by collecting up the definition signatures and then fixing the declarations so that they agree with the definition. But implicit casts from character to integer appear in a bunch of places and these require manual patching to fix up the extra ftnlen arguments that Fortran ABI has for character * arguments.
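The automatic part might look roughly like the Python sketch below: collect the prototype of every definition, then rewrite `extern` declarations of the same name to match. The helper names and regular expressions are hypothetical, and real f2c output is messier (comments inside declarations, several functions per `extern` statement), so treat this as a simplified illustration and not as the script Pyodide uses. The `character` to `integer` cases with extra `ftnlen` arguments are not handled.

```python
import re

# Definition: "<type> name_(<params>)" followed by an opening brace.
_DEFN = re.compile(r"^(\w+)\s+(\w+_)\s*\(([^)]*)\)\s*\{", re.MULTILINE)
# Declaration: "extern <type> name_(<anything>);"
_DECL = re.compile(r"\bextern\s+\w+\s+(\w+_)\s*\([^)]*\)\s*;")


def collect_definitions(sources):
    """Map each defined function name to its 'type name(params)' prototype."""
    protos = {}
    for text in sources:
        for ret, name, params in _DEFN.findall(text):
            protos[name] = f"{ret} {name}({' '.join(params.split())})"
    return protos


def fix_declarations(text, protos):
    """Replace extern declarations with the prototype of the known definition."""
    def repl(match):
        proto = protos.get(match.group(1))
        return f"extern {proto};" if proto else match.group(0)
    return _DECL.sub(repl, text)


definition = "doublereal sasum_(integer *n, real *sx, integer *incx)\n{\n    return 0.;\n}\n"
caller = "extern real sasum_(integer *, real *, integer *);\n"
fixed = fix_declarations(caller, collect_definitions([definition]))
print(fixed)
```

Declarations with no matching definition are left untouched, so they can be reviewed by hand.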

f2c doesn't handle the common keyword correctly

It leads to duplicate symbols. We have to manually add some externs.
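One way to picture the fix, assuming each translated file contains a top-level `struct { ... } name_;` for every common block it uses (an assumption about the shape of the generated C, not something stated above), is to keep the first definition and mark the later ones `extern`. The helper `externalize_duplicates` is hypothetical, and choosing the alphabetically first file as the owner of the definition is arbitrary.

```python
import re

# Assumed shape of a translated common block: "struct { ... } name_;"
_COMMON = re.compile(r"^struct\s*\{[^}]*\}\s*(\w+_)\s*;", re.MULTILINE)


def externalize_duplicates(units):
    """Given {filename: C text}, keep the first definition of each common
    block and prefix every later one with 'extern'."""
    seen = set()
    result = {}
    for filename in sorted(units):
        def repl(match):
            name = match.group(1)
            if name in seen:
                return "extern " + match.group(0)
            seen.add(name)
            return match.group(0)
        result[filename] = _COMMON.sub(repl, units[filename])
    return result


block = "struct {\n    integer a, b;\n} blk_;\n"
fixed_units = externalize_duplicates({"a.c": block, "b.c": block})
print(fixed_units["b.c"])
```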

@tylerjereddy added the "Build issues" label Dec 27, 2021
@tylerjereddy
Contributor

does anyone have advice about flang or other LLVM-based Fortran compilers?

@certik might

@certik

certik commented Dec 27, 2021

There is also new Flang as part of the LLVM project.

Finally, we are developing @lfortran, and one of our goals is to make it work well for SciPy. It's currently in alpha, so not ready for production use yet, but large parts of Fortran already work (you have to work around current limitations). Here is the issue to compile the Fortran code in SciPy:

If anyone here is interested to help out, please definitely let me know! We will get it done eventually, but if there is anybody who has time to help, we will get there much faster.

@hoodmane
Contributor Author

There is also new Flang

As I understand it, new Flang is also not ready for use yet? I built a version of it, but it doesn't seem to be able to compile a test file.

@h-vetinari
Member

As I understand it, new Flang is also not ready for use yet?

Code generation is still a work-in-progress, see e.g. here. It might squeak in for LLVM 14 (note that that repo is not the LLVM codebase itself but a different one from which the work is being upstreamed into LLVM).

There are still several other pieces missing (see other projects in that repo), and the real target are the different applications. Let's hope most things will work by LLVM 15 or so.

@carlkl
Member

carlkl commented Jun 18, 2022

I have no clue about that project (never tried that): FORTRAN In The Browser, see also:

but it is a recent addition to this problem space. Seems to be a kind of modernized dragonegg - gfortran-4.6 based.

@rgommers
Member

I'll reference the discussion in https://discuss.scientific-python.org/t/releasing-or-not-32-bit-windows-wheels/282, which touches the Fortran problem as well.

gfortran-4.6 based.

That's going to be too old; we need 4.8 as a minimum, and it will be updated to minimum of 5.5 or 6.1 in the near future (there's already a PR open somewhere, it just needs finishing).

5. The make int return values patch, more about this in next section.

Updated link to patch set: https://github.com/pyodide/pyodide/tree/main/packages/scipy/patches. This one is 0008-make-int-return-values.patch.

The corresponding change for NumPy is relevant to follow: numpy/numpy#21772

  1. are any patches upstreamable?

I missed this the first time around, sorry @hoodmane. I'd say yes, quite a few are. The setup.py ones not so much anymore, because we are dropping the numpy.distutils based build completely soon. But the ones that are related to type declarations should all be fixes/improvements that we can upstream.

@h-vetinari
Member

gfortran-4.6 based.

That's going to be too old; we need 4.8 as a minimum, and it will be updated to minimum of 5.5 or 6.1 in the near future (there's already a PR open somewhere, it just needs finishing).

I think you mean MacPython/scipy-wheels#140. This needs some infrastructure work, and I put this on the backburner until the cibuildwheel setup has arrived for scipy, so as to not (potentially) do redundant work.

@carlkl
Member

carlkl commented Jun 20, 2022

I brought that into the discussion in the hope it could be useful.

That's going to be too old; we need 4.8 as a minimum, and it will be updated to minimum of 5.5 or 6.1 in the near future (there's already a PR open somewhere, it just needs finishing).

Maybe @StarGate01 could comment on that. There is an open issue about that: StarGate01/Full-Stack-Fortran#6

@StarGate01

Hi, updating the gfortran version in my f90wasm compiler would indeed be very nice, not just for this project.

Unfortunately, GCC internals change quite a bit between versions, and dragonegg is no longer being developed. I tested a dragonegg fork which claimed compatibility with GCC 8, but had no success (at least for Fortran, no IR code was emitted). Updating to 4.8 might be easier.

What version are you targeting for the future? GCC 5.5 or 6.1?

In the long term, I hope for the flang Fortran frontend in LLVM to be finished, but that has been in development for years at this point.

@h-vetinari
Member

h-vetinari commented Jun 20, 2022

What version are you targeting for the future? GCC 5.5 or 6.1?

The lowest common denominator was the compilers in the manylinux_2_24 image (GCC 6.3; though we're actually using 6.5), but that image did not ever really take off and is close to EOL now. Bumping compiler versions has a long lead time, but there would still be direct benefits for scipy (e.g. using a current boost version needs GCC>=7). I expect (hope?) things to keep moving slowly but steadily.

In the long term, I hope for the flang Fortran frontend in LLVM to be finished, but that has been in development for years at this point.

I've been following this process in some detail, and progress is being made! For example, the fir-dev repo has been archived as the bulk of the upstreaming was completed and now work continues in LLVM directly. There'll also be a (still-experimental) flag to compile binaries with flang in LLVM 15, and hopefully things will get mature enough to start using it soon after**. 🤞

** it'll be longer still for performance to become comparable with gfortran, but well; most issues in this space are currently concerned with having a usable fortran compiler at all...

@hoodmane
Contributor Author

@rgommers About a week ago I upstreamed or dropped almost all of our patches, only #21772 is left =)

@rgommers
Member

@hoodmane for SciPy? I merged gh-15955 a few days ago; the last PR from you before that is from April. And https://github.com/pyodide/pyodide/tree/main/packages/scipy/patches is the current set, right?

@hoodmane
Contributor Author

Ah yes sorry I just confused scipy with numpy.

@certik

certik commented Jun 21, 2022

@h-vetinari, @hoodmane what are your requirements / expectations for a Fortran compiler for SciPy?

Our goal with LFortran is to deliver everything that is needed. It uses the LLVM backend, can produce WASM, runs on every platform, compiles quickly, has runtime performance that is already very competitive with GFortran, and we are aiming to support all of Fortran (from F77 to F2018). As a bonus, it can also run interactively (like Python does). We are also very interested in collaborating with f2py, so that we can use the knowledge that LFortran has and generate the Python wrappers automatically. We are in alpha version, and are working very hard to bring it to beta.

@carlkl
Member

carlkl commented Jun 21, 2022

@certik, last time I took a look at Lfortran it was qualified as alpha. Now it seems to be in a more mature state. However, the latest release tag on GitHub is v0.15.0, but as far as I can see the latest version on conda-forge is v0.14.0. Shouldn't the interested tester wait for v0.15.0 conda-forge binaries?

A big plus would be the integration of Lfortran in the msys2 ecosystem. A Fortran compiler for their clang-based environments would be a big plus. But I am now deviating from the topic of the issue...

@certik

certik commented Jun 21, 2022

Yes, LFortran is still in alpha, that is, you have to work around current limitations. Once we get to beta, it should mostly work for most codes, modulo bugs. Yes, we should update Conda. We've been focusing on the features themselves; once we can compile SciPy or some other codes, we'll definitely ensure all packages are up-to-date.

@h-vetinari
Member

@h-vetinari, @hoodmane what are your requirements / expectations for a Fortran compiler for SciPy?

From the POV of conda-forge, largely what @carlkl said, though ideally we can also integrate with MSVC directly. Basically, the wish would be to be able to just change the fortran compiler/version on windows in the current compilers in conda-forge to use LFortran (or flang) - but keep using MSVC for C/C++ -, and have those compilers set up compatibly (with their activation & exported runtimes + libs) such that the binaries produced are working without having to change much about existing feedstock-recipes that need a fortran compiler.**

Changing affected feedstock (needing fortran on windows) to the LLVM stack on windows would be conceivable, but much less desirable (and same for msys2).

This is just my POV though - I presume @isuruf would likely have some deeper thoughts or insights about using flang vs. LFortran in conda-forge.

** this can be tested on a per-feedstock basis (e.g. scipy) by overriding compilers in the local recipe/conda_build_config.yaml, and doesn't have to go through the global pins right away.

@carlkl
Member

carlkl commented Jun 21, 2022

@certik, I have just compiled Lfortran with msys2:

  • on the ucrt64 environment (gcc-12.1)
  • on the clang64 environment (clang-14.0) - with a workaround for clock_gettimes not being found

The latter is interesting, as there is no Fortran compiler besides f2c in this environment. So the same situation as here.

@hoodmane
Contributor Author

hoodmane commented Jun 21, 2022

@certik how do I get a wasm target for lfortran? So far it's giving me "LFortranException: No available targets are compatible with triple (whatever I put in)"

@hoodmane
Contributor Author

Ah I see WITH_TARGET_WASM.

@hoodmane
Contributor Author

I guess it looks like the wasm backend isn't working right now.

what are your requirements / expectations for a Fortran compiler for SciPy?

It should be able to compile scipy to wasm object files in a way that is ABI compatible with Emscripten-generated C/C++ code.

@hoodmane
Contributor Author

hoodmane commented Jun 21, 2022

Also I have to say I'm offended that gitlab won't even let people search the issue tracker without a login:

[Image: Screenshot from 2022-06-21 09-20-54]

@certik

certik commented Jun 22, 2022

@hoodmane thanks for trying it --- the WASM backend should work, we just merged support for it: https://gitlab.com/lfortran/lfortran/-/merge_requests/1769, there are instructions in the MR on how to test it out. (We are also writing our own WASM backend that does not use LLVM, which will make it possible to run LFortran itself in the browser, but that is probably not interesting for the SciPy use case.)

Yes, I am sorry about GitLab, we might need to move away from it for reasons like this one.

@hoodmane
Contributor Author

Ah so maybe the problem I was having is that that PR is more recent than the latest release?

@hoodmane
Contributor Author

Okay, well it is working. How do I tell it to include relocations? I get:

The following argument was not expected: -fPIC

@hoodmane
Contributor Author

It's odd because it seems to hard code Reloc::PIC_:
https://github.com/lfortran/lfortran/blob/master/src/libasr/codegen/evaluator.cpp#L183

@certik

certik commented Jun 23, 2022

I think we include those by default. Go ahead and open issues in LFortran and we can get it resolved/fixed. These should all be relatively easy. My main focus is on implementing the missing features needed to compile SciPy. It will not compile SciPy yet, but anything that it can compile should run in the browser.

@hoodmane
Contributor Author

Opened an issue here:
https://gitlab.com/lfortran/lfortran/-/issues/714

@carlkl
Member

carlkl commented May 2, 2023

I noticed that R is now available based on Webassembly, see https://r-wasm.org/ and https://github.com/r-wasm/webr.
A docker file https://github.com/georgestagg/webr-flang-docker with a patched flang version is used for building: https://github.com/lionel-/f18-llvm-project/tree/fix-webr. This may also be helpful for pyodide.

@carlkl
Member

carlkl commented May 2, 2023

.. sorry, commented twice

@certik

certik commented May 2, 2023

Current status of LFortran: https://lfortran.org/blog/2023/05/lfortran-breakthrough-now-building-legacy-and-modern-minpack/
