
CentOS 8 #1432

Open
jakirkham opened this issue May 3, 2021 · 27 comments

@jakirkham
Member

A few use cases for CentOS 8 have come up recently, namely CUDA support for ARM and PPC64LE. Potentially more use cases will show up in the future. Am opening this issue so that we can discuss how best to handle this need.

cc @jaimergp @kkraus14 @isuruf @beckermr @conda-forge/core

@beckermr
Member

beckermr commented May 3, 2021

CentOS 8 EOL is December 31 of this year. I don't think implementing support for it is well-motivated. Vendors will have to move on from it anyways.

@jakirkham
Member Author

Right, so maybe we need to use an alternative. Rocky Linux has come up before.

Also here's a longer post on alternatives: https://haydenjames.io/what-centos-alternative-distro-should-you-choose/

@jakirkham
Member Author

jakirkham commented May 3, 2021

However, it's worth noting we are using upstream Docker images for these architectures & CUDA versions, so the OS is already fixed.

Edit: Raised upstream issue ( https://gitlab.com/nvidia/container-images/cuda/-/issues/123 ) about this

@beckermr
Member

beckermr commented May 3, 2021

@chenghlee What is anaconda moving to? Mirroring that is likely a good idea.

@jakirkham
Member Author

A suggestion brought up upstream on the NVIDIA CUDA image repo would be to look at Red Hat's Universal Base Images (UBI), for which CUDA images are also being supplied. Have not looked at these closely yet, but that might be something else to consider.

@chenghlee

Anaconda's current plan is to stay on CentOS/RHEL 7 (glibc 2.17) as much as possible for the packages on repo.anaconda.com (defaults); if we need a newer glibc for some reason, we'll likely look at Debian 9 or 10.

@kkraus14
Contributor

kkraus14 commented May 5, 2021

if we need a newer glibc for some reason, we'll likely look at Debian 9 or 10.

Debian 9 is not supported by CUDA: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements

The only OS supported by CUDA across x86_64, PPC64, and ARM SBSA is RHEL 8 (since CentOS 8 isn't really a thing anymore).

@mbargull
Member

mbargull commented May 5, 2021

if we need a newer glibc for some reason, we'll likely look at Debian 9 or 10.

Debian 9 is also EOL in about a year, and 10 uses the same base glibc version as CentOS 8, so I'd advise going with 10 when it's really needed.
EDIT: Going with Debian 10 would be problematic for Ubuntu 18.04 though (and Debian 9 likewise for Ubuntu 16.04, but, as I just learned, that Ubuntu version has been EOL for a couple of days now).

@jjacobelli

Hey, I would like to raise this issue again. Do we know which OS we should use? Should we use ubi8? Maybe we can start with centos8 now and take some time before the end of the year to decide?

@jakirkham
Member Author

When we previously discussed supporting other architectures that needed CentOS 8 for deployment, we came to the conclusion that we may actually be able to build things on CentOS 7. We just wouldn't be able to load any libraries (via a Python import or otherwise). However, as we already don't do these things with GPU builds, this may not actually present much of a problem. That was the thinking anyway.

There are a few reasons we were considering this (admittedly somewhat hacky) solution.

First, CentOS 8's EOL is really quite soon (December 2021; yes, this year). Also, the new CentOS release model, CentOS Stream, really doesn't work well for our use case (building with a really old GLIBC that is supported by the vast majority of systems out there). So we may find ourselves abandoning CentOS for some other solution in the future. What that future solution will be is somewhat unclear, but there are a few options being considered (Debian, Rocky Linux, UBI, etc.).

Second, adding a new OS (like CentOS 8) involves a fair bit of work: building Docker images, rebuilding the compiler toolchain, building CDTs, etc. So for something that won't be around for more than a few months, it really isn't worth undertaking that work, at least not in conda-forge as a whole.

There are probably more reasons that I'm forgetting, but those are already fairly significant considerations.

Anyways, to tie a bow on this: we might want to try just using CentOS 7 in the cases where we need it and see how that goes. That said, it still isn't quite that simple, but maybe we can discuss the other points offline.

@leofang
Member

leofang commented Jun 4, 2021

I am under the impression that we can just install a CUDA runfile distribution in a vanilla cos7 image. I think this would work for x86-64, but I am less certain about aarch64 or ppc64le.

@jakirkham
Member Author

The current Docker images are covered under the NVIDIA licensing agreement. Am not sure that would be true of some custom-made image. This is something we would need to figure out.

@jjacobelli

When we previously discussed supporting other architectures that needed CentOS 8 for deployment, we came to the conclusion that we may actually be able to build things on CentOS 7. We just wouldn't be able to load any libraries (via a Python import or otherwise). However, as we already don't do these things with GPU builds, this may not actually present much of a problem. That was the thinking anyway.

Building the packages on CentOS 7 should work, but the issue I'm facing right now is that some packages (e.g. cudatoolkit) run tests at the end of the build that may require loading the libraries, and so the build fails if we don't have the right GLIBC version. Should we consider not running these tests on architectures other than x86_64?
Example of failing CI: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=332546&view=logs&j=81eb4d60-76fc-5ac4-a959-9ebb9871bfee&t=e733809a-cb57-567e-b6dc-c69c35a56404

===== testing package: cudatoolkit-11.2.2-h24a0247_8 =====
running run_test.py
Finding cublas from Conda environment
	located at $PREFIX/lib/libcublas.so.11.4.1.1043
	trying to open library...	ok
Finding cusparse from Conda environment
	located at $PREFIX/lib/libcusparse.so.11.4.1.1152
	trying to open library...	ok
Finding cufft from Conda environment
	located at $PREFIX/lib/libcufft.so.10.4.1.152
	trying to open library...	ok
Finding curand from Conda environment
	located at $PREFIX/lib/libcurand.so.10.2.3.152
	trying to open library...	ERROR: failed to open curand:
/lib64/libm.so.6: version `GLIBC_2.27' not found (required by $PREFIX/lib/libcurand.so.10.2.3.152)
Finding nvvm from Conda environment
	located at $PREFIX/lib/libnvvm.so.4.0.0
	trying to open library...	ok
Finding cudart from Conda environment
	located at $PREFIX/lib/libcudart.so.11.2.152
	trying to open library...	ok
Finding cudadevrt from Conda environment
	located at $PREFIX/lib/libcudadevrt.a
Finding libdevice from Conda environment
	searching for compute_20...	ok
	searching for compute_30...	ok
	searching for compute_35...	ok
	searching for compute_50...	ok
Tests failed for cudatoolkit-11.2.2-h24a0247_8.tar.bz2 - moving package to /home/conda/feedstock_root/build_artifacts/broken

@jakirkham
Member Author

Yeah, I think not running the tests, or running only the parts of the tests that don't require library loading, would be preferable.

One other thing we might consider is checking the GLIBC version or trying to load the libraries (and not erroring if that fails). This can be useful as we can still opt to run these tests on systems with a new enough GLIBC.

For example, CuPy has similar checks where it won't run some tests if a GPU is missing. This allows us to still run the tests on systems that do have a GPU. We can also use the conda build --test command with the produced package to run its tests on a system with a GPU to make sure it works OK. Mentioning all of this as we can use similar strategies with the cudatoolkit packages on ARM; a rough sketch of such a check is below.
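To make that concrete, here is a minimal sketch of such a tolerant check (the helper names are hypothetical, not the feedstock's actual run_test.py code; the 2.27 floor comes from the error in the log above):

import ctypes
import platform


def _version_tuple(v):
    return tuple(int(part) for part in v.split("."))


def glibc_at_least(required):
    # platform.libc_ver() reports e.g. ("glibc", "2.17") on glibc systems
    _, version = platform.libc_ver()
    if not version:
        return False  # not glibc (or undetectable); be conservative
    return _version_tuple(version) >= _version_tuple(required)


def check_lib(path, min_glibc="2.27"):
    # Skip the dlopen check on hosts whose glibc is too old (e.g. the
    # CentOS 7 build machines) instead of failing the whole test run.
    if not glibc_at_least(min_glibc):
        print("\tskipping open check (host glibc < %s)" % min_glibc)
        return
    ctypes.CDLL(path)  # raises OSError if the library cannot be loaded
    print("\ttrying to open library...\tok")

The full loading tests could then still be exercised afterwards via conda build --test with the produced package, on a machine that has a new enough GLIBC (and a GPU).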

@h-vetinari
Member

The massively reduced support window for CentOS 8 is really a pity for this next step. Assuming AlmaLinux and/or Rocky Linux can uphold their promises of 1:1 compatibility with pre-Stream CentOS, I still think it would be interesting to try them out?

So far, both only support x86 & aarch64; I haven't found anything about ppc yet. Rocky Linux just released an 8.4 RC; AlmaLinux was a bit faster there (though aarch64 support doesn't show up on the main page yet).

@h-vetinari
Member

Rocky Linux & Alma Linux 8.4 have been released a few days ago (both for x86 & aarch64). No update about PPC support, but I've asked on their respective discourse servers.

Regarding compatibility, here are the relevant quotes from their websites (emphasis mine):

Rocky Linux is a community enterprise operating system designed to be 100% bug-for-bug compatible with America's top enterprise Linux distribution now that its downstream partner has shifted direction. It is under intensive development by the community. Rocky Linux is led by Gregory Kurtzer, founder of the CentOS project.

Alma Linux:

Governed and driven by the community, focused on long-term stability and providing a robust production-grade platform that is 1:1 binary compatible with pre-Stream CentOS and RHEL®.

Since both promise 1:1 compat so prominently, could this not be an option?

@h-vetinari
Member

No update about PPC support yet, but I've asked on their respective Discourse servers.

Update: Rocky Linux is planning a PPC release soon.

@oleksandr-pavlyk

oleksandr-pavlyk commented Feb 13, 2023

Reviving the thread. PEP 600 has opened a way to build binaries targeting newer versions of GLIBC.
Auditwheel 5.3.0 supports GLIBC 2.35 and older.

It's high time the conda ecosystem evolved beyond GLIBC 2.17 to accommodate Python packages built with newer toolchains whose runtime libraries require GLIBC > 2.17.

Settling on what the next version should be is hard, but must it be a single version that everyone agrees to? Is it possible for conda to have multiple sysroot versions for the same platform?

@beckermr
Member

We can have multiple sysroots at once, so that helps a lot. Right now we support 2.12 and 2.17. I suspect the next one to add is one of 2.27 or 2.28. We have a related question of adopting a new distribution from which to build CDTs and supply our default linux environment.
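For reference, opting into a newer sysroot is already possible per recipe; a minimal sketch following the conda-forge docs (2.17 being the newest sysroot that exists as of this writing, with a future 2.28 one presumably selected the same way once built):

requirements:
  build:
    - {{ compiler('c') }}
    # pin the newer of the currently available sysroots
    - sysroot_linux-64 2.17  # [linux64]

The image used for building and testing also has to be new enough to actually run the resulting binaries, which is what the os_version option in conda-forge.yml is for.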

@h-vetinari
Member

It's high time the conda ecosystem evolved beyond GLIBC 2.17 to accommodate Python packages built with newer toolchains whose runtime libraries require GLIBC > 2.17.

Settling on what the next version should be is hard, but must it be a single version that everyone agrees to? Is it possible for conda to have multiple sysroot versions for the same platform?

A lot of similar discussion happened for manylinux_2_28 (the successor to manylinux2014 == manylinux_2_17), which has some relevant information[1], particularly the realisation that AlmaLinux / Rocky Linux / RH UBI are all effectively full-fledged replacements for CentOS[2].

Even though we technically could have several, I think it would be easiest to just choose one of those RHEL-ABI-compatible distros, which would also continue in the spirit of why CentOS was a good choice previously.

Footnotes

  1. even though manylinux has harder constraints than conda-forge (no compiler infrastructure of its own), and therefore settled on one of the several RHEL-alikes (following the demise of CentOS), which benefit from the devtoolset backports of newer GCCs to old OSes.

  2. This was after a failed attempt at getting the Debian-based manylinux_2_24 off the ground (to reduce the glibc version jump from CentOS 7).

@jakirkham
Member Author

We could look at AlmaLinux 8, which has aarch64 & ppc64le support

@beckermr
Member

Yeah alma 8 is a good choice.

@h-vetinari
Member

h-vetinari commented Apr 18, 2023

I just saw that as of LLVM 17, libcxx only supports glibc >=2.24. LLVM 17 will be released in a couple of months.

The good thing is that (compared to e.g. #1844) libcxx-on-linux isn't part of our default compiler stack. GCC & libstdc++ claim to only require the 20+ year old glibc 2.3, though I'm doubtful that anyone still tests GCC with something that old.

OTOH, libstdc++ docs also say:

4.7. Recent GNU/Linux glibc required?

[...] The guideline is simple: the more recent the C++ library, the more recent the C library. (This is also documented in the main GCC installation instructions.)

In fact, since Microsoft finally started supporting C11/C17 a few years ago (thus unblocking cross-platform projects from having to stay on C89), several projects are now moving to require C11 (including CPython), which needs a newer glibc; cf. e.g. conda-forge/linux-sysroot-feedstock#44.

While I don't want to rehash the discussion in #1436, the EOL of CentOS 7 is now about a year away, from which point on the bitrot of support for old glibc (resp. the move towards requiring C11+) will likely accelerate even more. We definitely need newer sysroots soon.

@beckermr
Member

Last I checked alma 8 was a good choice for us.

There is a list of todo items for whatever we choose:

  • check with anaconda on what they are doing
  • decide on the cdt name (eg alma8)
  • put in changes into the cdt scripts to build alma8 ones
  • build the sysroots
  • assemble docker images

I am sure I'm missing stuff, but that's a start.

@kkraus14
Contributor

Last I checked alma 8 was a good choice for us.

This also aligns with manylinux_2_28, which is AlmaLinux 8 based. This gives us glibc 2.28, which does have a few possible incompatibilities:

  • Ubuntu 18.04 isn't quite EOL yet and uses glibc 2.27. Ubuntu 20.04 upgraded to glibc 2.31 and Ubuntu 22.04 upgraded to glibc 2.35.
  • SUSE 12 isn't EOL and uses glibc 2.22. SUSE 15 upgraded to glibc 2.31.

@h-vetinari
Member

h-vetinari commented Apr 18, 2023

This also aligns with manylinux_2_28, which is AlmaLinux 8 based

Yup, a bit further up I linked to the pypa/manylinux issue where this was decided.

This gives us glibc 2.28, which does have a few possible incompatibilities

In the context of the discussion from #1436, this is not about raising the lower bound to 2.28, but about making it possible to build packages that (for whatever reason) require glibc >2.17. IOW, the upstream maintainers have already lifted their floor past our current ceiling, so we need a way to build such packages. But dropping CentOS 6, much less 7, is a whole 'nother ballpark[1].

Footnotes

  1. I'd be in favour of the former, but there was a lot of discussion in Dropping CentOS 6 & Moving to CentOS 7 #1436, and for now the opt-in to newer glibcs seems to be working well enough that we haven't been forced to abandon CentOS 6 as the default yet.

@jakirkham
Member Author

Thanks for adding this to the agenda, Axel! 🙏
