
Update to CMake 3.13 for better CUDA support and to enable build concurrency #3261

Merged · 19 commits · Jan 17, 2022

Conversation

@maxhgerlach (Collaborator) commented Nov 5, 2021

Checklist before submitting

  • Did you read the contributor guide?
  • Did you update the docs?
  • Did you write any tests to validate this change?
  • Did you update the CHANGELOG, if this change affects users?

Description

This PR replaces the build process for Horovod's CUDA kernels with one relying on features offered by recent versions of CMake. In particular, the deprecated FindCUDA module is replaced by CMake's first-class CUDA language support and the more modern FindCUDAToolkit module. This fixes the race condition of #2543 and allows us to re-enable build concurrency via -j, which will certainly be appreciated in many places.
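Roughly, as a sketch (target and source file names here are purely illustrative, not Horovod's actual CMake files), the move looks like this:

    # Old, deprecated approach (FindCUDA module):
    #   find_package(CUDA REQUIRED)
    #   cuda_add_library(horovod_cuda_kernels kernels.cu)
    #
    # New approach: first-class CUDA language support plus FindCUDAToolkit.
    cmake_minimum_required(VERSION 3.13)
    project(horovod_kernels LANGUAGES CXX)
    enable_language(CUDA)                  # CMake drives nvcc directly, no FindCUDA macros
    find_package(CUDAToolkit REQUIRED)     # modern module for CUDA include/library lookup
    add_library(horovod_cuda_kernels STATIC kernels.cu)
    target_link_libraries(horovod_cuda_kernels PRIVATE CUDA::cudart)

(On CMake older than 3.17, find_package(CUDAToolkit) is served by the bundled module mentioned in the edit below.)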

To ensure that these new features are available, I bumped the minimum required CMake version to 3.18 (since lowered to 3.13, see the edit below). I believe this should not be a big problem even on older systems, as a recent CMake can usually be obtained easily via pip install cmake.

Edit: By shipping a module based on FindCUDAToolkit from CMake 3.17.5 we can build with CMake >= 3.13.

I am not an expert with CMake by any means, so any feedback would be more than welcome!

Fixes #2543.

Review process to land

  1. All tests and other checks must succeed.
  2. At least one member of the technical steering committee must review and approve.
  3. If any member of the technical steering committee requests changes, they must be addressed.

@maxhgerlach changed the title from "Update to CUDA support from CMake 3.18 and enable build concurrency" to "Update to CMake 3.18 for better CUDA support and enable build concurrency" on Nov 5, 2021
@maxhgerlach marked this pull request as draft on November 5, 2021 17:17
@chongxiaoc (Collaborator):

Mentioning that this is related to #2543.

@github-actions (bot) commented Nov 5, 2021

Unit Test Results

830 files ±0 · 830 suites ±0 · 9h 28m 2s ⏱️ +30m 7s
717 tests ±0 · 672 ✔️ ±0 · 45 💤 ±0 · 0 ±0
17 988 runs ±0 · 12 644 ✔️ −14 · 5 344 💤 +14 · 0 ±0

Results for commit 9f42442. ± Comparison against base commit 31bba3b.

♻️ This comment has been updated with latest results.

@github-actions (bot) commented Nov 5, 2021

Unit Test Results (with flaky tests)

962 files +44 · 962 suites +44 · 10h 51m 56s ⏱️ +1h 11m 6s
717 tests ±0 · 666 ✔️ −5 · 45 💤 ±0 · 6 +5
20 732 runs +714 · 14 539 ✔️ +639 · 6 184 💤 +68 · 9 +7

For more details on these failures, see this check.

Results for commit 9f42442. ± Comparison against base commit 31bba3b.

♻️ This comment has been updated with latest results.

@maxhgerlach changed the title from "Update to CMake 3.18 for better CUDA support and enable build concurrency" to "Update to CMake 3.18 for better CUDA support and to enable build concurrency" on Nov 5, 2021
@nvcastet (Collaborator) commented Nov 11, 2021

@maxhgerlach: To solve the build concurrency issue: if I remember correctly, the problem was not the CMake version. It is related to the fact that we build two versions of the same library, which causes intermediate files to be overwritten when the two versions are built concurrently.
I have not had the chance to make a PR for it, but the fix is easy: just add a build dependency between horovod_cuda_kernels and compatible_horovod_cuda_kernels. That solved the problem when I tested the stability of the build in a for loop.

@nvcastet (Collaborator) commented Nov 11, 2021

I also think it is a good idea to update the CMake version too. :)
FYI, just for comparison, PyTorch has a minimum of 3.10: https://github.com/pytorch/pytorch/blob/master/CMakeLists.txt#L1
I don't have a list of the latest CMake versions supported by the different package managers (pip, conda, ...) across different architectures and OSs. But for sure we probably do not want to block someone who doesn't have the right version of CMake in their production environment, since our compilation happens at install time.

@nvcastet (Collaborator) commented Nov 11, 2021

@maxhgerlach For build concurrency, just adding

add_dependencies(compatible_horovod_cuda_kernels horovod_cuda_kernels)

will do the trick.
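For context, a minimal sketch of where that dependency would sit, assuming the two cuda_add_library() targets that master currently builds (source file names are illustrative):

    # As on current master: the deprecated FindCUDA module builds two flavors
    # of the same kernels from the same sources.
    find_package(CUDA REQUIRED)
    cuda_add_library(horovod_cuda_kernels cuda_kernels.cu)
    cuda_add_library(compatible_horovod_cuda_kernels cuda_kernels.cu)
    # Forcing an ordering between the two targets keeps their intermediate files
    # from overwriting each other under `make -j`:
    add_dependencies(compatible_horovod_cuda_kernels horovod_cuda_kernels)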

@maxhgerlach (Collaborator, Author) commented Nov 17, 2021

Hi @nvcastet, thanks for your comments!

    To solve the build concurrency issue: if I remember correctly, the problem was not the CMake version. It is related to the fact that we build two versions of the same library, which causes intermediate files to be overwritten when the two versions are built concurrently.

I was under the impression that this race condition was ultimately caused by a bug in CMake's FindCUDA module (needed for the two cuda_add_library() calls in master), which has been deprecated for a while now. After some analysis @Flamefire summarized the situation in #2543 (comment) and also reported the issue upstream at https://gitlab.kitware.com/cmake/cmake/-/issues/21623.

For Horovod 1.0 I think it would be beneficial to cut ties with this deprecated module, which may or may not work correctly in various versions of CMake, and instead move to CMake's first-class CUDA language support, fixing the race condition at the same time (as proposed by this draft PR). enable_language(CUDA) was already introduced with CMake 3.8, but the FindCUDA module could not be dropped completely back then because of some ancillary functionality. The replacement for that (finding CUDA include and library directories for non-CUDA targets etc.) is now provided by FindCUDAToolkit, which came with CMake 3.17.
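A small sketch of what that ancillary functionality looks like with the new module, assuming a hypothetical C++-only target that just needs the CUDA headers and runtime library:

    find_package(CUDAToolkit REQUIRED)       # available since CMake 3.17
    add_library(horovod_common common.cc)    # hypothetical non-CUDA target
    # Imported targets carry the include directories and link flags that FindCUDA
    # used to expose through variables:
    target_link_libraries(horovod_common PRIVATE CUDA::cudart)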

    I also think it is a good idea to update the CMake version too. :)
    FYI, just for comparison, PyTorch has a minimum of 3.10: https://github.com/pytorch/pytorch/blob/master/CMakeLists.txt#L1
    I don't have a list of the latest CMake versions supported by the different package managers (pip, conda, ...) across different architectures and OSs. But for sure we probably do not want to block someone who doesn't have the right version of CMake in their production environment, since our compilation happens at install time.

If we don't want to require a version quite as recent as 3.18 or 3.17, we may also get away with packaging just that FindCUDAToolkit module and requiring some intermediate version of CMake >= 3.8 (as suggested by @leezu in #2543 (comment)). What do you think?
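One way that packaging could look, as a rough sketch (the module directory here is hypothetical):

    cmake_minimum_required(VERSION 3.13)
    project(horovod LANGUAGES CXX)
    # find_package() consults CMAKE_MODULE_PATH before CMake's own Modules directory,
    # so a bundled FindCUDAToolkit.cmake fills the gap on CMake < 3.17.
    set(CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake/Modules" ${CMAKE_MODULE_PATH})
    find_package(CUDAToolkit REQUIRED)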

@nvcastet (Collaborator):

Thanks @maxhgerlach! I missed @Flamefire's comment on cuda_add_library() being buggy when used on several versions of the same library and FindCUDAToolkit fixing the issue. In that case, I agree that upgrading CMake is better than adding the extra build dependency between the libraries.
Thanks a lot for the thorough explanation and the PR to upgrade CMake.
I found the list of CMake versions that come by default on the different OSs: https://gitlab.kitware.com/cmake/community/-/wikis/CMake-Versions-on-Linux-Distros
I agree that users can easily get a newer version via pip or conda, so jumping to 3.18 may not be an issue.
@leezu Do you know why MXNet does not move to CMake 3.17 or 3.18 to get FindCUDAToolkit directly from CMake?
@tgaddair What are your thoughts on the CMake version?

@maxhgerlach marked this pull request as ready for review on November 19, 2021 09:32
@maxhgerlach (Collaborator, Author):

The only tests that still fail are for torchhead and mxnethead, and those issues appear to have been fixed on master. Apart from that, all builds and tests now run fine in CI.

The eightfold build concurrency seems to have shaved a few minutes off the Docker build times, but I'm not sure how comparable these are between GitHub Actions workflow runs at different times.

From the table that @nvcastet linked to, requiring only (say) CMake 3.10 instead of 3.18 would enable people to build with the standard package sources of these distros:

  • Red Hat 8 (2019)
  • openSUSE 15.1 (2019)
  • Ubuntu 18.04 (2018)
  • Debian 10 (2019)

Apparently even Ubuntu 20.04 only comes with CMake 3.16 and would require users to add a more recent extra package.

Then again it's really pretty easy to get a recent CMake via pip, conda, snap, or a PPA or similar (and many C++ projects require this).

Anyway, if we find it worthwhile to lower the version requirement somewhat from 3.18, I'd be willing to look into it, but it might take some time.

@EnricoMi (Collaborator):

This is awesome! Build times in GitHub Actions sadly do not benefit much from the concurrency, as the workers have only two cores. But users definitely benefit from this. 🎉

@maxhgerlach (Collaborator, Author):

Good point regarding the number of cores available to the workers, @EnricoMi! A default MAKEFLAGS=-j8 still seems to work fine, though, even if that's more processes than the VMs can use effectively.

When I set unlimited scaling with -j, I had weird problems with hanging builds and disappearing logs (this run https://github.com/horovod/horovod/runs/4243022771), so probably some effective limit to memory or ... was exceeded. I had assumed that make -j would not schedule more processes than the number of available CPU threads, but a brief look into man make just proved me wrong.

@EnricoMi (Collaborator):

Interesting default imposed by make. You could set MAKEFLAGS=-j2 in our ci.yaml if the Horovod default of -j8 is a problem for GitHub.

@maxhgerlach (Collaborator, Author) commented Dec 16, 2021

After rebasing onto master, enable_language(CUDA) no longer works on the ppc64le Jenkins worker.

I would get an "error: identifier "__ieee128" is undefined". This appears to be a bug with GCC 8+ and CUDA 10; see LLNL/blt#341 (comment). I decided to disable quadruple precision there via -mno-float128, which shouldn't be an issue for Horovod.

libstdc++ with GCC 8.2, however, has a bug that prevents compilation with -mno-float128: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84654
We would need to upgrade to at least 8.3, but I didn't manage to do that via conda in our Jenkinsfile. Trying to downgrade to 7.3 also hangs for a while, then fails with lots of conflicts (edit: https://powerci.osuosl.org/job/Horovod_PPC64LE_GPU_PIPELINE/view/change-requests/job/PR-3261/13/console).

@nvcastet, would you know how to easily upgrade or downgrade the compiler in that Docker container?

@nvcastet (Collaborator):

Most of the packages for the OpenCE release are built with 8.2.0:
https://github.com/open-ce/horovod-feedstock/blob/main/config/conda_build_config.yaml
So it would be great to stay on a matching version; we don't want to break their infrastructure.
@maxhgerlach What settings change when building with enable_language(CUDA) (compiler flags?) versus the CMake setup we currently use (where ppc64le builds fine)?
@npanpaliya Any thoughts?

@maxhgerlach (Collaborator, Author) commented Dec 16, 2021

OK, then it makes sense to stick with GCC 8.2 and look for some other workaround. 🙂

    What settings change when building with enable_language(CUDA) (compiler flags?) versus the CMake setup we currently use (where ppc64le builds fine)?

With enable_language(CUDA) CMake appears to compile a test program CMakeCUDACompilerId.cu (likely generated from CMakeCUDACompilerId.cu.in) and this fails in the current ppc64le container (copied from https://powerci.osuosl.org/job/Horovod_PPC64LE_GPU_PIPELINE/view/change-requests/job/PR-3261/8/console):

      #$ "/opt/anaconda3/envs/wmlce/bin"/powerpc64le-conda_cos7-linux-gnu-c++
      -D__CUDA_ARCH__=300 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__
      -D__NVCC__ "-I/usr/local/cuda/bin/../targets/ppc64le-linux/include"
      -D__CUDACC_VER_MAJOR__=10 -D__CUDACC_VER_MINOR__=2
      -D__CUDACC_VER_BUILD__=89 -include "cuda_runtime.h"
      "CMakeCUDACompilerId.cu" -o "tmp/CMakeCUDACompilerId.cpp1.ii"

      #$ cicc --c++14 --gnu_version=80200 --allow_managed --unsigned_chars -arch
      compute_30 -m64 -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name
      "CMakeCUDACompilerId.fatbin.c" -tused -nvvmir-library
      "/usr/local/cuda/bin/../nvvm/libdevice/libdevice.10.bc"
      --gen_module_id_file --module_id_file_name
      "tmp/CMakeCUDACompilerId.module_id" --orig_src_file_name
      "CMakeCUDACompilerId.cu" --gen_c_file_name
      "tmp/CMakeCUDACompilerId.cudafe1.c" --stub_file_name
      "tmp/CMakeCUDACompilerId.cudafe1.stub.c" --gen_device_file_name
      "tmp/CMakeCUDACompilerId.cudafe1.gpu" "tmp/CMakeCUDACompilerId.cpp1.ii" -o
      "tmp/CMakeCUDACompilerId.ptx"


      /opt/anaconda3/envs/wmlce/powerpc64le-conda_cos7-linux-gnu/include/c++/8.2.0/type_traits(335):
      error: identifier "__ieee128" is undefined

(Before the last update of the ppc64le Jenkins we were on gcc 7.3, which doesn't trigger this bug.)

I think the --c++14 in the second command line might explain why we don't see the issue when we build Horovod's CUDA kernels. Those are still compiled in C++11 mode. So sneaking in a -std=c++11 might help here!
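As a sketch, limited to the affected configuration and with the flag set before enable_language(CUDA) runs its compiler check:

    if (CMAKE_CXX_COMPILER_ID MATCHES GNU AND CMAKE_SYSTEM_PROCESSOR MATCHES ppc64le AND
        CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 8.0)
      # Keep nvcc's host pass in C++11 mode so the __ieee128 bug is not triggered.
      set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -std=c++11")
    endif()
    enable_language(CUDA)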

@maxhgerlach force-pushed the update-cmake branch 2 times, most recently from 5d758a5 to 7d55479 on December 17, 2021 08:56
@maxhgerlach (Collaborator, Author) commented Dec 17, 2021

Putting -std=c++11 into CMAKE_CUDA_FLAGS has indeed fixed the ppc64le build.

@maxhgerlach (Collaborator, Author):

The latest test failure appears to be something related to Ray on Buildkite and is probably not caused by this PR.

@nvcastet (Collaborator) left a review comment:

Looks good to me. Thanks Max for the PR!

Review thread on this snippet from the top-level CMakeLists.txt:

    if (CMAKE_CUDA_COMPILER)
      if ((CMAKE_CXX_COMPILER_ID MATCHES GNU) AND (CMAKE_SYSTEM_PROCESSOR MATCHES ppc64le))
        if (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 8.0)
          set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -std=c++11")
@nvcastet (Collaborator):

Why is that needed if in horovod/common/ops/cuda/CMakeLists.txt you already set

    set(CMAKE_CUDA_STANDARD 11)

?

@maxhgerlach (Collaborator, Author):

Hi @nvcastet, thanks for the review and all the advice earlier!

The -std=c++11 flag here is for enable_language(CUDA) in this top-level CMakeLists.txt. CMake will apparently compile a test program at that point to gauge whether the compiler is really set up correctly, etc. That fails on ppc64le with our versions of GCC and CUDA, however, because of a float128-related bug, and one way to circumvent that is to disable C++14 support. I tried set(CMAKE_CUDA_STANDARD 11) first to achieve that, but that setting is apparently ignored at this stage, so we got the same error: Jenkins log, intermediate commit. In contrast, CMAKE_CUDA_FLAGS is not ignored there. I don't know why.
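A sketch of the difference (not the exact Horovod layout): CMAKE_CUDA_FLAGS is picked up by the compiler-identification compile that enable_language(CUDA) triggers, whereas CMAKE_CUDA_STANDARD only initializes a property on targets defined afterwards, which would explain why it has no effect at that earlier stage:

    # Top-level CMakeLists.txt
    set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -std=c++11")  # seen by the compiler-id test program
    enable_language(CUDA)

    # horovod/common/ops/cuda/CMakeLists.txt
    set(CMAKE_CUDA_STANDARD 11)   # applies to the targets below, not to the detection step above
    add_library(horovod_cuda_kernels cuda_kernels.cu)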

From https://cliutils.gitlab.io/modern-cmake/chapters/packages/CUDA.html:

    Unlike the older languages, CUDA support has been rapidly
    evolving, and building CUDA is hard, so I would recommend you
    require a very recent version of CMake! CMake 3.17 and 3.18 have a
    lot of improvements directly targeting CUDA.

Commit messages (each signed off by Max H. Gerlach <git@maxgerlach.de>):

  • Else build arg NCCL_VERSION does not override the env variable from the base container.
  • This appears to be a bug with GCC 8+ and CUDA 10. It's mitigated by building with C++11 instead of C++14. Alternatively we could disable quadruple precision (LLNL/blt#341 (comment)). However, libstdc++ with GCC 8.2 has a bug preventing compilation with -mno-float128: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84654
  • We achieve this by shipping a FindCUDAToolkit.cmake based on CMake 3.17.5.
  • Version 3.13 seems to be unavailable via Kitware's apt repo and the pip command line is easier anyway.
@maxhgerlach (Collaborator, Author):

Merging this now as overall feedback was positive.

I'll post a follow-up PR shortly to automatically install a recent CMake to a temporary location and use that to build Horovod.


Successfully merging this pull request may close this issue: Race condition in CMake (#2543).