New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to CMake 3.13 for better CUDA support and to enable build concurrency #3261
Conversation
Mention it related to #2543 |
Unit Test Results (with flaky tests) 962 files + 44 962 suites +44 10h 51m 56s ⏱️ + 1h 11m 6s For more details on these failures, see this check. Results for commit 9f42442. ± Comparison against base commit 31bba3b. ♻️ This comment has been updated with latest results. |
@maxhgerlach: To solve the build concurrency, if i remember correctly the problem was not the cmake version. It is related to the fact we build 2 versions of the same library which is causing issues with the intermediate files that get overwritten when the 2 versions are built concurrently. |
I also think it is good idea to update the cmake version too. :) |
@maxhgerlach For build concurrency, just adding
will do the trick. |
9564837
to
4d6854e
Compare
Hi @nvcastet, thanks for your comments!
I was under the impression that this race condition was ultimately caused by a bug in CMake's FindCUDA module (needed for the two For Horovod 1.0 I think it would be beneficial to cut the ties to this deprecated module, which may or may not work correctly in various versions of CMake, and instead move on to CMake's first-class CUDA language support and fix the race condition at the same time (as proposed by this draft PR).
If we don't want to require a version quite as recent as 3.18 or 3.17, we may also get away with packaging just that FindCUDAToolkit module and requiring some intermediate version of CMake >= 3.8 (as suggested by @leezu in #2543 (comment)). What do you think? |
Thanks @maxhgerlach! I missed @Flamefire's comment on |
4589cad
to
e9ffc85
Compare
The only tests that still fail are for torchhead and mxnethead and these issues appear to have been fixed on master. Apart from that all the builds and tests have run fine in the CI now. The eightfold build concurrency seems to have shaved off a few minutes from the docker build times, but I'm not sure how comparable these are between Github actions workflow runs at different times. From the table that @nvcastet linked to, requiring only (say) CMake 3.10 instead of 3.18 would enable people to build with the standard package sources of these distros:
Apparently even Ubuntu 20.04 only comes with CMake 3.16 and would require users to add a more recent extra package. Then again it's really pretty easy to get a recent CMake via pip, conda, snap, or a PPA or similar (and many C++ projects require this). Anyway, if we find it worthwhile to lower the version requirement somewhat from 3.18, I'd be willing to look into it, but it might take some time. |
This is awesome! Build time in GitHub Actions sadly do not benefit much from the concurrency as the workers have only two cores. But users definitively benefit from this. 🎉 |
Good point regarding the number of cores available to the workers, @EnricoMi! A default When I set unlimited scaling with |
Interesting default imposed by make. You could set |
e9ffc85
to
616c419
Compare
c964fa2
to
895e1e0
Compare
After rebasing to master, I would get an "error: identifier "__ieee128" is undefined". This appears to be a bug with GCC 8+ and CUDA 10. See LLNL/blt#341 (comment) I decided to disable quadruple precision there via libstdc++ with gcc 8.2, however, has a bug that prevents compilation with @nvcastet, would you know how to easily upgrade or downgrade the compiler in that Docker container? |
Most of the packages for the OpenCE release are built with 8.2.0: |
OK, then it will make sense to stick with gcc 8.2 and search some other workaround. 🙂
With
(Before the last update of the ppc64le Jenkins we were on gcc 7.3, which doesn't trigger this bug.) I think the |
5d758a5
to
7d55479
Compare
Putting |
The latest test failure appears to be something related to Ray on Buildkite and is probably not caused by this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks Max for the PR!
if (CMAKE_CUDA_COMPILER) | ||
if ((CMAKE_CXX_COMPILER_ID MATCHES GNU) AND (CMAKE_SYSTEM_PROCESSOR MATCHES ppc64le)) | ||
if (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 8.0) | ||
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -std=c++11") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that needed if in horovod/common/ops/cuda/CMakeLists.txt
, you already set:
set(CMAKE_CUDA_STANDARD 11)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @nvcastet, thanks for the review and all the advice earlier!
The -std=c++11
flag here is for enable_language(CUDA)
in this top-level CMakeLists.txt
. CMake will apparently compile a test program at that point to gauge whether the compiler is really set up correctly etc. That fails on ppc64le with our versions of gcc and CUDA, however, because of a float128-related bug and one way to circumvent that is to disable C++14 support. I tried set(CMAKE_CUDA_STANDARD 11)
first to achieve that, but that setting is apparently ignored at this stage, so we got the same error: Jenkins log, intermediate commit. In contrast, CMAKE_CUDA_FLAGS
is not ignored there. I don't know why.
From https://cliutils.gitlab.io/modern-cmake/chapters/packages/CUDA.html: Unlike the older languages, CUDA support has been rapidly evolving, and building CUDA is hard, so I would recommend you require a very recent version of CMake! CMake 3.17 and 3.18 have a lot of improvements directly targeting CUDA. Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
…indCUDAToolkit Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Else build arg NCCL_VERSION does not override env variable from base container. Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
This appears to be a bug with GCC 8+ and CUDA 10. It's mitigated by not building with C++11. Alternatively we could disable quadruple precision. LLNL/blt#341 (comment) However, libstdc++8 with gcc 8.2 has a bug preventing compilation with-mno-float128. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84654 Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
We achieve this by shipping a FindCUDAToolkit.cmake based on CMake 3.17.5. Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Version 3.13 seems to be unavailable via Kitware's apt repo and the pip command line is easier anyway. Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
034c070
to
9f42442
Compare
Merging this now as overall feedback was positive. I'll post a follow-up PR shortly to automatically install a recent CMake to a temporary location and use that to build Horovod. |
Checklist before submitting
Description
This PR replaces the build process for Horovod's CUDA kernels by one relying on features offered in recent versions of CMake. In particular, the deprecated module
FindCUDA
is replaced by CMake's first-class CUDA language support and the more modern moduleFindCUDAToolkit
. This fixes the race condition of #2543 and allows us to re-enable build concurrency via-j
, which will certainly be appreciated in many places.To ensure that these new features are available I bumped up the minimum required version to
3.183.13. I believe that this should not be a big problem even on older systems as a recent CMake can usually be obtained easily viapip install cmake
.Edit: By shipping a module based on
FindCUDAToolkit
from CMake 3.17.5 we can build with CMake >= 3.13.I am not an expert with CMake by any means, so any feedback would be more than welcome!
Fixes #2543.
Review process to land