Build Horovod with temporarily installed CMake if necessary #3371
Unit Test Results (with flaky tests) 998 files + 108 998 suites +108 9h 58m 38s ⏱️ + 37m 44s For more details on these failures, see this check. Results for commit c9f0ad0. ± Comparison against base commit a5edcd0. ♻️ This comment has been updated with latest results. |
Force-pushed from fc803cf to c9f0ad0.
@@ -33,6 +33,7 @@ RUN add-apt-repository ppa:ubuntu-toolchain-r/test
RUN apt-get update -qq && apt-get install -y --no-install-recommends \
    wget \
    ca-certificates \
    cmake \
We could rely on setup.py here to install the latest CMake, so we always test that this really works.
I believe that this apt-get line currently installs cmake 3.10 on the Ubuntu 18.04 we use for testing (too old to build Horovod). So yes, as it stands, we rely on setup.py to install a temporary CMake 3.13 to build Horovod and the mechanism from this PR is tested. This would change once we update to Ubuntu 20.04 or later.
I left cmake in here for any non-Horovod builds in this Docker container, e.g., for oneCCL.
Ah, and just for the record: The ppc64le build on Jenkins currently comes with a pretty recent CMake on its own, so using a recent system CMake to build Horovod is also still under test.
Ah, our ancient Ubuntu 18.04 means this apt-get install cmake is not sufficient. Then it makes sense we need to add that extra code to the CodeQL yaml. Once we update to Ubuntu 20.04 we will lose testing that auto-install. We should remember when we upgrade to keep Ubuntu 18.04 for one of the test combinations, e.g. TF 1.15 to see that auto-install feature still being tested (as long as we want to support Ubuntu 18.04 in our test setup).
We should remember when we upgrade to keep Ubuntu 18.04 for one of the test combinations, e.g. TF 1.15 to see that auto-install feature still being tested (as long as we want to support Ubuntu 18.04 in our test setup).
That's a good plan!
https://github.com/horovod/horovod/runs/4852098212?check_suite_focus=true#step:10:4652
Yep, here we can see that it used a CMake binary from /tmp:
11.63 Could not find a recent CMake to build Horovod. Attempting to install CMake 3.13 to a temporary location via pip.
...
13.92 Running CMake in build/temp.linux-x86_64-3.7/RelWithDebInfo:
13.92 /tmp/horovod-cmake-tmpmj4n5l40/bin/run_cmake /tmp/pip-req-build-fzb3ontk -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_LIBRARY_OUTPUT_DIRECTORY_RELWITHDEBINFO=/tmp/pip-req-build-fzb3ontk/build/lib.linux-x86_64-3.7 -DPYTHON_EXECUTABLE:FILEPATH=/usr/bin/python
As a side note: In the meantime we have moved our test images to Ubuntu 20.04, but a few test images still use Ubuntu 18.04, so we now test both.
@@ -100,6 +100,9 @@ jobs:
sed -i -e "s%^# setup ssh service$%${command//$'\n'/\\n}\n\n# setup ssh service%" Dockerfile.test.?pu

command=$(cat <<EOF
# Install recent CMake
Why is installing cmake through apt-get install in the Dockerfiles not sufficient here?
We need CMake 3.13 to compile the C++ parts of Horovod. On Ubuntu 18.04 (and without extra package sources like a PPA from Kitware) apt-get install cmake installs CMake 3.10 IIRC.
The mechanism of this PR does not work with the C++ build for CodeQL because we don't use setup.py there.
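To make the version argument concrete, here is a minimal sketch of how a build script could decide whether the system CMake is recent enough. It is not Horovod's actual code: the function names and the `MIN_CMAKE` constant are hypothetical, and the only facts taken from this thread are the 3.13 minimum and the 3.10 that Ubuntu 18.04 ships.

```python
import re
import shutil
import subprocess

# Minimum version named in this thread; the constant name is hypothetical.
MIN_CMAKE = (3, 13)


def parse_cmake_version(output):
    """Extract (major, minor, patch) from `cmake --version` output, or None."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        return None
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def is_recent_enough(output, minimum=MIN_CMAKE):
    """True if the reported version is at least `minimum`."""
    version = parse_cmake_version(output)
    return version is not None and version[:len(minimum)] >= minimum


def system_cmake_is_recent(cmake_bin="cmake"):
    """Run the system CMake, if there is one, and check its version."""
    if shutil.which(cmake_bin) is None:
        return False
    result = subprocess.run([cmake_bin, "--version"],
                            capture_output=True, text=True)
    return result.returncode == 0 and is_recent_enough(result.stdout)
```

With a check like this, the CMake 3.10 from apt on Ubuntu 18.04 is rejected, while a newer system CMake is accepted and no temporary install is needed.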
This can probably be removed again as we have moved to Ubuntu 20.04 for our test images in the meantime.
I removed this and also went back to installing CMake via apt in the Dockerfiles based on Ubuntu 20.04 (docker/*/Dockerfile).
LGTM!
LGTM! We ran into this issue as well using master, so I agree it would be good to get this into the release.
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
(accidental remainder from previous PR) Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
This tests that Horovod can be built with an automatically installed CMake from a temp dir. Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
@maxhgerlach I have rebased this onto master, let's see if the tests terminate now.
@maxhgerlach looks like the last minute change to the https://github.com/horovod/horovod/runs/5385819691?check_suite_focus=true
The change triggered
I have fixed that for other images already, will apply the fix here too.
Great find, thanks @EnricoMi! Somehow GitHub Actions doesn't show me any logs for the hanging steps. Do you know of any way to get them? Edit: Not possible and it's been a known issue for 2+ years, oof. https://github.community/t/how-to-see-the-full-log-while-a-workflow-is-in-progress/17455
The full log becomes available after the job terminates. While a job is running, only new lines are shown.
Note to self: add timeout to docker build jobs ;-)
And the only way to get to see those appears to be to keep a tab open and to not let your computer go to sleep...
Description
This is a follow-up to PR #3261, which would require users of many distros to install a recent CMake from an external source (like pip).
If such a situation is identified in setup.py, we can automatically install a sufficiently recent CMake to a temporary directory and use that to build Horovod. That directory is cleaned up afterwards and the user or system environment is not affected. If HOROVOD_CMAKE is set, only that CMake binary is used without any change of behavior.
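The behavior described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR. The helper names, the `cmake~=3.13.0` requirement string, and the injected `system_cmake_ok` and `run` callables are assumptions made for the example. Taken from the discussion are the `HOROVOD_CMAKE` override, the install via pip, and the `horovod-cmake-tmp` directory prefix seen in the CI log.

```python
import os
import subprocess
import sys
import tempfile
from contextlib import contextmanager


def pip_install_command(target_dir, spec="cmake~=3.13.0"):
    """Build a pip command that installs CMake into target_dir only.

    The requirement string is an assumption made for this sketch.
    """
    return [sys.executable, "-m", "pip", "install",
            "--target", target_dir, spec]


@contextmanager
def cmake_for_build(env=None, system_cmake_ok=lambda: False,
                    run=subprocess.check_call):
    """Yield (cmake_path, extra_env) for the duration of a build.

    `system_cmake_ok` and `run` are injected so that the decision logic
    can be exercised without touching the network.
    """
    env = os.environ if env is None else env
    if env.get("HOROVOD_CMAKE"):
        # An explicit override always wins; nothing is installed.
        yield env["HOROVOD_CMAKE"], {}
        return
    if system_cmake_ok():
        yield "cmake", {}
        return
    # The temporary directory is removed when the block exits,
    # so the user and system environment stay untouched.
    with tempfile.TemporaryDirectory(prefix="horovod-cmake-tmp") as tmp:
        run(pip_install_command(tmp))
        # Assumption: pip places the entry-point script under <target>/bin,
        # and that script needs the target directory on PYTHONPATH.
        yield os.path.join(tmp, "bin", "cmake"), {"PYTHONPATH": tmp}
```

A caller would wrap the actual CMake invocation in `with cmake_for_build() as (cmake, extra_env):` and merge `extra_env` into the subprocess environment. Using a context manager ties the cleanup to the end of the build, including the case where the build fails.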