Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Build Horovod in test images with HOROVOD_DEBUG=1 #3307

Open
maxhgerlach opened this issue Dec 8, 2021 · 0 comments
Open

CI: Build Horovod in test images with HOROVOD_DEBUG=1 #3307

maxhgerlach opened this issue Dec 8, 2021 · 0 comments

Comments

@maxhgerlach
Copy link
Collaborator

Is your feature request related to a problem? Please describe.
Some times unit tests may trigger bugs in Horovod's C++ backend. These bugs may cause segmentation faults at times or, if we are unlucky, may go unnoticed although some internal state is corrupted. We have some assert() macros all over the code base, however these assertions are not checked in release mode. They would be useful though to identify bugs before they trigger segmentation faults. Assertion failure messages are also more specific and easier to understand than segfaults.

One example would be part 2 (related to hvd.barrier) of PR #3300. In local debug builds an assertion failure would be raised before the segmentation fault was triggered.

Describe the solution you'd like
I propose to set the environment variable HOROVOD_DEBUG=1 when building Horovod in CI test containers. Then debug symbols will be included and assertions will be checked at runtime.

Describe alternatives you've considered
Counter arguments I can think of:

  • Tests might take a bit longer to run with debug code: I don't believe that there would be significant slowdowns, but of course I could be proven wrong.
  • Certain bugs might only be observable in release builds, not in debug builds: This would be bad, but personally I would expect that we miss more problems because assertions are not checked. Some test cases could still be built in release mode to partially cover this situation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

1 participant