Is your feature request related to a problem? Please describe.
Sometimes unit tests trigger bugs in Horovod's C++ backend. These bugs may cause segmentation faults or, if we are unlucky, go unnoticed even though some internal state is corrupted. We have assert() macros all over the code base, but these assertions are not checked in release mode. They would be useful for identifying bugs before they trigger segmentation faults. Assertion failure messages are also more specific and easier to understand than segfaults.
One example would be part 2 (related to hvd.barrier) of PR #3300. In local debug builds an assertion failure would be raised before the segmentation fault was triggered.
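To illustrate the general mechanism, here is a minimal sketch of how a standard assert() behaves with and without NDEBUG. The helpers AssertionsEnabled and ElementAt are hypothetical and not taken from the Horovod code base; the sketch only assumes the standard <cassert> behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Horovod): reports whether assert()
// is active in this translation unit.
constexpr bool AssertionsEnabled() {
#ifdef NDEBUG
  return false;  // NDEBUG defined: assert(expr) expands to nothing
#else
  return true;   // NDEBUG not defined: a failed assert(expr) aborts with a message
#endif
}

// Hypothetical lookup whose precondition is guarded by assert().
int ElementAt(const std::vector<int>& values, std::size_t index) {
  // With NDEBUG defined this check disappears, and an out-of-range index
  // is undefined behavior: it may segfault or silently read bad memory.
  assert(index < values.size() && "index out of range");
  return values[index];
}
```

Compiled with something like `g++ -g`, an out-of-range call to ElementAt aborts with a message naming the file, line, and failed expression. Compiled with `g++ -O2 -DNDEBUG`, the same call skips the check entirely.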
Describe the solution you'd like
I propose to set the environment variable HOROVOD_DEBUG=1 when building Horovod in CI test containers. Then debug symbols will be included and assertions will be checked at runtime.
Describe alternatives you've considered
Counter arguments I can think of:
Tests might take a bit longer to run with debug code: I don't believe that there would be significant slowdowns, but of course I could be proven wrong.
Certain bugs might only be observable in release builds, not in debug builds: This would be bad, but personally I would expect that we miss more problems because assertions are not checked. Some test cases could still be built in release mode to partially cover this situation.