Fix flaky ray tests #3430
Unit Test Results (with flaky tests): 886 files (-74), 886 suites (-74), 9h 59m 22s ⏱️ (+5m 27s). Results for commit d627a27. ± Comparison against base commit 7b5346e.
06e9969 to 4e9c44c
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
This reverts commit 6f9e7f9. Signed-off-by: Enrico Minack <github@enrico.minack.dev>
This reverts commit df5a49b. Signed-off-by: Enrico Minack <github@enrico.minack.dev>
d21b8a2 to d627a27
test/single/test_ray.py
Outdated
        # The code after the yield will run as teardown code.
        ray.shutdown()
    finally:
        if orig_devices:
None and "" have different meanings for CUDA_VISIBLE_DEVICES. Probably safer to be explicit:
    if orig_devices is not None:
        ...
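To illustrate the distinction, here is a minimal sketch rather than code from this PR. An unset variable (None when read via os.environ.get) leaves all GPUs visible, while an empty string hides all of them, so a plain truthiness check would skip restoring an originally empty value. The helper name was_set is hypothetical.

```python
def was_set(orig_devices):
    """Return True if the variable existed before the test, even if empty.

    `orig_devices` is assumed to come from os.environ.get(...), so None
    means "not set at all" and "" means "set, but hiding all GPUs".
    """
    return orig_devices is not None


# Compare the truthiness check with the explicit check for each case.
for label, orig_devices in [("unset", None), ("empty", ""), ("two GPUs", "0,1")]:
    print(label, "| truthy:", bool(orig_devices), "| explicit:", was_set(orig_devices))
```

Only the empty-string case differs between the two checks, and that is the case where skipping the restore would leave the variable modified for later tests.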
Alright, I'll do that then.
477a95a to 0f047ff
Ray tests assert that available ray resources after the test are identical to before the test. This turns out to be flaky.
This removes the check_resources assertion as it has race conditions. Further, this restores the CUDA_VISIBLE_DEVICES environment variable after tests that modify it have finished. It also improves the assertion context to help debug the issue of 8 GPUs being seen even though only 4 workers have been started.
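The restore step described above could be sketched as a small context manager. This is an illustrative sketch under stated assumptions, not the code in test/single/test_ray.py: the name preserve_env is hypothetical, and the actual tests use their own setup and teardown around ray.shutdown().

```python
import os
from contextlib import contextmanager


@contextmanager
def preserve_env(name):
    """Save environment variable `name` and restore it on exit.

    Hypothetical helper: restores the exact prior state, including the
    "unset" state and the empty-string state.
    """
    orig = os.environ.get(name)  # None if the variable was not set
    try:
        yield
    finally:
        if orig is not None:
            os.environ[name] = orig      # restore prior value, even ""
        else:
            os.environ.pop(name, None)   # variable was unset: remove it
```

A test that changes CUDA_VISIBLE_DEVICES could wrap its body in `with preserve_env("CUDA_VISIBLE_DEVICES"):` so that later tests see the original value regardless of whether the body raises.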