-
-
Notifications
You must be signed in to change notification settings - Fork 710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove test_ucx_config_w_env_var
flaky condition
#5765
Remove test_ucx_config_w_env_var
flaky condition
#5765
Conversation
cc @charlesbluca @jakirkham as per your comments in #5229 . |
test_ucx_config_w_env_var
flaky condition
Unit Test Results 18 files ±0 18 suites ±0 10h 49m 56s ⏱️ + 1h 38m 0s For more details on these failures and errors, see this check. Results for commit 3f0f589. ± Comparison against base commit dbaae68. ♻️ This comment has been updated with latest results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Peter 🙂
Had a few questions below
@@ -95,23 +95,26 @@ async def test_ucx_config(cleanup): | |||
|
|||
|
|||
@pytest.mark.flaky( | |||
reruns=10, reruns_delay=5, condition=ucp.get_ucx_version() < (1, 11, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we guarantee having 1.11.0+ at this point? Do we document what versions we support anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm dropping that condition to mark the test flaky to all UCX versions. In the next release we'll drop UCX < 1.11.1 support rapidsai/ucx-py#829, but today we theoretically support any version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Might be worth thinking about how to document UCX compatibility in Dask at some point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is actually left to UCX-Py, not Dask, so it's gonna be whatever UCX-Py supports, therefore I don't think it makes sense to document UCX as it's an indirect dependency. This will be more evident soon when we drop support for UCX < 1.11.1 and then there will be no more checks for UCX version in code, I have changes prepared but am still in the process of testing multi-GPU to ensure nothing breaks.
) | ||
def test_ucx_config_w_env_var(cleanup, loop): | ||
env = os.environ.copy() | ||
env["DASK_RMM__POOL_SIZE"] = "1000.00 MB" | ||
|
||
port = "13339" | ||
sched_addr = f"ucx://{HOST}:{port}" | ||
sched_addr = f"ucx://127.0.0.1:{port}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why HOST
doesn't work for us
Also should we assign to a local host
variable if this needs to change in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does work, what I noticed is 127.0.0.1
fails less often, that's the only reason I'm changing it here as most other Distributed tests seem to rely on 127.0.0.1
as well, see https://github.com/dask/distributed/blob/main/distributed/cli/tests/test_dask_scheduler.py for instance. I think we don't need to worry about a host
variable here honestly, being straightforward seems good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok maybe we should leave a comment about that. Perhaps something like this
sched_addr = f"ucx://127.0.0.1:{port}" | |
# Use localhost directly (this appears to be less flaky than HOST) | |
sched_addr = f"ucx://127.0.0.1:{port}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 3f0f589, I expanded a bit your comment suggestion too.
Plan to merge in later this afternoon unless there are other comments |
This is failing on |
Thank Peter! 😄 |
Thanks for reviews @jakirkham and @quasiben ! |
Mark
test_ucx_config_w_env_var
flaky independent of UCX version. Using127.0.0.1
greatly reduces the probability of the issue, but it still happens ~1% of the time when I run it locally, therefore marking it flaky for the time being.