Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove test_ucx_config_w_env_var flaky condition #5765

Merged

Conversation

pentschev
Copy link
Member

Mark test_ucx_config_w_env_var flaky independent of UCX version. Using 127.0.0.1 greatly reduces the probability of the issue, but it still happens ~1% of the time when I run it locally, therefore marking it flaky for the time being.

@pentschev
Copy link
Member Author

cc @charlesbluca @jakirkham as per your comments in #5229 .

@pentschev pentschev changed the title Remove test_ucx_config_w_env_var flaky condition Remove test_ucx_config_w_env_var flaky condition Feb 4, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Feb 4, 2022

Unit Test Results

       18 files  ±0         18 suites  ±0   10h 49m 56s ⏱️ + 1h 38m 0s
  2 596 tests +1    2 513 ✔️  -   1       79 💤  -   1  3 +2  1 🔥 +1 
23 230 runs  +9  21 678 ✔️  - 91  1 547 💤 +96  4 +3  1 🔥 +1 

For more details on these failures and errors, see this check.

Results for commit 3f0f589. ± Comparison against base commit dbaae68.

♻️ This comment has been updated with latest results.

Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Peter 🙂

Had a few questions below

@@ -95,23 +95,26 @@ async def test_ucx_config(cleanup):


@pytest.mark.flaky(
reruns=10, reruns_delay=5, condition=ucp.get_ucx_version() < (1, 11, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we guarantee having 1.11.0+ at this point? Do we document what versions we support anywhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm dropping that condition to mark the test flaky to all UCX versions. In the next release we'll drop UCX < 1.11.1 support rapidsai/ucx-py#829, but today we theoretically support any version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Might be worth thinking about how to document UCX compatibility in Dask at some point

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is actually left to UCX-Py, not Dask, so it's gonna be whatever UCX-Py supports, therefore I don't think it makes sense to document UCX as it's an indirect dependency. This will be more evident soon when we drop support for UCX < 1.11.1 and then there will be no more checks for UCX version in code, I have changes prepared but am still in the process of testing multi-GPU to ensure nothing breaks.

)
def test_ucx_config_w_env_var(cleanup, loop):
env = os.environ.copy()
env["DASK_RMM__POOL_SIZE"] = "1000.00 MB"

port = "13339"
sched_addr = f"ucx://{HOST}:{port}"
sched_addr = f"ucx://127.0.0.1:{port}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why HOST doesn't work for us

Also should we assign to a local host variable if this needs to change in the future?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does work, what I noticed is 127.0.0.1 fails less often, that's the only reason I'm changing it here as most other Distributed tests seem to rely on 127.0.0.1 as well, see https://github.com/dask/distributed/blob/main/distributed/cli/tests/test_dask_scheduler.py for instance. I think we don't need to worry about a host variable here honestly, being straightforward seems good enough.

Copy link
Member

@jakirkham jakirkham Feb 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok maybe we should leave a comment about that. Perhaps something like this

Suggested change
sched_addr = f"ucx://127.0.0.1:{port}"
# Use localhost directly (this appears to be less flaky than HOST)
sched_addr = f"ucx://127.0.0.1:{port}"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 3f0f589, I expanded a bit your comment suggestion too.

@quasiben
Copy link
Member

Plan to merge in later this afternoon unless there are other comments

@quasiben
Copy link
Member

This is failing on test_pause_executor which is unrelated to this PR. Thanks @pentschev -- merging in

@quasiben quasiben merged commit 820427a into dask:main Feb 14, 2022
@jakirkham
Copy link
Member

Thank Peter! 😄

@pentschev
Copy link
Member Author

Thanks for reviews @jakirkham and @quasiben !

@pentschev pentschev deleted the remove-flaky-condition-test_ucx_config_w_env_var branch May 25, 2022 08:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants