Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve tests for P2P stable ordering #8458

Merged
merged 2 commits into from
Feb 13, 2024
Merged

Conversation

hendrikmakait
Copy link
Member

Follow-up to #8453
Closes dask/dask#10708 by including regression test.

  • Tests added / passed
  • Passes pre-commit run --all-files

Copy link
Contributor

github-actions bot commented Jan 12, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    26 files  +     8      26 suites  +8   9h 20m 55s ⏱️ + 4h 4m 50s
 3 966 tests +     7   3 855 ✅ +     6    109 💤 ±  0  2 ❌ +1 
47 329 runs  +14 567  45 098 ✅ +13 838  2 229 💤 +728  2 ❌ +1 

For more details on these failures, see this check.

Results for commit bba2b06. ± Comparison against base commit a31744f.

This pull request removes 2 and adds 9 tests. Note that renamed tests count towards both.
distributed.shuffle.tests.test_shuffle ‑ test_stable_ordering[False]
distributed.shuffle.tests.test_shuffle ‑ test_stable_ordering[True]
distributed.protocol.tests.test_pickle ‑ test_pickle_functions_in_main
distributed.shuffle.tests.test_shuffle ‑ test_drop_duplicates_stable_ordering[first-False]
distributed.shuffle.tests.test_shuffle ‑ test_drop_duplicates_stable_ordering[first-True]
distributed.shuffle.tests.test_shuffle ‑ test_drop_duplicates_stable_ordering[last-False]
distributed.shuffle.tests.test_shuffle ‑ test_drop_duplicates_stable_ordering[last-True]
distributed.shuffle.tests.test_shuffle ‑ test_shuffle_stable_ordering[first-False]
distributed.shuffle.tests.test_shuffle ‑ test_shuffle_stable_ordering[first-True]
distributed.shuffle.tests.test_shuffle ‑ test_shuffle_stable_ordering[last-False]
distributed.shuffle.tests.test_shuffle ‑ test_shuffle_stable_ordering[last-True]

♻️ This comment has been updated with latest results.

Comment on lines 2703 to 2717
@pytest.mark.parametrize("disk", [True, False])
@pytest.mark.parametrize("keep", ["first", "last"])
@gen_cluster(client=True)
async def test_shuffle_stable_ordering(c, s, a, b, keep, disk):
"""Ensures that shuffling guarantees ordering for individual entries
belonging to the same shuffle key"""
df = dask.datasets.timeseries(
start="2000-01-01",
end="2000-02-01",
dtypes={"x": int, "y": int, "z": int},
freq="1 s",
)
df["x"] = df["x"] % 23
df["y"] = df["y"] % 19
df["z"] = df["z"] % 17
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is easier to construct a dataframe very explicitly and assert explicitly on the output group instead of testing this indirectly via drop_duplicates.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's easier to write the test, but I see your point that it might be easier to argue about it.

@fjetter fjetter merged commit 24a88f5 into dask:main Feb 13, 2024
28 of 34 checks passed
@hendrikmakait hendrikmakait deleted the improved-test branch February 13, 2024 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Shuffle-based drop duplicates produces incorrect result with shuffle="p2p"
2 participants