Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide stack for suspended coro in test timeout #5446

Merged
merged 6 commits into from Oct 21, 2021

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Oct 21, 2021

This allows us to retrieve some superficial information in case of a test timeout. Typically we only receive a plain timeout. This will print the frame the coro is currently suspended upon timeout. I hope this provides at least some indication of why a test is stuck. It does only show one frame level, though. Nested stack is not available according to the asyncio implementation. I figured this is still better than nothing. ¯_(ツ)_/¯

An example output looks like this when running the new test w/out the pytest exception assert

E               asyncio.exceptions.TimeoutError: Test timeout.
E               Stack for <Task pending name='Task-44' coro=<test_provide_stack_on_timeout.<locals>.inner_test() running at /Users/fjetter/workspace/distributed/distributed/tests/test_utils_test.py:382> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x153ca1100>()]>> (most recent call last):
E                 File "/Users/fjetter/workspace/distributed/distributed/tests/test_utils_test.py", line 382, in inner_test
E                   await asyncio.sleep(sleep_time)

@crusaderky
Copy link
Collaborator

There are deterministic regressions:

FAILED distributed/tests/test_scheduler.py::test_scheduler_init_pulls_blocked_handlers_from_config
FAILED distributed/tests/test_utils_test.py::test_gen_cluster_legacy_implicit
FAILED distributed/tests/test_utils_test.py::test_gen_cluster_legacy_explicit

@fjetter
Copy link
Member Author

fjetter commented Oct 21, 2021

I removed the test_gen_cluster_legacy_implicit / test_gen_cluster_legacy_explicit since I have no idea what they are supposed to test. I assume it's about the gen.coroutine call at the beginning. However, I don't see a reason to not define a gen_cluster decorated function as async and instead raise now.

@crusaderky
Copy link
Collaborator

This looks good. The only issue I have with it is that it will mask TimeoutError's coming from inside the actual test function. In theory we could fix it by inspecting the stack trace and making sure that it was raised by utils_test.py. Unsure if it's worth the extra complexity though.

@fjetter
Copy link
Member Author

fjetter commented Oct 21, 2021

This looks good. The only issue I have with it is that it will mask TimeoutError's coming from inside the actual test function. In theory we could fix it by inspecting the stack trace and making sure that it was raised by utils_test.py. Unsure if it's worth the extra complexity though.

I am raising the TimeoutError as raise TimeoutError(...) from None to strip further context since in most situation this is just overly verbose spam. If there are cases where this provides useful information, we can just raise with cause

distributed/utils_test.py Outdated Show resolved Hide resolved
distributed/utils_test.py Outdated Show resolved Hide resolved
Co-authored-by: crusaderky <crusaderky@gmail.com>
@crusaderky crusaderky merged commit 33d83bc into dask:main Oct 21, 2021
@fjetter fjetter deleted the provide_stack_on_timeout branch October 22, 2021 10:10
pentschev added a commit to pentschev/dask-cuda that referenced this pull request Oct 22, 2021
The way coroutines are handled in Distributed test with gen_cluster was
changed in dask/distributed#5446, breaking the
two tests that still relied on gen_cluster. Since we don't encourage
using using coroutines/yield, there's really little reason to keep those
tests.
rapids-bot bot pushed a commit to rapidsai/dask-cuda that referenced this pull request Oct 22, 2021
The way coroutines are handled in Distributed test with gen_cluster was
changed in dask/distributed#5446, breaking the
two tests that still relied on gen_cluster. Since we don't encourage
using using coroutines/yield, there's really little reason to keep those
tests.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Benjamin Zaitlen (https://github.com/quasiben)

URL: #758
zanieb pushed a commit to zanieb/distributed that referenced this pull request Oct 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants