Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fail_hard should reraise #6399

Merged
merged 1 commit into from
May 20, 2022
Merged

fail_hard should reraise #6399

merged 1 commit into from
May 20, 2022

Conversation

crusaderky
Copy link
Collaborator

Functions decorated by @fail_hard are frequently called by something else. fail_hard should not block exceptions.

Also, the decorator is a last resort when error handling for everything that we foresaw could possibly go wrong failed.
Instead of trying to force a crash in methods that should not crash to begin with, which is only possible if bypassing the usual callers, write tests which use custom methods which do crash.

Also write methods for the sync use case, which was insofar untested.

Blocks #6388

@crusaderky crusaderky self-assigned this May 20, 2022
crusaderky added a commit to crusaderky/distributed that referenced this pull request May 20, 2022
@crusaderky crusaderky mentioned this pull request May 20, 2022
@mrocklin
Copy link
Member

cc @gjoseph92

@crusaderky crusaderky linked an issue May 20, 2022 that may be closed by this pull request
8 tasks
Copy link
Collaborator

@gjoseph92 gjoseph92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a lot of discussion about this, see #6210 (comment).

The basic idea was that if @fail_hard was being triggered at all, some state is clearly broken, and we want to shut down the worker immediately. We want shutdown to happen cleanly if possible. If we don't reraise the exception, it's basically a goto self.cose(). This is bad, but it means we don't have to reason about "what is the error handler for the reraised exception going to do? Could it interfere with our close in some way?". We're tossing reasonable control flow out the window in favor of doing everything we can to minimize other concurrent things that might interfere with Worker.close.

This isn't based on evidence, just intuition (less concurrency -> more predictability). I agree it's problematic. If your change here seems to make things better, I'm fine trying it. I'm curious why it's a blocker—I assume you want to test somewhere that an exception was raised?

@crusaderky
Copy link
Collaborator Author

crusaderky commented May 20, 2022

It's a blocker because before the test was calling gather_dep and expecting its except Exception: block to raise an exception itself, which will no longer happen.

@crusaderky crusaderky merged commit cc383a6 into dask:main May 20, 2022
@crusaderky crusaderky deleted the fail_hard branch May 20, 2022 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate ensure_communicating transitions to new WorkerState event mechanism
3 participants