Scheduler/Worker threads hang when calling sys.exit() #8644

Open
jacobtomlinson opened this issue May 8, 2024 · 1 comment


jacobtomlinson commented May 8, 2024

Describe the issue:

When the Scheduler or Worker class is used to start cluster components and the program is then exited with sys.exit() (as is done in dask-mpi), the Python process hangs, likely because a background thread is holding the process open.

Minimal Complete Verifiable Example:

import sys
from distributed import Client, Scheduler
from distributed.utils import LoopRunner


async def main():
    async with Scheduler() as scheduler:
        async with Client(scheduler.address, asynchronous=True) as client:
            await client.shutdown()
    print("Done, exiting")
    sys.exit()  # Hangs at this line, comment this out and the program exits as expected


loop_runner = LoopRunner(loop=None, asynchronous=False)
loop_runner.run_sync(main)

I tried to strip this down as far as possible while still reproducing the issue. Note that the hang does not happen when using asyncio.run(main()) instead of the LoopRunner that is commonly used in distributed.
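To help test the background-thread hypothesis, here is a minimal stdlib-only sketch that lists the threads the interpreter would wait for at shutdown. The helper name `blocking_threads` is hypothetical and not part of distributed. It only reports live non-daemon threads and does not identify which component started them.

```python
import threading


def blocking_threads():
    """Return live non-daemon threads other than the main thread.

    The interpreter joins non-daemon threads at shutdown, so any thread
    listed here can keep the process open after sys.exit() is called.
    """
    main = threading.main_thread()
    return [
        t
        for t in threading.enumerate()
        if t is not main and not t.daemon and t.is_alive()
    ]


if __name__ == "__main__":
    # Illustration only: a non-daemon thread that waits until told to stop.
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, name="lingering-worker")
    worker.start()
    print([t.name for t in blocking_threads()])
    stop.set()
    worker.join()
```

Printing `[t.name for t in blocking_threads()]` just before the `sys.exit()` call in the example above might show which thread, if any, is still alive at that point.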

Anything else we need to know?:

Environment:

  • Dask version: 2024.4.2
  • Python version: 3.11.9
  • Operating System: Ubuntu
  • Install method (conda, pip, source):
jacobtomlinson changed the title from "Scheduler/worker threads hang when calling sys.exit(0)" to "Scheduler/Worker threads hang when calling sys.exit()" on May 8, 2024

jacobtomlinson commented May 8, 2024

I found a workaround for this behaviour using a signal to defer the shutdown. This example does not hang.

+import os
+import signal
 import sys
 from distributed import Client, Scheduler
 from distributed.utils import LoopRunner
 
 
 async def main():
     async with Scheduler() as scheduler:
         async with Client(scheduler.address, asynchronous=True) as client:
             await client.shutdown()
     print("Done, exiting")
-    sys.exit()  # Hangs at this line, comment this out and the program exits as expected
+    os.kill(os.getpid(), signal.SIGINT)  # Shutdown using a signal instead
 
 
+signal.signal(signal.SIGINT, lambda *_: sys.exit())  # Exit gracefully on signal
 loop_runner = LoopRunner(loop=None, asynchronous=False)
 loop_runner.run_sync(main)
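One possible reason the signal-based workaround behaves differently is that Python always runs signal handlers in the main thread, so the `sys.exit()` in the handler is raised there regardless of which thread sent the signal. Whether this is actually why the hang goes away is an assumption and has not been confirmed. The stdlib-only sketch below shows just that property on a POSIX system. The function name `handler_thread_name` is hypothetical, and the sketch must be called from the main thread because `signal.signal` can only be used there.

```python
import os
import signal
import threading
import time


def handler_thread_name(timeout=5.0):
    """Send SIGINT from a background thread and report where the handler ran."""
    seen = {}

    def on_sigint(signum, frame):
        # Record the thread executing the Python-level handler.
        seen["name"] = threading.current_thread().name

    previous = signal.signal(signal.SIGINT, on_sigint)
    try:
        sender = threading.Thread(
            target=lambda: os.kill(os.getpid(), signal.SIGINT),
            name="sender",
        )
        sender.start()
        sender.join()
        # Give the interpreter a chance to run the pending handler.
        deadline = time.monotonic() + timeout
        while "name" not in seen and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        # Restore whatever SIGINT handler was installed before.
        signal.signal(signal.SIGINT, previous)
    return seen.get("name")


if __name__ == "__main__":
    print(handler_thread_name())
```

The signal is sent from the thread named "sender", but the recorded name is that of the main thread.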
