Add a cluster/worker option to log cuDF spilling statistics #1254

Open
charlesbluca opened this issue Sep 29, 2023 · 1 comment

Comments

@charlesbluca
Member

After chatting with @quasiben, it seems that, in addition to #1226, it would be useful to have some (preferably machine-parseable) way to track cuDF spilling statistics beyond the dashboard page. In our initial conversations, a potential implementation looked like an option/argument on the worker/cluster APIs that enables logging of cuDF spilling during and/or after computation:

$ CUDF_SPILL=on CUDF_SPILL_STATS=1 dask cuda worker --cudf-spill-logging tcp://10.33.227.163:8786
2023-09-29 07:36:11,333 - distributed.nanny - INFO - Start Nanny at: 'tcp://10.33.227.163:38751'
...
2023-09-29 07:40:14,483 - distributed.worker - INFO - Worker tcp://10.33.227.163:45905 spilled 24 bytes from GPU in 0.01s
2023-09-29 07:40:14,483 - distributed.worker - INFO - Worker tcp://10.33.227.163:45905 unspilled 24 bytes to GPU in 0.01s
...
2023-09-29 07:36:14,483 - distributed.worker - INFO - -------------------------------------------------
2023-09-29 07:36:14,483 - distributed.worker - INFO -                Worker: tcp://10.33.227.163:45905
2023-09-29 07:36:14,483 - distributed.worker - INFO -         Bytes spilled:                        24
2023-09-29 07:36:14,483 - distributed.worker - INFO -   Time spent spilling:                     0.02s
2023-09-29 07:36:14,483 - distributed.worker - INFO - -------------------------------------------------
2023-09-29 07:40:38,126 - distributed.nanny - INFO - Worker process 3868728 was killed by signal 9

I imagine this could look like a worker plugin that polls the cuDF spilling statistics periodically (is there a way we could "subscribe" a worker to cuDF spilling events?) and once more at worker closing time, but I'm interested in whether there's a better approach we could take here.
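To make the polling half of this idea concrete, here is a minimal, stdlib-only Python sketch of a tracker that turns cumulative spill totals into per-interval, machine-parseable records. Everything named here is hypothetical: `SpillDeltaTracker`, the injected `get_totals` callable, the logger name, and the JSON field names. The assumed input shape (a mapping from `(src, dst)` to cumulative `(nbytes, seconds)`) would need to be checked against what cuDF's spill statistics actually expose.

```python
import json
import logging
from typing import Callable, Dict, List, Tuple

# Assumed shape: {(src, dst): (cumulative_bytes, cumulative_seconds)}.
Totals = Dict[Tuple[str, str], Tuple[int, float]]

logger = logging.getLogger("spill_stats_sketch")  # hypothetical logger name


class SpillDeltaTracker:
    """Turn cumulative spill totals into per-poll delta records."""

    def __init__(self, worker_address: str, get_totals: Callable[[], Totals]):
        self.worker_address = worker_address
        self.get_totals = get_totals  # hypothetical accessor, injected by the caller
        self._last: Totals = {}

    def poll(self) -> List[str]:
        """Log and return one JSON line per direction that changed since the last poll."""
        current = dict(self.get_totals())
        lines = []
        for (src, dst), (nbytes, seconds) in sorted(current.items()):
            prev_bytes, prev_seconds = self._last.get((src, dst), (0, 0.0))
            if nbytes == prev_bytes:
                continue  # nothing moved in this direction since the last poll
            record = {
                "worker": self.worker_address,
                "src": src,
                "dst": dst,
                "bytes": nbytes - prev_bytes,
                "seconds": round(seconds - prev_seconds, 6),
            }
            line = json.dumps(record, sort_keys=True)
            logger.info(line)
            lines.append(line)
        self._last = current
        return lines
```

Emitting one JSON object per line keeps the output easy to parse downstream, and injecting `get_totals` keeps the tracker independent of how the statistics are actually obtained from cuDF.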

@pentschev
Member

> Imagine this could look like a worker plugin that polls the cuDF spilling statistics periodically (is there a way we could "subscribe" a worker to cuDF spilling event?) and at worker closing time, but am interested in if there's a better approach we could take here.

I don't think we have a "proper" way of doing something like this. The closest is probably the LoggerBuffer interface, which could be plugged into a PeriodicCallback as suggested in #442 (comment). Other than that, I don't think there are any pre-baked solutions for this, but I agree this could be a useful feature.
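As a rough illustration of the "report periodically, then once more at closing time" pattern, the sketch below uses only `asyncio` as a stand-in for Tornado's `PeriodicCallback`, so it runs without Dask installed. `PeriodicReporter`, `demo`, and the `report` callable are hypothetical names; in a real worker plugin the `start` and `stop` calls would presumably live in the plugin's setup and teardown hooks.

```python
import asyncio
from typing import Callable, Optional


class PeriodicReporter:
    """Call `report()` every `interval` seconds, plus once more on stop."""

    def __init__(self, report: Callable[[], None], interval: float):
        self.report = report  # hypothetical: e.g. a function that logs spill stats
        self.interval = interval
        self._task: Optional[asyncio.Task] = None

    async def _loop(self) -> None:
        while True:
            await asyncio.sleep(self.interval)
            self.report()

    def start(self) -> None:
        # Must be called from inside a running event loop.
        self._task = asyncio.get_running_loop().create_task(self._loop())

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass  # expected: we cancelled the polling task ourselves
            self._task = None
        self.report()  # final report at closing time


async def demo() -> int:
    """Run the reporter briefly and return how many reports were made."""
    calls = []
    reporter = PeriodicReporter(lambda: calls.append(1), interval=0.01)
    reporter.start()
    await asyncio.sleep(0.05)
    await reporter.stop()
    return len(calls)
```

Calling `asyncio.run(demo())` exercises both paths: a few periodic reports followed by the final one issued from `stop()`.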
