
Allow Client to subscribe to events // Remote printing and warning #5217

Merged (11 commits) on Sep 13, 2021

Conversation

@fjetter (Member) commented Aug 16, 2021

There is the idea floating around to log exceptions into our internal event system. See also #5184

This is an attempt to enable a simple publish+subscribe mechanism for these events. It works as a prototype, but I'm not entirely sure whether this is a good idea, since we also have the actual pubsub extension, which is a bit more powerful.

There are a few notable differences:

  1. pubsub allows for direct Worker<->Worker communication. We don't necessarily need that but it doesn't hurt, does it?
  2. pubsub serializes all messages using our protocol.to_serialize mechanism. If I understand correctly, this ensures that the message is never actually deserialized on the scheduler?
  3. log_event stores all logged events on the scheduler in a deque.

We already advertise the log_event functionality in our docs (https://distributed.dask.org/en/latest/logging.html#structured-logs), but the pubsub extension is more powerful, and I don't like the idea of having two systems if there is no good reason for it. It would not be hard to extend the pubsub extension to also log events (on whitelisted topics) in a deque, i.e. we could reuse that mechanism for events, but we'd need to break 2) above.
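
For concreteness, here is a rough sketch of how the subscription mechanism prototyped here is meant to be used. The `subscribe_topic`/`log_event`/`get_events` names reflect this prototype; treat the exact signatures as illustrative rather than final:

```python
from dask.distributed import Client

client = Client()

def handler(event):
    # In this prototype, events arrive as (timestamp, payload) pairs
    # routed through the scheduler.
    timestamp, payload = event
    print("received:", payload)

# The client subscribes to a topic and gets a callback per event
client.subscribe_topic("my-topic", handler)

# Clients and workers publish via log_event; the scheduler also keeps
# the events in a deque so they can be fetched later
client.log_event("my-topic", {"status": "something happened"})
print(client.get_events("my-topic"))
```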

@fjetter (Member Author) commented Aug 17, 2021

Took the print function and tests from #5220.

This now registers default handlers for warnings and prints. Do we want a distributed.warn similar to distributed.print?
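
For illustration, a minimal sketch of how the remote print/warn handlers would be used from inside a task, assuming both end up importable from dask.distributed (the warn counterpart is exactly the open question above):

```python
from dask.distributed import Client, print, warn

def task(x):
    # Forwarded through the event system and re-emitted on the client,
    # instead of disappearing into the worker's stdout
    print(f"processing {x}")
    if x < 0:
        # The distributed.warn counterpart discussed in this comment
        warn(f"negative input: {x}")
    return x * 2

client = Client()
client.submit(task, -1).result()
```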

@mrocklin (Member)

Do we want to have a distributed.warn similar to distributed.print?

I'm not against the idea :) It hasn't come up in recent memory though, which is why I passed on doing it before.

Something that has come up, and might be more interesting, would be logging. Do we want something like the following?

from dask.distributed import logger

def f():
    logger.warning("...")

(this should likely be a follow-up though if we do it)

@fjetter (Member Author) commented Aug 17, 2021

Something that has come up, and might be more interesting, would be logging. Do we want something like the following?

That suggestion moved me from "hate it" over to "love it" and now I'm stuck in the vicinity of "confused"

The question is what this function could do that a simple import logging; logging.getLogger() cannot. (Trick question: the logging module can do almost everything :) )

For reference, I opened an issue a while ago (#4762) with the intention of aligning our information/log streams a bit. That discussion might fit nicely over there.
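
For contrast, this is what plain stdlib logging inside a task does today: the record goes to whatever handlers the worker process has configured and never reaches the client session, which is the gap these proposals are trying to close.

```python
import logging

logger = logging.getLogger("myapp")

def task(x):
    # This record is handled entirely inside the worker process
    # (typically its own stderr or log file); the client never sees it
    # unless something explicitly forwards it.
    logger.warning("processing %s", x)
    return x + 1
```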

@fjetter (Member Author) commented Aug 17, 2021

I'm wondering where a good place would be to bury docs about this. At least the remote print/warn is a bit hidden.

@gjoseph92 (Collaborator)

Probably https://docs.dask.org/en/latest/debugging.html? And/or https://distributed.dask.org/en/latest/logging.html?

I also love the idea of a distributed.logger that's a logging.Logger instance, but with the handler set up to use this system, and maybe with a formatter attached that automatically includes the current worker address, task key, etc. That would be a value add.
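
One way this could be wired up, sketched as a custom logging.Handler that publishes records through the event system from inside a worker; the handler class, topic name, and logger name are hypothetical, not an existing API:

```python
import logging
from distributed import get_worker

class EventLogHandler(logging.Handler):
    """Hypothetical handler that forwards log records over the event system."""

    def emit(self, record):
        try:
            worker = get_worker()  # raises if we're not inside a worker
        except ValueError:
            return
        worker.log_event(
            "forwarded-logs",  # illustrative topic name
            {
                "level": record.levelname,
                "message": self.format(record),
                "worker": worker.address,
            },
        )

logger = logging.getLogger("distributed.user")  # illustrative logger name
logger.addHandler(EventLogHandler())
```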

It hasn't come up in recent memory though, which is why I passed on doing it before.

dask/dask#8001 is a good example I think. I imagine there are a few other places where dask or third-party libraries built on dask might like to be able to warn users about performance or usage problems that can only be known at runtime, but just haven't had good infrastructure for sending those warnings back.

@fjetter (Member Author) commented Aug 17, 2021

I also love the idea of a distributed.logger that's a logging.Logger instance, but with the handler set up to use this system

In this case, I'd suggest a distributed.getLogger instead, which calls logging.getLogger and attaches a set of handlers to it, rather than resorting to a single logger instance. I would again like to point this discussion to #4762, since wrapping logging functionality might lock us out of external libs if not done carefully. That might be fine since we're rather hesitant to introduce new dependencies anyway, but it's worth a discussion.

and maybe with a formatter attached that automatically includes the current worker address, task key, etc. That would be a value add.

Even better, it is possible to attach metadata to a log record (see the extra kwarg). This way users can also define their own formatters and this info would not be lost. Some of this could even be done automatically, and I would actually love to see this kind of information attached to every logger we have, automagically!
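
A quick sketch of the stdlib mechanism referred to here: metadata passed via extra becomes attributes on the LogRecord, so a user-defined formatter can pick it up without losing anything. The worker_address/task_key field names below are just for illustration.

```python
import logging

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s [%(worker_address)s / %(task_key)s] %(message)s"
))

logger = logging.getLogger("myapp")
logger.addHandler(handler)

# The extra dict ends up as attributes on the LogRecord, available to
# any formatter; in dask this could be filled in automatically.
logger.warning(
    "spilling to disk",
    extra={"worker_address": "tcp://127.0.0.1:39000", "task_key": "inc-123"},
)
```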

@mrocklin (Member)

I'm liking the enthusiasm here :)

@fjetter (Member Author) commented Aug 17, 2021

Leading underscores are in, and since Gabe already seems to have a potential use case for the warning, I added that function as well.

I'll add docs to the debugging/logging section in another PR, as proposed.

@fjetter changed the title from "WIP Subscribe events - Simple handler" to "Allow Client to subscribe to events // Remote printing and warning" on Aug 18, 2021
@mrocklin (Member)

Anything left to do here, or should we merge once tests pass?

@fjetter (Member Author) commented Aug 18, 2021

Not much to do other than more robust tests

@fjetter (Member Author) commented Aug 18, 2021

Encountered #5227 in some of the failing tests

@fjetter (Member Author) commented Aug 18, 2021

@mrocklin (Member)

I'm doing a debugging session that could definitely use this right about now :) (but we should probably wait until today's release before merging, if we're also releasing distributed along with dask)

@jrbourbeau (Member)

There's also a failing test right now. I've got this on my TODO list for next week

@fjetter (Member Author) commented Sep 8, 2021

@mrocklin (Member) commented Sep 13, 2021 via email

@maxbane (Contributor) commented Oct 15, 2022

Hey guys, I created a gist that shows a proof-of-concept of how to use the subscribe mechanism to forward arbitrary logging statements by tasks running on workers to the client session, in a pretty general and flexible way: https://gist.github.com/maxbane/595bf38e894c49f58e20fb905d24bf30

Would the dask maintainers have any interest in a PR that added forward_logging() (see gist) as an instance method of Client?
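
For readers who don't want to click through, here is a rough sketch of the forwarding pattern (not the gist's actual code): a worker plugin installs a logging handler that publishes records to a topic, and the client subscribes to that topic and re-emits the records locally. The plugin class and topic name are illustrative.

```python
import logging
from distributed import Client, WorkerPlugin

TOPIC = "forwarded-logging"  # illustrative topic name

class LogForwardPlugin(WorkerPlugin):
    """Install a handler on each worker that ships log records to the scheduler."""

    def setup(self, worker):
        class _ForwardHandler(logging.Handler):
            def emit(self, record):
                worker.log_event(TOPIC, {
                    "name": record.name,
                    "level": record.levelno,
                    "msg": record.getMessage(),
                })

        # Forward everything from the root logger; a real implementation
        # would let the user pick a named logger and level instead.
        logging.getLogger().addHandler(_ForwardHandler())

client = Client()
client.register_worker_plugin(LogForwardPlugin())

def replay(event):
    # Events arrive as (timestamp, payload); re-emit locally so the
    # client-side logging config decides what to show.
    _, payload = event
    logging.getLogger(payload["name"]).log(payload["level"], payload["msg"])

client.subscribe_topic(TOPIC, replay)
```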

@fjetter (Member Author) commented Oct 18, 2022

Would the dask maintainers have any interest in a PR that added

I think there is some appetite for this, and the idea has been discussed before. Generally I don't mind as long as this is opt-in, but it should not be enabled by default. Depending on the application, log aggregation can be quite an overwhelming task, and I would not want the scheduler to be clogged by logs (we're routing all events over the scheduler), i.e. this would be one of the "use at your own risk" features ;)

@maxbane (Contributor) commented Oct 18, 2022

Cool, yeah, agreed there is a risk of bogging the cluster down if every worker starts shipping every log record at high verbosity back to the scheduler. Hence it should be opt-in, and furthermore the idea would be to give the user some knobs to narrow the set of shipped records: to really bog things down, they'd have to enable forwarding on the root logger (rather than on some more specific named logger) and explicitly write their tasks to set a high level of verbosity, like INFO or DEBUG, on the root logger.

Personally I think the feature would be useful for targeted debugging sessions where your tasks are using some library that produces useful logging when you set the right level on its loggers. You could then get the logs right in your client-side notebook cell / interpreter session without having to go digging elsewhere through worker logs. Or perhaps your client itself is started by some automated job and you want the aggregation of the tasks' logs (at some reasonable level!) collected into that job's local output.
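
As a usage sketch only (the actual signature would be up to the PR), the narrowing could look something like this, forwarding a single named library logger at INFO rather than everything on the root logger:

```python
import logging
from distributed import Client

client = Client()

# Hypothetical API from the gist: forward only records from one named
# logger at a modest level, instead of everything on the root logger.
client.forward_logging("some.library", level=logging.INFO)
```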

Also, I've seen multiple people asking about task log aggregation e.g. on Stack Overflow, and it would be nice if dask had a self-contained, opt-in solution for it instead of telling everyone to just use whatever their cluster manager's solution is (not everyone is even using a cluster manager).
