
Allow Client to subscribe to events // Remote printing and warning #5217

Merged (11 commits) on Sep 13, 2021

Conversation

@fjetter (Member) commented Aug 16, 2021

There is the idea floating around to log exceptions into our internal event system. See also #5184

This is an attempt to enable a simple publish+subscribe mechanism for these events. It works as a prototype, but I'm not entirely sure whether this is a good idea, since we also have the actual pubsub extension, which is a bit more powerful.

There are a few notable differences:

  1. pubsub allows for direct Worker<->Worker communication. We don't necessarily need that but it doesn't hurt, does it?
  2. pubsub serializes all messages using our protocol.to_serialize mechanism. If I understand correctly, this ensures that the message is never actually deserialized on the scheduler?
  3. log_event stores all logged events on the scheduler in a deque.

We already advertise the log_event functionality in our docs (https://distributed.dask.org/en/latest/logging.html#structured-logs), but the pubsub extension is more powerful, and I don't like the idea of having two systems if there is no good reason for it. It would not be hard to extend the pubsub extension to also log events (on whitelisted topics) in a deque, i.e. we could reuse that mechanism for events, but we'd need to break 2) above.
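
For concreteness, here is a rough sketch of how the subscription mechanism prototyped here is meant to be used. The `subscribe_topic`/`log_event`/`get_events` names reflect this prototype; treat the exact signatures as illustrative rather than final:

```python
from dask.distributed import Client

client = Client()

def handler(event):
    # In this prototype, events arrive as (timestamp, payload) pairs
    # routed through the scheduler.
    timestamp, payload = event
    print("received:", payload)

# The client subscribes to a topic and gets a callback per event
client.subscribe_topic("my-topic", handler)

# Clients and workers publish via log_event; the scheduler also keeps
# the events in a deque so they can be fetched later
client.log_event("my-topic", {"status": "something happened"})
print(client.get_events("my-topic"))
```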

@fjetter (Member Author) commented Aug 17, 2021

Took the print function and tests from #5220.

This now registers default handlers for warnings and prints. Do we want a distributed.warn similar to distributed.print?
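
For illustration, a minimal sketch of how the remote print/warn handlers would be used from inside a task, assuming both end up importable from dask.distributed (the warn counterpart is exactly the open question above):

```python
from dask.distributed import Client, print, warn

def task(x):
    # Forwarded through the event system and re-emitted on the client,
    # instead of disappearing into the worker's stdout
    print(f"processing {x}")
    if x < 0:
        # The distributed.warn counterpart discussed in this comment
        warn(f"negative input: {x}")
    return x * 2

client = Client()
client.submit(task, -1).result()
```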

@mrocklin (Member)

Do we want to have a distributed.warn similar to distributed.print?

I'm not against the idea :) It hasn't come up in recent memory though, which is why I passed on doing it before.

Something that has come up, and might be more interesting, would be logging. Do we want something like the following?

from dask.distributed import logger

def f():
    logger.warning("...")

(this should likely be a follow-up though if we do it)

@fjetter (Member Author) commented Aug 17, 2021

Something that has come up, and might be more interesting, would be logging. Do we want something like the following?

That suggestion moved me from "hate it" over to "love it" and now I'm stuck in the vicinity of "confused"

The question is what this function could do that a simple import logging; logging.getLogger() cannot. (Trick question: the logging module can do almost everything :) )

For reference, I opened an issue a while ago (#4762) with the intention of aligning our information/log streams a bit. That discussion might fit nicely over there.
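
For contrast, this is what plain stdlib logging inside a task does today: the record goes to whatever handlers the worker process has configured and never reaches the client session, which is the gap these proposals are trying to close.

```python
import logging

logger = logging.getLogger("myapp")

def task(x):
    # This record is handled entirely inside the worker process
    # (typically its own stderr or log file); the client never sees it
    # unless something explicitly forwards it.
    logger.warning("processing %s", x)
    return x + 1
```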

@fjetter (Member Author) commented Aug 17, 2021

I'm wondering where a good place would be to bury docs about this. At least the remote print/warn is a bit hidden.

@gjoseph92 (Collaborator)

Probably https://docs.dask.org/en/latest/debugging.html? And/or https://distributed.dask.org/en/latest/logging.html?

I also love the idea of a distributed.logger that's a logging.Logger instance, but with the handler set up to use this system, and maybe with a formatter attached that automatically includes the current worker address, task key, etc. That would be a value add.
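
One way this could be wired up, sketched as a custom logging.Handler that publishes records through the event system from inside a worker; the handler class, topic name, and logger name are hypothetical, not an existing API:

```python
import logging
from distributed import get_worker

class EventLogHandler(logging.Handler):
    """Hypothetical handler that forwards log records over the event system."""

    def emit(self, record):
        try:
            worker = get_worker()  # raises if we're not inside a worker
        except ValueError:
            return
        worker.log_event(
            "forwarded-logs",  # illustrative topic name
            {
                "level": record.levelname,
                "message": self.format(record),
                "worker": worker.address,
            },
        )

logger = logging.getLogger("distributed.user")  # illustrative logger name
logger.addHandler(EventLogHandler())
```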

It hasn't come up in recent memory though, which is why I passed on doing it before.

dask/dask#8001 is a good example I think. I imagine there are a few other places where dask or third-party libraries built on dask might like to be able to warn users about performance or usage problems that can only be known at runtime, but just haven't had good infrastructure for sending those warnings back.

@fjetter (Member Author) commented Aug 17, 2021

I also love the idea of a distributed.logger that's a logging.Logger instance, but with the handler set up to use this system

In this case, I'd suggest a distributed.getLogger instead, which calls logging.getLogger and attaches a set of handlers to it, rather than resorting to a single logger instance. I would again like to point this discussion to #4762, since wrapping logging functionality might lock us out of external libs if not done carefully. That might be fine since we're rather hesitant to introduce new dependencies anyway, but it's worth a discussion.

and maybe with a formatter attached that automatically includes the current worker address, task key, etc. That would be a value add.

Even better, it is possible to attach metadata to a log record (see the extra kwarg). This way users can also define their own formatters and this info would not be lost. Some of this could even be done automatically, and I would actually love to see this kind of information attached to every logger we have, automagically!
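
A quick sketch of the stdlib mechanism referred to here: metadata passed via extra becomes attributes on the LogRecord, so a user-defined formatter can pick it up without losing anything. The worker_address/task_key field names below are just for illustration.

```python
import logging

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s [%(worker_address)s / %(task_key)s] %(message)s"
))

logger = logging.getLogger("myapp")
logger.addHandler(handler)

# The extra dict ends up as attributes on the LogRecord, available to
# any formatter; in dask this could be filled in automatically.
logger.warning(
    "spilling to disk",
    extra={"worker_address": "tcp://127.0.0.1:39000", "task_key": "inc-123"},
)
```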

@mrocklin (Member)

I'm liking the enthusiasm here :)

@fjetter (Member Author) commented Aug 17, 2021

Leading underscores are in, and since Gabe already seems to have a potential use case for the warning, I added that function as well.

I'll add docs to the debugging/logging section in another PR, as proposed.

@fjetter changed the title from "WIP Subscribe events - Simple handler" to "Allow Client to subscribe to events // Remote printing and warning" on Aug 18, 2021
@mrocklin (Member)

Anything left to do here, or should we merge once tests pass?

@fjetter (Member Author) commented Aug 18, 2021

Not much to do other than more robust tests

@fjetter (Member Author) commented Aug 18, 2021

Encountered #5227 in some of the failing tests

@fjetter (Member Author) commented Aug 18, 2021

@mrocklin (Member)

I'm doing a debugging session that could definitely use this right about now :) (but we should probably wait until today's release before merging, if we're also releasing distributed along with dask)

@jrbourbeau (Member)

There's also a failing test right now. I've got this on my TODO list for next week

@fjetter (Member Author) commented Sep 8, 2021

@mrocklin (Member) commented Sep 13, 2021 via email

@maxbane (Contributor) commented Oct 15, 2022

Hey guys, I created a gist that shows a proof-of-concept of how to use the subscribe mechanism to forward arbitrary logging statements by tasks running on workers to the client session, in a pretty general and flexible way: https://gist.github.com/maxbane/595bf38e894c49f58e20fb905d24bf30

Would the dask maintainers have any interest in a PR that added forward_logging() (see gist) as an instance method of Client?
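
For readers who don't want to click through, here is a rough sketch of the forwarding pattern (not the gist's actual code): a worker plugin installs a logging handler that publishes records to a topic, and the client subscribes to that topic and re-emits the records locally. The plugin class and topic name are illustrative.

```python
import logging
from distributed import Client, WorkerPlugin

TOPIC = "forwarded-logging"  # illustrative topic name

class LogForwardPlugin(WorkerPlugin):
    """Install a handler on each worker that ships log records to the scheduler."""

    def setup(self, worker):
        class _ForwardHandler(logging.Handler):
            def emit(self, record):
                worker.log_event(TOPIC, {
                    "name": record.name,
                    "level": record.levelno,
                    "msg": record.getMessage(),
                })

        # Forward everything from the root logger; a real implementation
        # would let the user pick a named logger and level instead.
        logging.getLogger().addHandler(_ForwardHandler())

client = Client()
client.register_worker_plugin(LogForwardPlugin())

def replay(event):
    # Events arrive as (timestamp, payload); re-emit locally so the
    # client-side logging config decides what to show.
    _, payload = event
    logging.getLogger(payload["name"]).log(payload["level"], payload["msg"])

client.subscribe_topic(TOPIC, replay)
```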

@fjetter (Member Author) commented Oct 18, 2022

Would the dask maintainers have any interest in a PR that added

I think there is some appetite for this, and the idea has been discussed before. Generally I don't mind as long as this is opt-in, but it should not be enabled by default. Depending on the application, log aggregation can be quite an overwhelming task, and I would not want the scheduler to be clogged by logs (we're routing all events over the scheduler), i.e. this would be one of the "use at your own risk" features ;)

@maxbane (Contributor) commented Oct 18, 2022

Cool, yeah, agreed there is a risk of bogging the cluster down if every worker starts shipping every log record at high verbosity back to the scheduler. Hence it should be opt-in, and furthermore the idea would be to give the user some knobs to narrow the set of shipped records: to really bog things down, they'd have to enable forwarding on the root logger (rather than on some more specific named logger) and explicitly write their tasks to set a high level of verbosity, like INFO or DEBUG, on the root logger.

Personally I think the feature would be useful for targeted debugging sessions where your tasks are using some library that produces useful logging when you set the right level on its loggers. You could then get the logs right in your client-side notebook cell / interpreter session without having to go digging elsewhere through worker logs. Or perhaps your client itself is started by some automated job and you want the aggregation of the tasks' logs (at some reasonable level!) collected into that job's local output.
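
As a usage sketch only (the actual signature would be up to the PR), the narrowing could look something like this, forwarding a single named library logger at INFO rather than everything on the root logger:

```python
import logging
from distributed import Client

client = Client()

# Hypothetical API from the gist: forward only records from one named
# logger at a modest level, instead of everything on the root logger.
client.forward_logging("some.library", level=logging.INFO)
```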

Also, I've seen multiple people asking about task log aggregation e.g. on Stack Overflow, and it would be nice if dask had a self-contained, opt-in solution for it instead of telling everyone to just use whatever their cluster manager's solution is (not everyone is even using a cluster manager).
