Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enum for worker TaskState names #5444

Closed
wants to merge 5 commits into from
Closed

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Oct 20, 2021

This sets up an enum to be used as the worker task state name. The name of this enum is up for debate, I couldn't find a shorter one since, eventually, we'll have the same for the scheudler and both enums will occasionally be active in the same module. This is, in fact, one reason why this is useful.

This includes a breaking change for the WorkerPlugin since the transition functions will no longer receive strings. We can make this backwards compatible but eventually I'd like to transition this to the enums as well.

Builds on #5426

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this @fjetter

This includes a breaking change for the WorkerPlugin since the transition functions will no longer receive strings. We can make this backwards compatible but eventually I'd like to transition this to the enums as well.

Thanks for flagging this. I'd like to add backwards compatibility / deprecation code similar to what was done in #3875

cc @crusaderky as this seems like the kind of this you might enjoy reviewing

return self.name


WTSName = WorkerTaskStateName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does removing Name from WorkerTaskStateName sufficiently reduce the need for this alias? I find using the full name much more readable

@@ -97,16 +99,41 @@

no_value = "--no-value-sentinel--"


class WorkerTaskStateName(Enum):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is shorter while, I think, conveying the same information

Suggested change
class WorkerTaskStateName(Enum):
class WorkerTaskState(Enum):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe WorkerTaskStatus?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this same enum is going to be soon used in scheduler.py too, may I suggest moving it to another module (core.py?) and rename it to e.g. TaskStateStatus (imported as TSS everywhere)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WorkerTaskStatus

well, it's not a status but I don't have a strong opinion there

Since this same enum is going to be soon used

I'm not entirely conviced that this should be shared among scheduler and worker. What I like about using two distinct enums is that the readability improves if there are scheduler and worker states around. The set of allowed states is also different.

Comment on lines +2365 to +2566
v_args = v[1:]
v_state = v[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but I'm curious why this change was needed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks cosmetic. I don't mind either way.

Copy link
Member Author

@fjetter fjetter Oct 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mypy complains otherwise. below with type annotations

v : WTSName | tuple = recs.get(ts, (finish, *args))

v_state: WTSName
v_args: tuple
if isinstance(v, tuple):
    # old; This will raise since the expression *v_args is of type list
    v_state: WTSName, *v_args: list = v

    # new
    v_state: WTSName = v[0]
    v_state: tuple = v[1:]  # indexing a tuple this way yields a tuple
else:
    v_state: WTSName, v_args: tuple = v, ()

"waiting",
WTSName.ready,
WTSName.constrained,
WTSName.waiting,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they're enums now, you could perhaps use a Flag enum to encapsulate multiple states?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the benefit. Tasks themselves are not allowed to have multiple states. Therefore, there is the possibility to have a properly typed value assigned which is semantically nonsense / impossible. That's unlikely but doesn't encourage me in using it.

Therefore the only valid application will be when the rules engine verifies if a state is in a given set of states.

For this, it's basically just another syntax. Below the examples played out. set-like __contains__ still works so this will only be affecting the definition of the subsets.

In [1]: from enum import Enum, Flag, auto

In [2]: class TSSFlag(Flag):
   ...:    fetch = auto()
   ...:    flight = auto()
   ...:    released = auto()
   ...:    executing = auto()
   ...:

In [3]: class TSS(Enum):
   ...:    fetch = auto()
   ...:    flight = auto()
   ...:    released = auto()
   ...:    executing = auto()
   ...:

In [4]: to_be_gathered = {TSS.fetch, TSS.flight}

In [6]: to_be_gathered_flag = TSSFlag.fetch | TSSFlag.flight

In [7]: TSS.released in to_be_gathered
Out[7]: False

In [8]: TSS.flight in to_be_gathered
Out[8]: True

In [9]: TSSFlag.released in to_be_gathered_flag
Out[9]: False

In [10]: TSSFlag.flight in to_be_gathered_flag
Out[10]: True

In [11]: bool(TSSFlag.released & to_be_gathered_flag)
Out[11]: False

In [12]: bool(TSSFlag.flight & to_be_gathered_flag)
Out[12]: True

I get that bitwise comparison is faster than hashing but this speed difference is entirely negligible for the worker and I'm not entirely sure if it is relevant for the scheduler.

However, I don't see any significant downsides. If anybody has a reason to do so, I'm open to it but I consider the set-only, plain enum thing to be more "pythonic"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's mostly stylistic. I think checking membership of a set of values is the canonical use case for a flag enum over a regular enum. IIUC Flag has been around since py36 so I think it would be fairly "pythonic" to use it where you don't have to support older Pythons.

Even though it may be negligible now, if there's no downside to doing so I'd always be inclined to go with the more performant solution as it may save you in future. However, your type-checker won't save you from setting a TSSFlag attribute to a combined value (e.g. TSSFlag.fetch | TSSFlag.flight) so that's a fairly big downside 🤔

@abergou
Copy link

abergou commented Oct 20, 2021

Thanks for your work on this @fjetter

This includes a breaking change for the WorkerPlugin since the transition functions will no longer receive strings. We can make this backwards compatible but eventually I'd like to transition this to the enums as well.

Thanks for flagging this. I'd like to add backwards compatibility / deprecation code similar to what was done in #3875

cc @crusaderky as this seems like the kind of this you might enjoy reviewing

@jrbourbeau @fjetter Just a random thought: python 3.10 adds StrEnums which might be backwards compatible with this. You can get StrEnums in python 3.6+ with: https://github.com/irgeek/StrEnum I believe (enums sounds good to me either way).

while key not in dask_worker.tasks or dask_worker.tasks[key].state != state:
await asyncio.sleep(0.005)


async def wait_for_cancelled(key, dask_worker):
async def wait_for_cancelled(key: str, dask_worker: Worker):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async def wait_for_cancelled(key: str, dask_worker: Worker):
async def wait_for_cancelled(key: str, dask_worker: Worker) -> None:

distributed/worker.py Outdated Show resolved Hide resolved
distributed/worker.py Outdated Show resolved Hide resolved
distributed/worker.py Outdated Show resolved Hide resolved
distributed/worker.py Outdated Show resolved Hide resolved
distributed/worker.py Outdated Show resolved Hide resolved
distributed/worker.py Outdated Show resolved Hide resolved
@crusaderky
Copy link
Collaborator

Blocked by and incorporates #5426

@fjetter
Copy link
Member Author

fjetter commented Oct 21, 2021

This includes a breaking change for the WorkerPlugin since the transition functions will no longer receive strings. We can make this backwards compatible but eventually I'd like to transition this to the enums as well.

Thanks for flagging this. I'd like to add backwards compatibility / deprecation code similar to what was done in #3875

It's not the same thing. It's not about setting the proper value on the state attribute. If users modify anything on that attribute, they are entirely on their own since I couldn't think of many things which are more low level and internal than this attribute.

I'm rather talking about the _notify_plugins call here

self._notify_plugins("transition", ts.key, start, finish, **kwargs)

Our WorkerPlugin interface currently defines strings. It's very easy for us to simply continue passing string values in there but it is impossible to infer whether the users migrated to anything new or not. We can either raise a deprecation/future warning every time (too much spam; I'm strongly against this) or not warn at all. We might also simply decide to just keep using strings for anything user facing.

A similar problem exists for the transition log. Right now, I'm logging the enums but we could instead log the names which might offer a better UX (although I'm not sure if anybody other than myself is actually using the transition log :) )

@fjetter
Copy link
Member Author

fjetter commented Oct 21, 2021

@jrbourbeau @fjetter Just a random thought: python 3.10 adds StrEnums which might be backwards compatible with this. You can get StrEnums in python 3.6+ with: https://github.com/irgeek/StrEnum I believe (enums sounds good to me either way).

I'm not aware that py3.10 is introducing a new enum type. Looking at https://docs.python.org/3/whatsnew/3.10.html enums are only mentioned in an example regarding the new matching (switch statements).

AFAIK, mixed type enums can be easily created by subclassing, e.g. TSS(str, Enum) which would allow str comparison. However, this is more or less what I would like to forbid by introducing enums since I want to force the code base into using the enum and not make it optional

In [15]: class TSSStr(str, Enum):
    ...:     fetch = "fetch"
    ...:     flight = "flight"
    ...:     released = "released"
    ...:     executing = "executing"
    ...:

In [16]: TSSStr.fetch == "fetch"
Out[16]: True

@fjetter fjetter force-pushed the enum_workerstate branch 2 times, most recently from 3f0600d to 2dadf51 Compare October 26, 2021 12:29
@fjetter fjetter marked this pull request as ready for review October 26, 2021 14:38
@@ -131,6 +131,8 @@ class WorkerPlugin:
>>> client.register_worker_plugin(plugin) # doctest: +SKIP
"""

enum_task_state_names = False
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a backwards compatibility toggle for the plugin. There is now a class attribute controlling this behaviour and it default to the old behaviour. the old behaviour will then issue a deprecation warning instructing the user how to migrate.

@fjetter
Copy link
Member Author

fjetter commented Nov 9, 2021

Summarizing

  • Backwards compatibility is addressed
  • There has been a comment about merging the enums of scheduler (to be implemented) and worker but I would like to keep them separate which is why the name needs some kind of Worker identifier.
  • There is still a discussion around naming. The proposal I could live with is WorkerTaskStatus for brevity. Since this should be public, we should settle on something
  • I don't have a strong opinion about the FlagEnum topic. Since this will not really change the functionality and we can change this later on I will pass on changing the type for now.

ready = "ready"
constrained = "constrained"
executing = "executing"
long_running = "long-running"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to cause headaches with name != value. Can we use _ in both cases?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(may need to change on the scheduler side too)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it this way for compat purposes. The value is backwards compatible

@@ -3543,7 +3627,31 @@ def get_call_stack(self, comm=None, keys=None):
def _notify_plugins(self, method_name, *args, **kwargs):
for name, plugin in self.plugins.items():
if hasattr(plugin, method_name):
if method_name == "release_key":
if (
not getattr(plugin, "enum_task_state_names", False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
not getattr(plugin, "enum_task_state_names", False)
not plugin.enum_task_state_names

I think getattr is unnecessary - aren't all customer plugins supposed to subclass WorkerPlugin?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historically I don't think we've enforced this. That's why, for example, we have hasattr checks like this

teardowns = [
plugin.teardown(self)
for plugin in self.plugins.values()
if hasattr(plugin, "teardown")
]

Enforcing plugins to subclass WorkerPlugin doesn't sound like a bad idea, but would merit a deprecation cycle

distributed/worker.py Outdated Show resolved Hide resolved
distributed/worker.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@crusaderky crusaderky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@fjetter
Copy link
Member Author

fjetter commented Dec 20, 2021

  • I changed the name to directly use WTSS with a comment about the name. I'm not sure if we need an expanded name
  • I kept the long-running for backwards compat purposes. I'm not entirely sure in which circumstances this creates a problem. Code should always use the enum, shouldn't it?
  • I kept duck typing for now
  • Transition logs use the repr of the Enum, i.e. it's name. That's still up for debate but I figured a log should include str-only values

@crusaderky
Copy link
Collaborator

Is there anything outstanding here?

@fjetter
Copy link
Member Author

fjetter commented Mar 10, 2022

Is there anything outstanding here?

Nothing outstanding other than resolving the merge conflicts. So far, this wasn't a priority of mine

@fjetter fjetter closed this Aug 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants