
RFC: Add Client.decorator method #7936

Open
wants to merge 2 commits into main

Conversation

mrocklin
Member

I'm not sure if this is a good idea, and if it is then it could use a better name. Mostly I was playing with Modal and enjoying it but wanted full Dask semantics around futures. Docstring follows.

Decorate a function to submit tasks to Dask

This converts a normal function to instead return Dask Futures. That function can then be used in parallel.

This takes the same keywords as ``client.submit``.

Example
-------

```python
>>> @client.decorate()
... def f(x):
...     return x + 1

>>> futures = [f(x) for x in range(10)]
>>> results = [future.result() for future in futures]
```

Closes #xxxx

  • Tests added / passed
  • Passes pre-commit run --all-files

@mrocklin mrocklin requested a review from fjetter as a code owner June 21, 2023 01:57
@github-actions
Contributor

github-actions bot commented Jun 21, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     19 files  ±0        19 suites  ±0    10h 24m 32s ⏱️  -1h 52m 8s
 3 697 tests  +10      3 584 ✔️  +9       104 💤  -2      9 ❌  +3
33 582 runs   -738    32 039 ✔️  -607   1 534 💤  -134    9 ❌  +3

For more details on these failures, see this check.

Results for commit 5987da3. ± Comparison against base commit 49437c2.

This pull request removes 2 and adds 12 tests. Note that renamed tests count towards both.

Removed:
distributed.protocol.tests.test_numpy
distributed.shuffle.tests.test_rechunk

Added:
distributed.dashboard.tests.test_scheduler_bokeh ‑ test_FinePerformanceMetrics_no_spans
distributed.tests.test_client ‑ test_submit_decorator
distributed.tests.test_itertools ‑ test_ffill
distributed.tests.test_spans ‑ test_active_cpu_seconds_change_nthreads
distributed.tests.test_spans ‑ test_active_cpu_seconds_merged
distributed.tests.test_spans ‑ test_active_cpu_seconds_not_done[False]
distributed.tests.test_spans ‑ test_active_cpu_seconds_not_done[True]
distributed.tests.test_spans ‑ test_active_cpu_seconds_trivial
distributed.tests.test_spans ‑ test_code
distributed.tests.test_spans ‑ test_merge_all
…

♻️ This comment has been updated with latest results.

@fjetter
Member

fjetter commented Jun 21, 2023

Is this really useful? I don't see how this comes up as a use case frequently. It's also a bit weird to have the result call later on.
What happens to the function if the client closes? I also imagine this will hold a ref to the client such that our GC/del mechanics wouldn't work.

I'm not very excited about this addition yet. Do you have an example where this is useful?

@mrocklin
Member Author

Is this really useful?

I mean, it's functionally equivalent to client.submit, so I think the question isn't "is this useful" but rather "is this a more intuitive way for users to interact with concurrent/remote computing?" Whether or not to go in this direction is entirely a question of user empathy, I think. We're weighing the benefits of a different way to spell things against the drawbacks of having multiple ways to do things.

If we look at dask.delayed as an example, I'd say we see a bit over half of users prefer the decorator approach to the inline approach, so it seems that some users find it more intuitive.

Also, to be clear, I'm not saying "I want us to merge this now". I'm saying "we should probably talk about this a bit".

I also imagine this will hold a ref to the client s.t. our GC/del mechanics wouldn't work.

Ah, interesting point.

@fjetter
Member

fjetter commented Jun 27, 2023

If we look at dask.delayed as an example, I'd say we see a bit over half of users prefer the decorator approach to the inline approach, so it seems that some users find it more intuitive.

From a syntax/UX perspective I can see the appeal of the decorator, but delayed objects are not bound to any object, especially not to a dynamic one like the client. Apart from the lifecycle problems I mentioned earlier, this also limits the usefulness in that the client has to exist before the function is defined and has to be available in the local scope (the default-client mechanism also allows one to not have the client around and just use dask.compute). Executing the function once the client is closed will likely also spew all sorts of weird exceptions.

This experience could likely be made smoother, but only at the cost of complexity (marrying this decorator with current/default clients, better exception handling, weakrefs, ...), and I'm not convinced this is worth it.

@mrocklin
Member Author

We could do something like the following:

```python
from functools import partial
from distributed import get_client

def submit(**kwargs):
    def _(func):
        # binds the current client's submit to func at decoration time
        return partial(get_client().submit, func, **kwargs)
    return _
```

This would resolve the lifecycle issues, and also having to have a client object ahead of time.
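
For illustration, a minimal usage sketch of the helper above (assuming that helper is in scope and that a client already exists; `inc` and `retries=2` are made-up examples):

```python
from distributed import Client

client = Client(processes=False)  # a client must exist when the decorator runs

@submit(retries=2)        # keywords are forwarded to client.submit
def inc(x):
    return x + 1

future = inc(1)           # returns a distributed.Future immediately
assert future.result() == 2
```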

@fjetter
Member

fjetter commented Jun 27, 2023

This would resolve the lifecycle issues, and also having to have a client object ahead of time.

The partial will still bind to the instance returned by get_client. I assume this will also be evaluated at import time and will crash hard if there is no client available.

@mrocklin
Member Author

I acknowledge those problems. I think that we can probably get around them. Do you agree?
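
One hedged sketch of a possible way around them (purely illustrative, not part of this PR): defer the get_client() lookup to call time inside a wrapper, so no client has to exist when the function is defined and the decorated function never pins a particular client instance.

```python
from functools import wraps

from distributed import get_client


def submit(**submit_kwargs):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # whichever client is current *now*, at call time
            return get_client().submit(func, *args, **submit_kwargs, **kwargs)
        return wrapper
    return decorator
```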

What do you think about the general API?

@fjetter
Member

fjetter commented Jun 28, 2023

I acknowledge those problems. I think that we can probably get around them. Do you agree?

There are already so many problems around client mechanics that I'm not very eager to introduce more: those problems rarely rise to the level of top priority and are not fixed eagerly, but they still cause pain and inconvenience. We are already struggling to iron out the current semantics. We can possibly work around the problems, but I'm not convinced this is worth it.

What do you think about the general API?

I would be more excited if we could remove the result call at the end, but this way we're only removing client semantics from one of two lines. As it is, we cannot fully hide that there is a client or that there are futures, which diminishes the value significantly.
This also breaks the symmetry with the concurrent.futures API we are typically striving for. I can see how splitting off the submit call in this way could actually cause confusion: just from observing the code being used, it's not entirely obvious what's going on without understanding how f is defined.

```python
>>> futures = [f(x) for x in range(10)]
>>> results = [future.result() for future in futures]
```

An explicit submit is more verbose but also less confusing.
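
For comparison, a minimal sketch of the explicit spelling (assuming a running client; `f` is the same toy function as above):

```python
from distributed import Client

client = Client(processes=False)

def f(x):
    return x + 1

futures = [client.submit(f, x) for x in range(10)]
results = client.gather(futures)  # equivalent to calling .result() on each future
```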

The API for a case like this is also much more than just syntax. Default, current, and worker-client semantics are already confusing as it is, and this kind of API obscures that ambiguity even further. There is also the case of async clients, which is not handled here at all.

If a user wants to use a decorator, nothing is stopping them from doing so. Getting it working is very little work if you are confined to a specific use case. However, for us as a library to support this, we should think about the different edge cases, and that is what drives up the cost of supporting it.

I see a lot of complexity for rather little value.

@mrocklin
Member Author

Yeah, I see where you're coming from about wanting to avoid complexity, both in terms of client lifecycle dynamics and multiple APIs. I generally assume that we can handle the client lifecycle dynamics, and that we haven't yet found a great API here, so it makes sense to keep experimenting.

This also breaks the symmetry to the concurrent.futures API we are typically striving for

To be clear, I'm not striving for that API in particular. I like that API, but find that relatively few users know of it.

If a user wants to use a decorator, nothing is stopping them to do so. Getting it working is very little work if you are confined to a specific usecase
I see a lot of complexity for rather little value

My guess is that if you gave more tutorials or did more things with beginning users, you would see more value.

@mrocklin
Member Author

Anyway, thanks for sharing your thoughts. I appreciate it.

@mrocklin
Member Author

Something that I suspect @fjetter will dislike even more: what if we combined this with #8028 and used module-level methods? This could become very magical.

```python
# myscript.py
import dask

@dask()
def process(filename):
    ...

tasks = [process(fn) for fn in filenames]
for task in tasks:
    task.result()
```

We could call Client() if no client exists. Maximum magic ✨. 😆

I don't actually expect us to get here, but I think that thinking in this direction is probably fruitful for us. It may give us ideas on how to make Dask more accessible to less sophisticated users.

@fjetter
Member

fjetter commented Jul 25, 2023

How would the above differ from delayed (API/UX-wise; I know the backend differences, of course)? IIUC, the only real difference is that your version would start computing right away whereas the delayed version only kicks off once we call compute. Do users care about this difference when using this magical API?
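
For reference, a hedged sketch of the delayed spelling being compared against (the body and the filenames list are placeholders):

```python
import dask

@dask.delayed
def process(filename):
    return len(filename)            # placeholder work

filenames = ["a.csv", "b.csv"]      # placeholder inputs
tasks = [process(fn) for fn in filenames]
results = dask.compute(*tasks)      # nothing runs until this call
```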

@mrocklin
Member Author

Yes, the real operational difference is eager vs lazy execution. My sense is that eager is more intuitive for folks.

@mrocklin
Member Author

More broadly, my sense is that futures are just more modern than delayed. There are plenty of small things that they support better (priorities, annotations, as_completed, ...). I'd like people to switch to futures more generally, but they stay with delayed, I suspect because of the decorator syntax.

@fjetter
Member

fjetter commented Jul 25, 2023

My reason to stick with delayed, back in the day, was that it was much easier to test and reason about. Mostly the ability to switch between the sync/threading/distributed schedulers was what motivated me to use delayed.
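
For context, a small sketch of the scheduler switching being referred to (toy function; the scheduler names are standard dask.compute options):

```python
import dask

@dask.delayed
def double(x):
    return 2 * x

tasks = [double(i) for i in range(5)]

# Same graph, different schedulers -- handy for tests and debugging
dask.compute(*tasks, scheduler="synchronous")  # single-threaded, easy to step through
dask.compute(*tasks, scheduler="threads")      # local thread pool
# With a distributed Client active, dask.compute uses it by default.
```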

@fkaleo

fkaleo commented Nov 16, 2023

Very similar to the @ray.remote decorator, lovely idea!

I had to make my own:

```python
from functools import wraps
from typing import Any, Callable, Optional

from distributed import Future, fire_and_forget, get_client


def dask_submit(_func: Optional[Callable] = None, **submit_kwargs: Any) -> Callable[[Callable], Callable[..., Future]]:
    def decorator(func) -> Callable[..., Future]:
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Resolve the client at call time and submit the wrapped function
            client = get_client()
            future = client.submit(func, *args, **submit_kwargs, **kwargs)
            fire_and_forget(future)
            return future
        return wrapper

    # Support both @dask_submit and @dask_submit(...)
    if _func:
        return decorator(_func)

    return decorator
```

It supports passing all the options that submit accepts:

```python
@dask_submit(resources={"GPU": 1})
def my_func():
    ...
```

and using it without calling it with ():

```python
@dask_submit
def my_func():
    ...
```
