RFC: Add Client.decorator method #7936
base: main
I'm not sure if this is a good idea, and if it is, then it could use a better name. Mostly I was playing with Modal and enjoying it, but wanted full Dask semantics around futures. Docstring follows.

Decorate a function to submit tasks to Dask

This converts a normal function to instead return Dask Futures. That function can then be used in parallel.

This takes the same keywords as ``client.submit``

Example
-------
```python
>>> @client.decorate()
... def f(x):
...     return x + 1
>>> futures = [f(x) for x in range(10)]
>>> results = [future.result() for future in futures]
```
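A minimal sketch of such a decorator, assuming only an object with a `submit(func, *args, **kwargs)` method. So that it runs without a cluster, it uses `concurrent.futures.ThreadPoolExecutor` as a stand-in for a Dask `Client`; `decorate` is a hypothetical free function, not the implementation in this PR.

```python
import functools
from concurrent.futures import Future, ThreadPoolExecutor


def decorate(executor, **submit_kwargs):
    """Hypothetical helper: make ``func`` return a Future instead of a value.

    ``executor`` is anything with a ``submit(func, *args, **kwargs)`` method,
    e.g. a ThreadPoolExecutor here or, by analogy, a Dask ``Client``.
    ``submit_kwargs`` are forwarded to every ``submit`` call.
    """
    def wrapper(func):
        @functools.wraps(func)
        def submit(*args, **kwargs) -> Future:
            # Nothing runs in the caller; the work is handed to the executor.
            return executor.submit(func, *args, **submit_kwargs, **kwargs)
        return submit
    return wrapper


executor = ThreadPoolExecutor(max_workers=4)


@decorate(executor)
def f(x):
    return x + 1


futures = [f(x) for x in range(10)]
results = [future.result() for future in futures]
print(results)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
executor.shutdown()
```

With a real `Client`, keywords such as `retries` or `priority` in `submit_kwargs` would be consumed by `Client.submit`; `ThreadPoolExecutor.submit` has no such options and forwards every keyword to the function itself.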
Unit Test Results
19 files ±0, 19 suites ±0, 10h 24m 32s ⏱️ -1h 52m 8s
Results for commit 5987da3. ± Comparison against base commit 49437c2.
This pull request removes 2 and adds 12 tests. Note that renamed tests count towards both.
Is this really useful? I don't see this coming up as a use case frequently. It's also a bit weird to have the … I'm not very excited about this addition yet. Do you have an example where this is useful?
I mean, it's functionally equivalent to … If we look at … Also, to be clear, I'm not saying "I want us to merge this now". I'm saying "we should probably talk about this a bit".
Ah, interesting point.
From a syntax/UX perspective I can see the appeal of the decorator, but delayed objects are not bound to any object, and certainly not to a dynamic one like the client. Apart from the lifecycle problems I mentioned earlier, this also limits its usefulness: the client has to exist before the function is defined, and it has to exist in the local scope (the default client mechanism, by contrast, lets one not keep the client around and just use dask.compute). Executing the function once the client is closed will also likely spew all sorts of weird exceptions. This experience could likely be made smoother, but only at the cost of complexity (marrying this decorator with current/default clients, better exception handling, weakrefs, ...), and I'm not convinced this is worth it.
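The closed-client failure mode can be sketched with the standard library, using `concurrent.futures.ThreadPoolExecutor` as a stand-in for a Dask `Client`: a hypothetical decorator, `bound_to`, captures one executor when the function is defined, so once that executor is shut down every later call fails. `ThreadPoolExecutor.submit` raises `RuntimeError` after `shutdown()`; a closed Dask client fails in its own ways, so this only shows the shape of the problem.

```python
from concurrent.futures import ThreadPoolExecutor


def bound_to(executor):
    """Hypothetical decorator that captures ``executor`` when ``func`` is defined."""
    def wrapper(func):
        def submit(*args, **kwargs):
            return executor.submit(func, *args, **kwargs)
        return submit
    return wrapper


executor = ThreadPoolExecutor(max_workers=2)


@bound_to(executor)
def inc(x):
    return x + 1


first_result = inc(1).result()  # works while the executor is alive

# The executor goes away, but ``inc`` still holds a reference to it.
executor.shutdown()
error = None
try:
    inc(1)
except RuntimeError as exc:
    error = exc
    print(f"call after shutdown failed: {exc}")
```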
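We could do something like the following:

```python
from functools import partial

from distributed import get_client


def submit(**kwargs):
    def _(func):
        # get_client() is evaluated here, when the decorator is applied
        return partial(get_client().submit, func, **kwargs)
    return _
```

This would resolve the lifecycle issues, and also remove the need to have a client object ahead of time.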
The partial will still bind to the instance returned by …
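One way to address this is to look the client up inside the wrapper on every call, instead of once when the decorator is applied. To stay runnable without a cluster, this sketch replaces `get_client()` with a hypothetical `get_default_executor()` / `set_default_executor()` pair over `concurrent.futures` executors; it does not cover async or worker clients.

```python
import functools
from concurrent.futures import ThreadPoolExecutor

_default_executor = None  # hypothetical stand-in for the "current" Dask client


def set_default_executor(executor):
    global _default_executor
    _default_executor = executor


def get_default_executor():
    if _default_executor is None:
        raise ValueError("No default executor set")
    return _default_executor


def submit(**submit_kwargs):
    def wrapper(func):
        @functools.wraps(func)
        def call(*args, **kwargs):
            # Resolved per call, not when the decorator is applied.
            return get_default_executor().submit(func, *args, **submit_kwargs, **kwargs)
        return call
    return wrapper


@submit()  # no executor has to exist yet
def double(x):
    return 2 * x


missing = None
try:
    double(1)  # fails cleanly: nothing to submit to yet
except ValueError as exc:
    missing = exc

first = ThreadPoolExecutor(max_workers=2)
set_default_executor(first)
a = double(3).result()  # runs on ``first``

first.shutdown()
second = ThreadPoolExecutor(max_workers=2)
set_default_executor(second)
b = double(4).result()  # picks up ``second``; the shut-down executor is not used
second.shutdown()
print(a, b)  # 6 8
```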
I acknowledge those problems. I think that we can probably get around them. Do you agree? What do you think about the general API?
There are already so many problems around client mechanics that I'm not eager to introduce more: those problems rarely rise to top priority and are not fixed eagerly, but they still cause pain and inconvenience. We are already struggling to iron out the current semantics. We could possibly work around the problems, but I'm not convinced it is worth it.
I would be more excited if we could remove the …

```python
>>> futures = [f(x) for x in range(10)]
>>> results = [future.result() for future in futures]
```

An explicit submit is more verbose but also less confusing. The API for a case like this is about much more than syntax. Default, current, and worker client semantics are already confusing, and this kind of API obscures that ambiguity even further. There is also the case of async clients, which is not handled here at all.

If a user wants to use a decorator, nothing is stopping them from doing so. Getting it working is very little work if you are confined to a specific use case. However, for us as a library to support this, we have to think about the different edge cases, and that is what drives the cost of supporting it. I see a lot of complexity for rather little value.
Yeah, I see where you're coming from about wanting to avoid complexity, both in terms of client lifecycle dynamics and multiple APIs. I generally assume that we can handle the client lifecycle dynamics, and that we haven't yet found a great API here, so it makes sense to keep experimenting.
To be clear, I'm not striving for that API in particular. I like that API, but find that relatively few users know of it.
My guess is that if you gave more tutorials or worked more with beginning users, you would see more value.
Anyway, thanks for sharing your thoughts. I appreciate it.
Something that I suspect @fjetter will dislike even more: what if we combined this with #8028 and used module-level methods? This could become very magical.

```python
# myscript.py
import dask

@dask()  # hypothetical API: the dask module is not callable today
def process(filename):
    ...

tasks = [process(fn) for fn in filenames]  # filenames defined elsewhere
for task in tasks:
    task.result()
```

We could call … I don't actually expect us to get here, but I think that thinking in this direction is probably fruitful. It may give us ideas on how to make Dask more accessible to less sophisticated users.
How would the above differ from …?
Yes, the real operational difference is eager vs. lazy execution. My sense is that eager is more intuitive for folks.
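The eager/lazy distinction can be made concrete with two hypothetical decorators, sketched with the standard library only: the eager one submits work as soon as the function is called (the behaviour proposed in this PR), while the lazy one only records the call and runs it when `compute` is invoked, roughly the way `dask.delayed` defers work until `dask.compute`. Neither is Dask's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

calls = []  # records when each function body actually runs


def eager(executor):
    def wrapper(func):
        def call(*args):
            return executor.submit(func, *args)  # starts running now
        return call
    return wrapper


class Lazy:
    """A deferred call: nothing runs until ``compute`` is called."""

    def __init__(self, func, args):
        self.func, self.args = func, args

    def compute(self):
        return self.func(*self.args)


def lazy(func):
    def call(*args):
        return Lazy(func, args)
    return call


executor = ThreadPoolExecutor(max_workers=2)


@eager(executor)
def eager_inc(x):
    calls.append(("eager", x))
    return x + 1


@lazy
def lazy_inc(x):
    calls.append(("lazy", x))
    return x + 1


future = eager_inc(1)
thunk = lazy_inc(1)
future.result()                  # wait for the eager task to finish
assert ("lazy", 1) not in calls  # the lazy body has not run yet
value = thunk.compute()          # the lazy body runs only now
print(value)  # 2
executor.shutdown()
```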
More broadly, my sense is that futures are just more modern than delayed. There are plenty of small features that are better supported for futures (priorities, annotations, as_completed, ...). I'd like people to switch to futures more generally, but they stay with delayed, I suspect because of the decorator syntax.
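As an illustration of one futures-only convenience from that list, `as_completed` yields futures in the order they finish rather than the order they were submitted. Dask provides `distributed.as_completed` for its futures; this sketch uses the standard library's `concurrent.futures.as_completed` so it runs without a cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(x):
    time.sleep(0.05 * (3 - x))  # later inputs tend to finish sooner
    return x * x


with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(slow_square, x) for x in range(3)]
    # Handle each result as soon as it is ready, not in submission order.
    finished = [future.result() for future in as_completed(futures)]

print(sorted(finished))  # [0, 1, 4]
```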
My reason for sticking with delayed, back in the day, was that it was much easier to test and reason about. Mostly, the ability to switch between the sync, threaded, and distributed schedulers was what motivated me to use delayed.
Very similar to the `@ray.remote` decorator, lovely idea! I had to make my own:
It supports passing to …
and using it without calling it, with …
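A decorator that works both bare (`@remote`) and with options (`@remote(...)`), as `@ray.remote` does, can be sketched as follows. `remote` and its module-level `ThreadPoolExecutor` are hypothetical stand-ins (for a Dask `Client`), not the commenter's code.

```python
import functools
from concurrent.futures import ThreadPoolExecutor

EXECUTOR = ThreadPoolExecutor(max_workers=4)  # stand-in for a Dask client


def remote(func=None, **submit_kwargs):
    """Hypothetical decorator usable as ``@remote`` or ``@remote(**options)``."""
    if func is None:
        # Called with options: return a decorator that still needs ``func``.
        return functools.partial(remote, **submit_kwargs)

    @functools.wraps(func)
    def call(*args, **kwargs):
        return EXECUTOR.submit(func, *args, **submit_kwargs, **kwargs)

    return call


@remote
def add(x, y):
    return x + y


@remote(y=10)  # with ThreadPoolExecutor these options reach the function itself
def add_ten(x, y):
    return x + y


results = (add(1, 2).result(), add_ten(5).result())
print(results)  # (3, 15)
EXECUTOR.shutdown()
```

The `func=None` default is what lets one function serve both forms: a bare decorator receives the function directly, while the parenthesized form receives only keywords and returns a partial that waits for the function.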