ActorFuture vs pure Future to submit #7115
-
Hi there, Why is it forbidden to pass from distributed.client import Client
c = Client()
future1 = c.submit(lambda: (1, 2))
futures = [c.submit(lambda l: l[i], future1) for i in range(2)]
futures[0].result()
1
futures[1].result()
2
class Actor:
def foo(self):
return (1, 2)
actor_future = c.submit(Actor, actor=True)
actor = actor_future.result()
future2 = actor.foo()
futures = [c.submit(lambda l: l[i], future2) for i in range(2)]
TypeError: cannot pickle '_thread.lock' object Thanks in advance! |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments 3 replies
-
cc @jsignell |
Beta Was this translation helpful? Give feedback.
-
In the dask model, futures denote tasks or values that are held in the cluster or due to be run in the cluster, and their results are stateless (i.e., you would get the same if you ran it again, on whichever worker). Actors sit outside of this model - each is an instance on a specific worker, and maintains internal state. Running methods on an actor uses a completely different code-path, because the communication is directly from the client to the worker, and the scheduler is not involved. This is by design, to give minimum latency and stateful operation - since it's an arbitrary method, calling it twice might give new results. Note that Your example could be made to work by passing the actor itself and calling the method within the function you submit - yes, you can call actors from within tasks or even from within other actors. Totally agree that all of this API is niche and incomplete. I have two PRs on the matter, and no one to review them! |
Beta Was this translation helpful? Give feedback.
-
I get the sense that you're trying to make Modin work well on Dask using
Actors. I recommend zooming out a bit first and first engaging in an
architectural discussion on what approach is best. It may be that you are
in a rabbit hole here.
…On Mon, Feb 8, 2021 at 7:16 AM Martin Durant ***@***.***> wrote:
In the dask model, futures denote tasks or values that are held in the
cluster or due to be run in the cluster, and their results are stateless
(i.e., you would get the same if you ran it again, on whichever worker).
Actors sit outside of this model - each is an instance on a specific
worker, and maintains internal state. Running methods on an actor uses a
completely different code-path, because the communication is directly from
the client to the worker, and the scheduler is not involved. This is by
design, to give minimum latency and stateful operation - since it's an
arbitrary method, calling it twice might give new results. Note that
type(future2) is not a normal future.
Your example could be made to work by passing the actor itself and calling
the method within the function you submit - yes, you can call actors from
within tasks or even from within other actors. Totally agree that all of
this API is niche and incomplete. I have two PRs on the matter, and no one
to review them!
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#7115 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTHIM77Z7JL2PVU7XFTS5753FANCNFSM4WTIUTHA>
.
|
Beta Was this translation helpful? Give feedback.
In the dask model, futures denote tasks or values that are held in the cluster or due to be run in the cluster, and their results are stateless (i.e., you would get the same if you ran it again, on whichever worker).
Actors sit outside of this model - each is an instance on a specific worker, and maintains internal state. Running methods on an actor uses a completely different code-path, because the communication is directly from the client to the worker, and the scheduler is not involved. This is by design, to give minimum latency and stateful operation - since it's an arbitrary method, calling it twice might give new results. Note that
type(future2)
is not a normal future.Your example …