Scattering Functions That Use Large Arrays #4322
Unanswered
elliottperryman asked this question in Q&A
Replies: 1 comment 1 reply
-
Have you tried `scatter`? For example:

```python
import numpy as np
from dask.distributed import Client, as_completed

def my_code(func, x, client):
    # x is not big, but func references a big array.
    # Is there a way to scatter func?
    futures = client.map(func, x)
    # Just does some trivial work: track the maximum result.
    max_val = -1
    for res in as_completed(futures):
        if res.result() > max_val:
            max_val = res.result()
    return max_val

def user_code():
    big_data = np.random.random((10**3, 10**3))

    def big_func(x):
        # By the time this runs, big_data has been rebound to a Future;
        # note .result() to convert the future back into the array.
        if np.sum(big_data.result()) < 0:
            print('note .result() to convert the future')
        return x**2

    client = Client()
    print('dashboard:', client.dashboard_link)
    big_data = client.scatter(big_data)  # if every worker needs it, you can use broadcast=True
    best = my_code(big_func, range(101), client)
    print('best:', best)
    client.close()

def main():
    user_code()

if __name__ == "__main__":
    main()
```

As a backup strategy, it's possible to change the work to write …
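A related variant, sketched here for completeness: futures passed as arguments to `client.map` or `client.submit` are resolved on the worker, so the scattered array can be handed to the task explicitly instead of being captured in a closure (and no `.result()` call is needed inside the function). The names `big_func` and `demo` below are illustrative:

```python
import numpy as np
from dask.distributed import Client

def big_func(x, data):
    # data arrives already resolved on the worker; no .result() needed.
    return x ** 2 if np.sum(data) >= 0 else -x

def demo():
    # In-process cluster for illustration; a real deployment would
    # connect to a scheduler address instead.
    client = Client(processes=False)
    big_data = client.scatter(np.random.random((10**3, 10**3)))
    # The scattered Future is passed as an explicit keyword argument.
    futures = client.map(big_func, range(101), data=big_data)
    best = max(f.result() for f in futures)
    client.close()
    return best
```

This keeps `my_code`-style drivers oblivious to the big array: only the task function's signature mentions it.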
-
Hi all,
I've got a question that may turn out to be an easy problem, but I'm not sure. My use case: I'm using dask.distributed to map work across many nodes and to stay flexible about where it runs (which Dask has been really great for!). However, I'm not sure how to scatter a function that uses a large array across a client. Here's code that reproduces the effect (increase the size of big_data or the number of processes in the client to see it more clearly):
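(The repro snippet doesn't appear to have survived extraction. As an illustration of the pattern being described, the hypothetical sketch below binds a big array into a callable and measures the serialized payload with stdlib `pickle`; the names `work` and `task` are not from the original.)

```python
import pickle
from functools import partial

import numpy as np

def work(data, x):
    # Hypothetical task: touches the big array, returns a small result.
    return x ** 2 if np.sum(data) >= 0 else -x

big_data = np.random.random((10**3, 10**3))  # ~8 MB of float64

# Binding the array into the callable means every serialized copy of
# the task carries the full array along with it.
task = partial(work, big_data)
payload = len(pickle.dumps(task))
print(f"serialized task: {payload} bytes; array alone: {big_data.nbytes} bytes")
```
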
What do you all think? I'm not sure it's possible, since my_code doesn't know about big_data. In the real use case, the client would spread across many machines, so the cost of every parallel task referencing the same memory would be even larger (at least, that's my interpretation).
Thanks,
Elliott