This might help:
https://docs.dask.org/en/stable/best-practices.html#avoid-calling-compute-repeatedly
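The linked section recommends passing every collection you need to a single `dask.compute` call, so that tasks shared between them run only once. Below is a minimal sketch applied to the `da.apply_gufunc` pattern from the question. The body of `my_custom_function` and the random 8 x 5 inputs are hypothetical stand-ins, not the original code or data:

```python
import dask
import dask.array as da
import numpy as np

# Hypothetical stand-in for my_custom_function: reduces the core axis (i)
# of each input and returns two values per element of the loop dimensions.
def my_custom_function(a, b, c, d):
    primary = (a * b + c * d).sum(axis=-1)
    diagnostic = np.abs(a - d).max(axis=-1)
    return primary, diagnostic

# Hypothetical inputs of shape (rows, i); the core axis i is kept as a
# single chunk because apply_gufunc does not rechunk it by default.
rng = np.random.default_rng(0)
arrays = [da.from_array(rng.random((8, 5)), chunks=(2, 5)) for _ in range(4)]

result, diagnostic_result = da.apply_gufunc(
    my_custom_function,
    "(i),(i),(i),(i)->(),()",
    *arrays,
    output_dtypes=(float, float),
)

# A single dask.compute call merges both task graphs, so the shared
# my_custom_function tasks run once instead of once per .compute() call.
result_np, diagnostic_np = dask.compute(result, diagnostic_result)
print(result_np.shape, diagnostic_np.shape)
```

Calling `result.compute()` and then `diagnostic_result.compute()` would instead execute the block tasks twice.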
On Mon, Oct 23, 2023 at 1:29 PM Noah S. Prime <***@***.***> wrote:
So I've been working on a project that uses Dask to apply a custom
function over chunks of my data, with dask.distributed parallelizing the
process.
I'd been using da.map_blocks() to do this. However, I now want to add an
additional return object that contains diagnostic information, so I've
attempted to use da.apply_gufunc, essentially like this:
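```python
import dask.array as da

# my_custom_function and da_one to da_four are defined elsewhere in the project
result, diagnostic_result = da.apply_gufunc(
    my_custom_function,
    '(i),(i),(i),(i)->(),()',
    da_one,
    da_two,
    da_three,
    da_four,
    output_dtypes=(float, float),
)
```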
However, I need to call .compute() on both return values, which triggers two
computations even though the second is completely redundant (effectively every
function call required to create the diagnostic result is already made when
creating the primary result). Is there a way to avoid this?
Alternatively, is there another proven way to save diagnostic results
(in this case, I specifically want an n x n array of floats as output) while
using Dask distributed and da.map_blocks?
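For the n x n diagnostic, one option (a sketch, not a confirmed recipe) is to stay with `da.apply_gufunc` and declare the matrix as a new output core dimension, `(n,n)`, whose size is supplied through `output_sizes` because it does not appear on any input. This assumes one n x n matrix per element of the loop dimensions; `N = 3`, the outer-product diagnostic, and the inputs are hypothetical:

```python
import dask
import dask.array as da
import numpy as np

N = 3  # hypothetical size of the n x n diagnostic matrix

def my_custom_function(a, b, c, d):
    # Primary output: one float per loop element (core axis i reduced).
    primary = (a * b + c * d).sum(axis=-1)
    # Hypothetical diagnostic: outer product of the first N core entries of
    # a and b, so each loop element gets its own N x N matrix.
    diagnostic = a[..., :N, None] * b[..., None, :N]
    return primary, diagnostic

rng = np.random.default_rng(0)
arrays = [da.from_array(rng.random((8, 5)), chunks=(2, 5)) for _ in range(4)]

# "(n,n)" is a new output core dimension, so its size must be given
# explicitly via output_sizes.
result, diagnostic_result = da.apply_gufunc(
    my_custom_function,
    "(i),(i),(i),(i)->(),(n,n)",
    *arrays,
    output_dtypes=(float, float),
    output_sizes={"n": N},
)

# Compute both outputs together so each block is processed only once.
result_np, diagnostic_np = dask.compute(result, diagnostic_result)
print(result_np.shape, diagnostic_np.shape)
```

The core dimension `i` still has to be a single chunk, since `apply_gufunc` only rechunks core dimensions when `allow_rechunk=True`.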