dask.Array
copy behaviour
#9248
Unanswered
davidhassell
asked this question in
General
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Hello,
I was wondering why
Array.copy
behaves differently when there is only 1 partition compared with when there are multiple partitions. In the single partition case, the numpy array appears to be replaced with an in-memory copy of itself (during the compute), but not so in multiple partition case:The code clearly intends this (https://github.com/dask/dask/blob/main/dask/array/core.py#L2772-L2779)
Is this copy really happening just in the single partition case? and if so I'd be very interested to know why, as it would affect performance.
Edit: I have an additional use case, aside from performance, in which we are creating an implementation that interfaces dask with "active storage", where reductions can be carried out on the server where the data is, rather than locally by dask itself, and the results for each chunk fed back into a standard dask workflow.
Our initial approach requires knowledge of whether or not a dask graph only contains a data definition, and no further operations. A copied dask array doesn't logically have any further operations, but the presence of a copy layer, makes it much harder to determine if I have this situation.
Many thanks,
David
Beta Was this translation helpful? Give feedback.
All reactions