`dask.Array` copy behaviour #9248

davidhassell · 2022-07-07T09:39:13Z

Hello,

I was wondering why `Array.copy` behaves differently when there is only one partition compared with when there are multiple partitions. In the single-partition case, the numpy array appears to be replaced with an in-memory copy of itself (during the compute), but not in the multiple-partition case:

>>> import dask.array as da
>>> x = da.from_array(list(range(10)), chunks=-1).copy()   # npartitions = 1
>>> x.dask
HighLevelGraph with 2 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7f6481de3df0>
0. array-5b75f23fcdbb58b6fb4d5bf579904cdc
1. copy-f858750688dfe3c2a90b611d5f3c9339
>>> x = da.from_array(list(range(10)), chunks=5).copy()   # npartitions = 2
>>> x.dask
HighLevelGraph with 1 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7f648a3011f0>
0. array-7dc27a28b597e5b932909e62d8d97007

The code clearly intends this (https://github.com/dask/dask/blob/main/dask/array/core.py#L2772-L2779):

    def copy(self):
        if self.npartitions == 1:
            return self.map_blocks(M.copy)
        else:
            return Array(self.dask, self.name, self.chunks, meta=self)

Is this copy really happening only in the single-partition case? If so, I'd be very interested to know why, as it would affect performance.
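One way to frame the question is in plain NumPy terms. The sketch below is only an illustration of aliasing, not a statement about dask's internals: a single chunk handed back as a slice can be a view of the source array, whereas several chunks joined together necessarily form a new array. Whether this is the actual motivation for the `npartitions == 1` branch is an assumption, and the variable names are made up for the example.

```python
import numpy as np

source = np.arange(10)

# One "chunk" covering the whole array: slicing returns a view of the source.
single_chunk = source[:]
single_aliases = np.shares_memory(single_chunk, source)

# Two "chunks" joined back together: concatenation allocates a new array.
joined_chunks = np.concatenate([source[:5], source[5:]])
multi_aliases = np.shares_memory(joined_chunks, source)

# An explicit copy of the single chunk no longer aliases the source.
copied_aliases = np.shares_memory(single_chunk.copy(), source)

print(single_aliases, multi_aliases, copied_aliases)
```

If that reading is right, the explicit copy would only be needed where the computed result could otherwise alias the input, but that remains to be confirmed.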

Edit: Aside from performance, I have an additional use case. We are building an implementation that interfaces dask with "active storage", where reductions are carried out on the server that holds the data rather than locally by dask itself, and the per-chunk results are fed back into a standard dask workflow.

Our initial approach needs to know whether a dask graph contains only a data definition and no further operations. A copied dask array has no further operations, logically speaking, but the presence of a copy layer makes this situation much harder to detect.
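A minimal sketch of the kind of check described above, assuming that layer names keep the `array-` and `copy-` prefixes seen in the output earlier in this post. Those prefixes are a naming convention, not a documented guarantee, and the helper names `non_copy_layers` and `is_data_definition_only` are hypothetical. The functions only rely on the graph exposing a `layers` mapping keyed by layer name, as `HighLevelGraph` does, so a simple stand-in object is used here in place of a real dask graph.

```python
from types import SimpleNamespace
from typing import Any, List

# Assumed prefixes, taken from the layer listing in this post.
DATA_PREFIX = "array-"
COPY_PREFIX = "copy-"


def non_copy_layers(graph: Any) -> List[str]:
    """Return the names of all layers that are not copy layers."""
    return [name for name in graph.layers if not name.startswith(COPY_PREFIX)]


def is_data_definition_only(graph: Any) -> bool:
    """True if, ignoring copy layers, the graph is a single data layer."""
    remaining = non_copy_layers(graph)
    return len(remaining) == 1 and remaining[0].startswith(DATA_PREFIX)


# Stand-ins mimicking the two graphs shown above (layer values are unused).
single_partition = SimpleNamespace(layers={"array-5b75": None, "copy-f858": None})
two_partitions = SimpleNamespace(layers={"array-7dc2": None})
with_operation = SimpleNamespace(layers={"array-7dc2": None, "sum-1234": None})

print(is_data_definition_only(single_partition))
print(is_data_definition_only(two_partitions))
print(is_data_definition_only(with_operation))
```

With a real array, the same call would be `is_data_definition_only(x.dask)`. Matching on name prefixes is fragile, which is part of why a copy that adds no layer at all would be easier to work with.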

Many thanks,
David
