Support sliding window computations #4659

Closed
pentschev opened this issue Apr 1, 2019 · 8 comments · Fixed by #7234

Comments

@pentschev
Member

Consider the following code:

import dask.array as da
import numpy as np

d = da.arange(8, chunks=4)

g = d.map_overlap(np.mean, depth=0, boundary=0, trim=False, keepdims=True)

After computing g, this will return the mean of the two chunks [0:4] and [4:8]. Currently, there seems to be no way to specify a step size different from (smaller than) the chunk size for each dimension. One could want a step size of 2, resulting in the means for ranges [0:4], [2:6], [4:8].

Being able to specify a step size is very useful for filtering images, for example, as the user may be fine with computing the filter for only every second or third pixel to reduce compute time. The user may want an arbitrary step size, which I think will normally be anything from 1..len(chunk[dim]).

I would be interested in knowing what options we currently have (if any); otherwise, what people think may be a good direction for the implementation of a step size.
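For reference, the desired result for this example can be written in plain NumPy with an explicit loop. This is only a minimal sketch of the expected values, not a proposed API; the names `window` and `step` are illustrative and do not correspond to existing `map_overlap` parameters.

```python
import numpy as np

x = np.arange(8)
window, step = 4, 2  # illustrative values matching the ranges above

# Start index of every window: 0, 2, 4 -> ranges [0:4], [2:6], [4:8].
starts = range(0, len(x) - window + 1, step)

# Mean of each window, computed eagerly on a small in-memory array.
means = np.array([x[s:s + window].mean() for s in starts])
```

A Dask version would need to produce the same three values while letting windows cross the chunk boundary at index 4.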

cc @mrocklin @jakirkham

@mrocklin changed the title from "Allow stepping in map_overlap" to "Support sliding window computations" on Apr 1, 2019
@mrocklin
Member

mrocklin commented Apr 1, 2019

I've renamed this issue to "Support sliding window computations" (I hope that's ok). The map_overlap function is more specifically for mapping a function over chunks of data. My guess is that it would be used internally by some sort of sliding window function, but may not be the user-level API directly.

Today, NumPy lacks a sliding window API. This is discussed in more depth here: numpy/numpy#7753

If such an API existed, then we might consider copying it. Given that it doesn't exist, we might consider creating our own, and collaborating with upstream NumPy to make sure that they're ok with it in case they ever go in that direction.

If I were to do this today, I would probably make a block-wise function that did a sliding window computation, probably using either NumPy stride tricks (a hack described in that issue and known to expert NumPy users) or just for loops with Numba, and then I would apply that function with map_overlap.
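A block-wise function of the kind described above might look roughly like the following. This is a minimal sketch under stated assumptions: it handles only 1-D blocks, the helper name `sliding_mean` and its `window` and `step` parameters are hypothetical, and it uses `numpy.lib.stride_tricks.as_strided` to build a read-only view so that no window is copied.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def sliding_mean(block, window, step=1):
    """Hypothetical block-wise helper: mean over 1-D sliding windows."""
    block = np.ascontiguousarray(block)
    # Number of complete windows that fit in this block.
    n_windows = max(0, (block.shape[0] - window) // step + 1)
    if n_windows == 0:
        return np.empty(0, dtype=float)
    stride = block.strides[0]
    # Each row of the view is one window; rows start `step` elements apart.
    windows = as_strided(
        block,
        shape=(n_windows, window),
        strides=(step * stride, stride),
        writeable=False,
    )
    return windows.mean(axis=1)


result = sliding_mean(np.arange(8), window=4, step=2)
```

Applying such a function per chunk with map_overlap is left out of the sketch, since the overlap depth and the alignment of the step with chunk boundaries would both need extra care.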

@pentschev
Member Author

Thanks for your input @mrocklin. I'm definitely gonna take a look at the NumPy discussion in detail.

I have a feeling you're right, we're probably gonna need the block-wise function, or Numba as a last resort. I'm particularly concerned about the memory footprint: making a copy of every window plus its boundaries to use as a chunk will certainly be too expensive.

@mrocklin
Member

mrocklin commented Apr 1, 2019

I think that we can use either Numba or stride tricks to do an efficient sliding window computation in low memory.

I think that we can use map_overlap to handle the boundaries between large blocks.

I think that with both of these we can achieve what you want in small space.

@pentschev
Member Author

I'll continue to investigate this, thanks for the suggestions @mrocklin.

@pentschev
Member Author

There's a sliding window PR, numpy/numpy#10771, which seems to be what we want. Progress has stalled for quite some time: the author mentioned he would pick it up again, but that hasn't happened yet, presumably because he's got no time for it. I can probably suggest there that I continue with it. Do you think that looks good, @mrocklin?

I'd nevertheless have to find a more immediate solution, since we'll need the same for CuPy, and I don't think we can get that PR in NumPy, do the work for CuPy as well and have it merged in less than 1-2 months.

@mrocklin
Member

mrocklin commented Apr 2, 2019

I think it would be good for that PR to be finished. In principle having anyone finish it would be good.

Putting on my NVIDIA hat, I don't currently know where this task fits into our current priorities. It's not clearly tied to any particular task that I know of, so I would prioritize it somewhat low. That being said, if you're excited about doing it then please don't let me get in the way :)

@theXYZT

theXYZT commented Mar 4, 2021

I am quite interested in having this functionality in Dask. Now that NumPy has implemented sliding_window_view (numpy/numpy#17394), available as of NumPy 1.20, I was wondering what the status of this issue is.
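For context, the NumPy function applied to the example from the top of this issue might look like the sketch below. It assumes NumPy 1.20 or later. `sliding_window_view` itself has no step argument, so the step of 2 is obtained here by slicing the resulting view, which is one possible approach rather than an official recipe.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(8)

# Read-only view of every length-4 window; shape (5, 4), no data copied.
windows = sliding_window_view(x, window_shape=4)

# Keep every second window: [0:4], [2:6], [4:8].
stepped = windows[::2]

means = stepped.mean(axis=-1)
```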

@pentschev
Member Author

@theXYZT see #7234
