-
-
Notifications
You must be signed in to change notification settings - Fork 9.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: implement sliding_window_view #10771
Conversation
Test cases are shown in the issue page.
numpy/lib/stride_tricks.py
Outdated
@@ -111,6 +111,35 @@ def as_strided(x, shape=None, strides=None, subok=False, writeable=True): | |||
|
|||
return view | |||
|
|||
def sliding_window_view(x, shape=None): | |||
""" | |||
Create rolling window views of the 2D array with the given shape. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason to limit to 2D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know. I plan to implement beyond 2d and strides > 2 soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like what you have already works fine for > 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually not.
This is a 3D test case:
arr = np.arange(27).reshape(3,3,3)
print(arr)
shape = [2,2,2]
sliding_window_view(arr, shape)
input:
[[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]]
[[ 9 10 11]
[12 13 14]
[15 16 17]]
[[18 19 20]
[21 22 23]
[24 25 26]]]
output:
array([[[[[[ 0, 1],
[ 3, 4]],
[[ 9, 10],
[12, 13]]],
[[[ 1, 2],
[ 4, 5]],
[[10, 11],
[13, 14]]]],
[[[[ 3, 4],
[ 6, 7]],
[[12, 13],
[15, 16]]],
[[[ 4, 5],
[ 7, 8]],
[[13, 14],
[16, 17]]]]],
[[[[[ 9, 10],
[12, 13]],
[[18, 19],
[21, 22]]],
[[[10, 11],
[13, 14]],
[[19, 20],
[22, 23]]]],
[[[[12, 13],
[15, 16]],
[[21, 22],
[24, 25]]],
[[[13, 14],
[16, 17]],
[[22, 23],
[25, 26]]]]]])
You can see there is some repeated elements. The expected output dims should be (2, 2, 2).
I am still thinking about how to implement it beyond 2D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That output looks correct to me - (2, 2, 2, 2, 2, 2)
is the right shape for a sliding window of shape (2, 2, 2)
over an array of shape (3, 3, 3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should work on N(>3) dim.
numpy/lib/stride_tricks.py
Outdated
""" | ||
Create rolling window views of the 2D array with the given shape. | ||
|
||
.. warning:: This function has to be used with extreme care, see notes from 'as_strided'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should design this function so that this is not true. Perhaps just set the readonly flag on the returned array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the warning? This warning comes from the 'as_strided', which this function depends on. I don't know if it's a good idea to remove this warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I mean the warning. This warning is aimed at you, the caller of as_strided
. You should be the one taking extreme care, so that the caller of sliding_window_view
does not have to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add shape and step_size check. Warning can be safety removed.
numpy/lib/stride_tricks.py
Outdated
x : ndarray | ||
2D array to create rolling window views. | ||
shape : sequence of int, optional | ||
The shape of the new array. Defaults to ``x.shape``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not accurate - it's the shape of the window, right?
This default doesn't seem useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use the same default setting as it is in 'as_strided'. Any change suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the default entirely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shape default removed.
Test cases should be part of this PR, to prove that they pass |
Test cases now added in PR. |
I mean, the test cases should be in the source code itself. There's probably a |
Sorry about that. This is my first contribution. So I need to create a new py file for the test cases? |
You should be able to find an existing file somewhere |
numpy/lib/stride_tricks.py
Outdated
[[[ 8, 9, 10], | ||
[12, 13, 14]], | ||
[[ 9, 10, 11], | ||
[13, 14, 15]]]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you lost a level of indent here
numpy/lib/stride_tricks.py
Outdated
strides = x.strides | ||
|
||
view_shape = np.concatenate((o, shape), axis=0) | ||
view_strides = np.concatenate((strides, strides), axis=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you can just do strides + strides
numpy/lib/stride_tricks.py
Outdated
---------- | ||
x : ndarray | ||
2D array to create rolling window views. | ||
shape : sequence of int, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May as well force this to be a tuple
numpy/lib/stride_tricks.py
Outdated
|
||
view_shape = np.concatenate((o, shape), axis=0) | ||
view_strides = np.concatenate((strides, strides), axis=0) | ||
return np.lib.stride_tricks.as_strided(x ,view_shape, view_strides) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comma in the wrong place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
I think this function should have the signature |
@shoyer I think the axis parameter does not have a very good interpretation in this case, especially on N(>2) dim. And if someone want to overlook sliding on particular dim, they can set the window shape in that dim as the shape of input array of that dim. What do you think? |
Interestingly setting it to |
Just add Sliding step. Now it also should work on N(>2) dimensions. Working on test cases. Sorry for late reply, finished my final yesterday. |
Add step_size default's description.
numpy/lib/stride_tricks.py
Outdated
Array to create rolling window views. | ||
shape : sequence of int | ||
The shape of the new array. | ||
step: sequence of int, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this argument is useful - it's meaning is clearer when done via slicing after the fact
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that description is a mistake. It should be corrected as
shape : sequence of int
The shape of the sliding window.
Thanks for pointing out.
numpy/lib/stride_tricks.py
Outdated
shape : sequence of int | ||
The shape of the sliding window. | ||
step: sequence of int, optional | ||
The sliding step of view window for each dimension of input. Defaults to 1 on all dimensions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reiterating my comment earlier - I don't think the step
argument belongs here: #7753 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I think sliding_window(x, shape, step) is more efficient than sliding_window(x, shape)[::step], since the latter one requires creating view first, then take steps, which could be problematic in large input array. Also, the 'step' parameter (also mentioned as 'stepsize', 'stride') is suggested by the first comment(Here). I think it would be handy to keep this optional parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which could be problematic in large input array
The cost of a view is constant, and does not depend on the .shape
of the array - so "large" is irrelevant. Let's discuss the step argument in the issue, not in the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore, the step argument is ambiguous - does it specify the step within a window, or the step between windows?
In deep learning world(where I came from), step within a window is usually called 'dilation', step between windows is usually called 'stride'. However, in numpy 'strides' usually means step by bytes in each dimension. I agree that the naming causes confusion here, but I think 'step' (or whatever it called) is used quite frequently. Maybe we can come up with a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss the step argument in the issue, not in the implementation.
@eric-wieser I clarify the document by adding ' window shifts' to this parameter description. I think this could avoid ambiguity and this parameter is quite useful. Do you still think we should remove this parameter?
This PR seems to have stalled after much work. |
@mattip Thanks for reminding me the existence of this PR, I have completely forgotten. 😵 |
+ Update writeable description. + Fix a few parameter checks. + Other minor improvements.
Is there any other updates needed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this work the same as skimage.util.view_as_windows
? that has a check on flags.contiguous
that's not present here. EDIT: never mind, that's an issue in skimage
@@ -113,6 +113,101 @@ def as_strided(x, shape=None, strides=None, subok=False, writeable=True): | |||
return view | |||
|
|||
|
|||
def sliding_window_view(x, shape, subok=False, writeable=False): | |||
""" | |||
Creates sliding window views of the N dimensional array with the given window |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a single summary line, followed by a blank line.
Returns | ||
------- | ||
view : ndarray | ||
Sliding window views of the array. (view.shape[i] = x.shape[i] - shape[i] + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need double backticks around the code snippet for correct formatting
view : ndarray | ||
Sliding window views of the array. (view.shape[i] = x.shape[i] - shape[i] + 1) | ||
|
||
See also |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Also
(capital A)
|
||
Examples | ||
-------- | ||
>>> i, j = np.ogrid[:3,:4] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to have a more understandable example. Say a 1-D one to start with. Also, to create a 2-D array it's better to use a more regularly used function, e.g. np.arange(12).reshape((3, 4))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current example is a nice one, but agreed we should start with a 1d example.
Since ogrid
isn't as well used, I think we should just add an >>> x
line that shows the resulting array, rather than switching to arange - having digits correspond to positions makes the output very easy to follow.
|
||
""" | ||
# first convert input to array, possibly keeping subclass | ||
x = np.array(x, copy=False, subok=subok) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.asanyarray
is the normal way to convert to an array while preserving subclasses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's done this way for consistency with as_strided
except: | ||
raise TypeError('`shape` must be a sequence of integer') | ||
else: | ||
if shape.ndim != 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
normally int
is allowed for shape
as well. It's awkward to have to write (5,)
when 5
would work too. Compare e.g. with np.zeros
if np.any(shape < 0): | ||
raise ValueError('`shape` cannot contain negative value') | ||
|
||
o = np.array(x.shape) - shape + 1 # output shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
o
--> output
(or some other name, please no one-letter o
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
output_shape
would be best
@@ -336,6 +336,33 @@ def test_as_strided(): | |||
assert_equal(a.dtype, a_view.dtype) | |||
assert_array_equal([r] * 3, a_view) | |||
|
|||
|
|||
def test_sliding_window_view(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like tests with 1/2/3-D arrays at least are needed here.
also a test with non-default strides.
and in the issue there's a comment that this implementation is more numerically stable than the pandas rolling window implementation. would be nice if that could be shown/guaranteed with a test that's numerically challenging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "numerically stable" mean for an procedure that doesn't perform any arithmetic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s certainky more numerically stable to calculate a rolling window mean separately for each window. Pandas uses an algorithm that involves keeping track of a single “current” sum for the rolling window, so every element gets added and then subtracted. Of course, this can be way faster....
I'm starting to think we have the wrong API here. Here's an alternate set of arguments that we could use: # name is only for clarity in this comment, the current name is fine
def new_sliding(x, shape, axis):
# add all the type checking and perhaps convenience shorthands as this PR has here
w_ndim = len(shape)
assert w_ndim == len(axis)
out_strides = x.strides + tuple(x.strides[ax] for ax in axis)
# note: same axis can be windowed repeatedly
x_shape_trimmed = list(x.shape)
for ax, dim in zip(axis, shape):
x_shape_trimmed[ax] -= dim - 1
out_shape = tuple(x_shape_trimmed) + shape
return np.lib.index_tricks.as_strided(x, strides=out_strides, shape=out_shape) Going through some use cases:
Note how that in all but the first case (edit: fixable with a sensible default), def toeplitz(a):
n = a.shape[-1]
assert n%2
return sliding_window_view_v2(a, ((n+1)//2, (n+1)//2), axis=(-1, -1))[...,::-1,:] >>> toeplitz(np.arange(5))
array([[[2, 3, 4],
[1, 2, 3],
[0, 1, 2]]]) |
Nice comparison @eric-wieser. |
Is this coming in an upcoming release? |
@Dan-Patterson nothing very specific. I suppose this mostly needs a champion to see things through. The main thing being coming to a decision for the final API. Which maybe means looking at the different suggestions here and writing a mail to the mailing list making a proposition and providing some background. |
A minor comment about |
@jni: Yes, that seems like a good idea to me. |
Contrastring def view_as_windows(arr_in, window_shape, step=1):
if not isinstance(window_shape, tuple):
window_shape = (window_shape,) * arr_in.ndim
if not isinstance(step, tuple):
step = (step,) * arr_in.ndim
all_windows = new_sliding(arr_in, window_shape)
return all_window[np.s_[...,] + tuple(np.s_[::s,] for s in step)] and def view_as_blocks(arr_in, block_shape):
# missing all the skimage sanity checks
all_windows = new_sliding(arr_in, block_shape)
return all_window[np.s_[...,] + tuple(np.s_[::s,] for s in block_shape)] |
@eric-wieser I don't quite follow the first example. You generate the overlapping blocks from the non-overlapping blocks view? |
|
This was the line I was looking at:
But, now I see, this is to introduce steps other than 1. |
This seems so close to the finishing line. Can we push it over? @Fnjn:
|
@eric-wieser, @rgommers, the original author @Fnjn seems to have lost interest. Is it ok to push forward regardless? What are the missing steps? |
@zklaus yes it's okay to pick this up and continue in a new PR. Please just make sure to keep the old commits intact, so authorship gets preserved. |
I pick this up in the replacement PR #17394. |
@zklaus Thank you for picking up this PR. I don't really have time to work on this. Wish you luck! |
Thanks, @Fnjn! |
- Change api to follow suggestion by Eric Wieser in numpy#10771 (comment) - Update docstring - Add more tests
* implement sliding_window_view #7753 Test cases are shown in the issue page. * Add Example Cases * Add step_size and N-dim support * Add shape and step_size check. Remove warning. * Remove shape default Add step_size default's description. * Give proper parameter name 'step' * fix a parameter description mistake * implement test function for sliding_window_view() * implement test function for sliding_window_view() * Fix according to @eric-wieser comments * Change arange to ogrid in Examples * remove np.squeeze on return line * Clarify document to avoid parameter confusion. * add `writable` and more explanation in docs * resolve a write conflit * fixes according to @seberg review * resolve write hazard * remove outdated docs. * change referring according to @mattip. change 'writeable' to 'readonly' as @seberg suggest. remove 'step' as @eric-wieser request * fix test minor error * DOC: Grammar fixes * STY: Add missing line break required by PEP8 * + Change readonly parameter to writeable. + Update writeable description. + Fix a few parameter checks. + Other minor improvements. * Move to new api as proposed by @eric-wieser - Change api to follow suggestion by Eric Wieser in #10771 (comment) - Update docstring - Add more tests * Improve documentation * Add sliding_window_view to documentation index * Apply suggestions from code review Co-authored-by: Eric Wieser <wieser.eric@gmail.com> * Fix window shape check * Add `sliding_window_view` to __all__ * Add tests for error cases * Add array_function dispatching * Change dispatcher argument defaults to None * Simplify array function dispatching * Added "np." prefix to doctests * Split tests * Improved docstring * Add release note * Fix docstring formatting * Fix doctest * Remove namespacing in documentation indexing * Improve docstring * Improved docstring * Simplified docstring * Improve docstring to make pseudo code stand out * Improve docstring * Add simple application example * Correct release note * Improve link with as_strides * Add note about performance * Tidy up main doc string * Make language on performance warning stronger * Bugfix: pass subok and writeable to as_strided * Add writeable test * Add subok test * Change subok test to use custom array subclass instead of unsupported MaskedArray * Add version added information Co-authored-by: Eric Wieser <wieser.eric@gmail.com> Co-authored-by: Fanjin <fjzeng@ucsd.edu> Co-authored-by: Fanjin Zeng <Fnjn@users.noreply.github.com> Co-authored-by: Eric Wieser <wieser.eric@gmail.com> Co-authored-by: fanjin <fjzeng@outlook.com> Closes gh-7753
Superseded by gh-17394, please do have a look if it seems good if you like. Thanks for this PR! |
Fixes #7753
Test Cases for PR #10771 :
Test Case 1
Test Case 2:
Test Case 3