Add .split_at() methods for AxisChunksIter/Mut #691

jturner314 · 2019-08-20T23:08:29Z

This adds .split_at() methods for AxisChunksIter and AxisChunksIterMut. Once this is merged, it will be straightforward to implement #639 in terms of .split_at().

IMO, it's easier to understand and work with the implementation of these iterators using `partial_chunk_index` and `partial_chunk_dim` than `n_whole_chunks` and `last_dim`.

LukeMathWalker · 2019-08-21T20:01:00Z

src/iterators/mod.rs

+    /// size due to the axis length not being evenly divisible). If the axis
+    /// length is evenly divisible by the chunk size, this index is larger than
+    /// the maximum valid index.
+    partial_chunk_index: usize,


Would it be beneficial to rephrase this as an Option, to make it clearer that we might (or might not) have a partial chunk? Something along the lines of:

    pub struct AxisChunksIter<'a, A, D> {
        iter: AxisIterCore<A, D>,
        partial_chunk: Option<PartialChunk<D>>,
        life: PhantomData<&'a A>,
    }

    struct PartialChunk<D> {
        partial_chunk_index: usize,
        partial_chunk_dim: D,
    }

jturner314 · 2019-09-03

I don't think it makes sense to use both the Option variant and the value of partial_chunk_index to represent whether or not there's a partial chunk. (The biggest reason is that I prefer data structures where there's a single source of truth, rather than having to keep multiple things in sync. There might also be a small performance cost to accessing partial_chunk_index through the Option, since accessing it requires checking whether the Option is the Some variant, but we'd need to test to determine if that would really be noticeable.) IMO, putting the fields in an Option would be additional complication over the current approach without much benefit.

It would be reasonable to eliminate partial_chunk_index and just use the Option variant to represent the presence of a partial chunk, like this:

    pub struct AxisChunksIter<'a, A, D> {
        iter: AxisIterCore<A, D>,
        partial_chunk: Option<D>,
        life: PhantomData<&'a A>,
    }

or to always store the shape of the last chunk (regardless of whether or not it's a partial chunk):

    pub struct AxisChunksIter<'a, A, D> {
        iter: AxisIterCore<A, D>,
        last_chunk_dim: D,
        life: PhantomData<&'a A>,
    }

These approaches have two disadvantages since they rely on checking whether the iterator is at its end to handle the partial chunk instead of checking whether the current index is equal to partial_chunk_index:

.split_at() needs to check whether or not the partial chunk is in the left piece and determine partial_chunk or last_chunk_dim of the left piece accordingly. (The partial chunk is in the left piece when index == self.iter.len().)

.next_back() needs to set partial_chunk to None or last_chunk_dim to self.iter.inner_dim each time it's called.

So, I'd rather keep the current approach and add more comments if necessary to make it clear.
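The first disadvantage listed above can be sketched with a toy 1-D model. This is not the ndarray implementation: `OptChunks` and all of its fields are hypothetical names, and chunk shapes are reduced to plain lengths. It only illustrates the extra decision `.split_at()` has to make when the partial chunk is tracked by an Option alone.

```rust
/// Hypothetical toy: the partial chunk is tracked only by an `Option`,
/// with no absolute `partial_chunk_index`.
#[derive(Debug, PartialEq)]
struct OptChunks {
    index: usize,                     // next chunk index
    end: usize,                       // one past the last chunk index
    chunk_len: usize,                 // length of a full chunk
    partial_chunk_len: Option<usize>, // Some(len) if the last chunk is short
}

impl OptChunks {
    /// Number of chunks remaining.
    fn len(&self) -> usize {
        self.end - self.index
    }

    fn split_at(self, at: usize) -> (Self, Self) {
        assert!(at <= self.len());
        let mid = self.index + at;
        // Extra bookkeeping: the partial chunk is always the last chunk, so it
        // belongs to the left piece only when the left piece takes everything.
        let (left_partial, right_partial) = if at == self.len() {
            (self.partial_chunk_len, None)
        } else {
            (None, self.partial_chunk_len)
        };
        let left = OptChunks {
            index: self.index,
            end: mid,
            chunk_len: self.chunk_len,
            partial_chunk_len: left_partial,
        };
        let right = OptChunks {
            index: mid,
            end: self.end,
            chunk_len: self.chunk_len,
            partial_chunk_len: right_partial,
        };
        (left, right)
    }
}

fn main() {
    let iter = OptChunks { index: 1, end: 5, chunk_len: 3, partial_chunk_len: Some(1) };
    let (left, right) = iter.split_at(2);
    println!("left = {:?}", left);
    println!("right = {:?}", right);
}
```

With an absolute partial_chunk_index, neither piece needs this branch, because the field can be copied into both halves unchanged.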

bluss · 2019-09-03T20:37:13Z

src/iterators/mod.rs

+                    },
+                    Self {
+                        iter: right,
+                        partial_chunk_index: self.partial_chunk_index,


I haven't read the whole code unfortunately (what's not visible in the diff) - why doesn't this partial_chunk_index require adjusting - the right part of the iter now starts at index, so I'd expect this to be offset by - index?

jturner314 · 2019-09-03

Here's an example:

    use ndarray::prelude::*;

    fn main() {
        let a: Array1<i32> = (0..13).collect();
        let mut iter = a.axis_chunks_iter(Axis(0), 3);
        iter.next(); // skip the first element so that we consider a partially-consumed iterator
        println!("before_split = {:#?}", iter);
        let (left, right) = iter.split_at(2);
        println!("left = {:#?}", left);
        println!("right = {:#?}", right);
    }

which gives the output

    before_split = AxisChunksIter {
        iter: AxisIterCore {
            index: 1,
            end: 5,
            stride: 3,
            inner_dim: [3],
            inner_strides: [1],
            ptr: 0x00005634af728b40,
        },
        partial_chunk_index: 4,
        partial_chunk_dim: [1],
        life: PhantomData,
    }
    left = AxisChunksIter {
        iter: AxisIterCore {
            index: 1,
            end: 3,
            stride: 3,
            inner_dim: [3],
            inner_strides: [1],
            ptr: 0x00005634af728b40,
        },
        partial_chunk_index: 4,
        partial_chunk_dim: [1],
        life: PhantomData,
    }
    right = AxisChunksIter {
        iter: AxisIterCore {
            index: 3,
            end: 5,
            stride: 3,
            inner_dim: [3],
            inner_strides: [1],
            ptr: 0x00005634af728b40,
        },
        partial_chunk_index: 4,
        partial_chunk_dim: [1],
        life: PhantomData,
    }

We can visualize the situation like this:

                  0 1 2 3 4
    before split:   ^      |
    after split:    ^  |^  |

The ^s represent the indexes and the |s represent the ends of the iterators. (The |s appear just before the corresponding end indices.) There are 4 full chunks (indices 0..=3) and 1 partial chunk (index 4). Note that all indices are relative to the start of the axis, so any given index value represents the same location before and after the split. This is why partial_chunk_index is the same before and after splitting. Before splitting, the index of the partial chunk is 4, and it stays 4 in the split pieces. (The left piece will never actually reach index 4 since its end is 3; that's the desired behavior.)
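The point about absolute indices can be sketched with a toy 1-D model. This is only an illustration under stated assumptions: `ToyChunks` and its fields are hypothetical names that mirror the fields in the Debug output above, chunk shapes are reduced to plain lengths, and each item is a `(start, len)` pair rather than an array view.

```rust
/// Toy 1-D model of a chunk iterator; all names are hypothetical and only
/// mirror the fields shown in the Debug output above.
#[derive(Clone, Debug, PartialEq)]
struct ToyChunks {
    index: usize,               // next chunk index, relative to the axis start
    end: usize,                 // one past the last chunk index
    chunk_len: usize,           // length of a full chunk
    partial_chunk_index: usize, // absolute index of the partial chunk
    partial_chunk_len: usize,   // length of the partial chunk
}

impl ToyChunks {
    fn new(axis_len: usize, chunk_len: usize) -> Self {
        assert!(chunk_len > 0);
        let n_whole = axis_len / chunk_len;
        let rem = axis_len % chunk_len;
        ToyChunks {
            index: 0,
            end: if rem == 0 { n_whole } else { n_whole + 1 },
            chunk_len,
            // When `rem == 0` this equals `end`, so it is never reached.
            partial_chunk_index: n_whole,
            partial_chunk_len: rem,
        }
    }

    /// Splits into two pieces; `at` counts remaining chunks from the front.
    fn split_at(self, at: usize) -> (Self, Self) {
        assert!(at <= self.end - self.index);
        let mid = self.index + at;
        // `partial_chunk_index` is copied unchanged into both pieces.
        let left = ToyChunks { end: mid, ..self.clone() };
        let right = ToyChunks { index: mid, ..self };
        (left, right)
    }
}

impl Iterator for ToyChunks {
    type Item = (usize, usize); // (start offset along the axis, chunk length)

    fn next(&mut self) -> Option<Self::Item> {
        if self.index >= self.end {
            return None;
        }
        let len = if self.index == self.partial_chunk_index {
            self.partial_chunk_len
        } else {
            self.chunk_len
        };
        let start = self.index * self.chunk_len;
        self.index += 1;
        Some((start, len))
    }
}

fn main() {
    let mut iter = ToyChunks::new(13, 3);
    iter.next(); // consume the first chunk
    let (left, right) = iter.split_at(2);
    println!("left = {:?}", left.collect::<Vec<_>>()); // [(3, 3), (6, 3)]
    println!("right = {:?}", right.collect::<Vec<_>>()); // [(9, 3), (12, 1)]
}
```

Both pieces carry `partial_chunk_index: 4`, yet only the right piece ever yields the short chunk, because the left piece stops at its own `end` of 3.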

bluss · 2019-09-03

Thanks. If the only use of index is counting up to the partial_chunk_index, it makes total sense.

jturner314 · 2019-09-03

index is also used in AxisIterCore (which AxisChunksIter wraps) to compute the pointer of each element/chunk and to check for the end of the iterator; see AxisIterCore's implementation of .next(). (.split_at() on AxisIterCore doesn't change the ptr value; ptr always corresponds to the start of the axis. This was part of #669.)
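A minimal sketch of that idea, assuming a safe slice in place of the raw pointer: `ToyCore` and its `base` field are hypothetical stand-ins, not the actual AxisIterCore. Each item is located at `index * stride` from the start of the axis, so splitting only moves the `index`/`end` bounds and leaves the base untouched.

```rust
/// Hypothetical stand-in for an axis iterator core: `base` plays the role of
/// `ptr` and always refers to the start of the axis.
#[derive(Clone, Copy, Debug)]
struct ToyCore<'a> {
    base: &'a [i32],
    index: usize,  // next index, relative to the axis start
    end: usize,    // one past the last index
    stride: usize, // distance in elements between consecutive items
}

impl<'a> ToyCore<'a> {
    fn split_at(self, at: usize) -> (Self, Self) {
        assert!(at <= self.end - self.index);
        let mid = self.index + at;
        // `base` is copied unchanged into both halves; only the bounds move.
        (ToyCore { end: mid, ..self }, ToyCore { index: mid, ..self })
    }
}

impl<'a> Iterator for ToyCore<'a> {
    type Item = &'a i32;

    fn next(&mut self) -> Option<&'a i32> {
        if self.index >= self.end {
            return None;
        }
        // The item location is computed from the absolute index.
        let item = &self.base[self.index * self.stride];
        self.index += 1;
        Some(item)
    }
}

fn main() {
    let data: Vec<i32> = (0..13).collect();
    let core = ToyCore { base: &data, index: 1, end: 5, stride: 3 };
    let (left, right) = core.split_at(2);
    println!("left = {:?}", left.copied().collect::<Vec<_>>());
    println!("right = {:?}", right.copied().collect::<Vec<_>>());
}
```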

bluss

Nice! Remember that I trust your judgment @jturner314. I have read the PR - it's not that 🙂 - I mean that I trust you to review and merge your own PRs, so you can do that when you think it is appropriate (which is probably almost all the time).

jturner314 · 2019-09-04T14:49:40Z

Thanks for reviewing this @bluss! I generally like to get a review from someone before merging, but thanks for the vote of confidence. I'm comfortable merging my own PRs without a review when necessary.

jturner314 added 4 commits August 20, 2019 18:51

Clarify behavior of AxisChunksIter/Mut

269865c

IMO, it's easier to understand and work with the implementation of these iterators using `partial_chunk_index` and `partial_chunk_dim` than `n_whole_chunks` and `last_dim`.

Move some logic into AxisIterCore

af9d949

Add split_at methods for AxisChunksIter/Mut

b26ec6c

Add more tests for AxisChunksIter

4bee214

jturner314 mentioned this pull request Aug 20, 2019

Parallel Iterator for AxisChunksIter #639

Merged

jturner314 added the enhancement label Aug 20, 2019

LukeMathWalker reviewed Aug 21, 2019

bluss reviewed Sep 3, 2019

bluss approved these changes Sep 4, 2019

jturner314 merged commit c916203 into rust-ndarray:master Sep 4, 2019

jturner314 deleted the split-chunks branch September 4, 2019 14:41
