Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add .flatten_ok() #527

Merged
merged 7 commits into from Apr 21, 2021
Merged

add .flatten_ok() #527

merged 7 commits into from Apr 21, 2021

Conversation

vadixidav
Copy link
Contributor

Closes #522

} else {
self.inner = None;
}
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need this else? The only case where it isn't executed is if self.inner is Some(_) and inner.next() is also Some(_), but in that case you have an early return. I would execute the contents of the else in any case, both if the if let succeeds or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SkiFire13 Your point is sound. In fact, we shouldn't even need to assign it to None in that case as it will just automatically fall through. That might hurt performance in the case where next() is called on an empty iterator repeatedly, but I don't think that actually matters. So yes, let me make that change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think setting it to None might have a reason if you consider consistency in case next is called after the iterator ended. Should it try to advance the last inner iterator? Or the outer iterator? Currently the stdlib fuse both of them, but I guess it would be reasonable if you advance the outer iterator (in that case you need to set the self.inner to None)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize now that I should not change setting the inner to None as it might incur a performance penalty when hitting several Err values in a row.

@SkiFire13
Copy link
Contributor

A couple of other things you could also add:

  • Specialized methods of Iterator, for example size_hint and fold
  • Some integration tests
  • Implementations of DoubleEndedIterator, FusedIterator and Debug to match the std's FlatMap
  • Since the std's FlatMap fuses the outer iterator you might consider that as well for consistency. Note however that this is undocumented behaviour, so this might change in the future.

Comment on lines 55 to 60
fn clone(&self) -> Self {
Self {
iter: self.iter.clone(),
inner: self.inner.clone(),
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use clone_fields for consistency here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Fixed and will push shortly.

@vadixidav
Copy link
Contributor Author

@SkiFire13

* Specialized methods of `Iterator`, for example `size_hint` and `fold`

I don't think we can implement size_hint because even if the inner iterator had thousands of Ok elements, there is no guarantee that we will even get a single element.

I could add a specialized implementation of fold, but that could be done in another PR if someone runs into a performance issue with it. I think it should be alright as-is.

* Some integration tests

I can add those.

* Implementations of `DoubleEndedIterator`, `FusedIterator` and `Debug` to match the std's `FlatMap`

I couldn't think of a way to implement DoubleEndedIterator effectively. It would require me to store the last Ok and the next Ok at the same time. If this is acceptable, I can do it, but it will make the iterator larger and add more logic to retrieve the final elements of the inner iterator.

I can definitely implement FusedIterator and Debug, so I will do that.

* Since the std's `FlatMap` fuses the outer iterator you might consider that as well for consistency. Note however that this is undocumented behaviour, so this might change in the future.

I think I will just allow the user to call .fuse() should they need that behavior. I will implement FusedIterator if the input iterator is fused, which should make it maximally flexible for the time being.

@SkiFire13
Copy link
Contributor

I don't think we can implement size_hint because even if the inner iterator had thousands of Ok elements, there is no guarantee that we will even get a single element.

I think you can use the current inner iterator's size_hint for the lower bound. Far from perfect, but better than nothing.

I could add a specialized implementation of fold, but that could be done in another PR if someone runs into a performance issue with it. I think it should be alright as-is.

I think it's good practice to provide specialized implementations of fold, especially if next contains complex logic because that can prevent LLVM from autovectorizing the loops. Anyway I can add it later.

I couldn't think of a way to implement DoubleEndedIterator effectively. It would require me to store the last Ok and the next Ok at the same time. If this is acceptable, I can do it, but it will make the iterator larger and add more logic to retrieve the final elements of the inner iterator.

The stdlib does exactly that. I don't think it's a big deal if the iterator is larger since it's supposed to be short lived and probably inlined. The additional check shouldn't matter that much since if you never call next_back it's only for the very last element.

@vadixidav
Copy link
Contributor Author

@SkiFire13 Okay, in that case I will add a size_hint impl. It wont help in the case that the iterator hasn't already had next called on it at least once, since the inner iterator is None, but perhaps if someone partially consumes an iterator and then tries to collect it then it might help to a degree. I will hold off on adding a specialized fold impl for now, and that can be added later. I will also implement DoubleEndedIterator now based on your comment, since that is something that matters to the API.

@vadixidav
Copy link
Contributor Author

The size_hint impl actually was able to glean more information than I originally thought, but only slightly. It should be helpful in some limited cases.

@vadixidav
Copy link
Contributor Author

@SkiFire13 Okay, I believe I am done now. I did not add tests for size_hint because I couldn't think of any good way to test that, but if you would like me to do that, I can add some tests for size_hint. If there is some fuzzer I can hook it up to for testing size_hint, that would be great. I can produce random sequences of iterators that could be validated by the fuzzer to not violate size_hint's contract, if we have that (I only took a cursory look at other tests).

@vadixidav
Copy link
Contributor Author

vadixidav commented Feb 7, 2021

Okay, I just looked over my code and I think there may be an issue if the outer iterator is not fused. Notably, the call to next() on the outer iterator now happens multiple times in the case that you have to traverse the back iterator. I am going to fix that right now.

@vadixidav
Copy link
Contributor Author

vadixidav commented Feb 7, 2021

I actually made the "fix" which fuses the outer iterator, and then realized it's technically still correct the way it is because if the outer iterator isn't fused, neither is FlattenOk, so this is actually fine. I was just confused 😅. I am going to leave this as is. Let me know if its good. I will save the commit which fuses the iterator in another branch on my fork, but its a bit ugly.

Branch: https://github.com/vadixidav/itertools/tree/fused_flatten_ok

@SkiFire13
Copy link
Contributor

Okay, I just looked over my code and I think there may be an issue if the outer iterator is not fused. Notably, the call to next() on the outer iterator now happens multiple times in the case that you have to traverse the back iterator. I am going to fix that right now.

I guess that's why the stdlib decided to always fuse the outer iterator too.

its a bit ugly

You could use the Fuse adapter instead of that bool flag

@vadixidav
Copy link
Contributor Author

@SkiFire13 That's alright, I think this is good to go if you think so. If you do want it to be fused to merge though let me know, but for now its only fused if the input iterator is fused.

@SkiFire13
Copy link
Contributor

SkiFire13 commented Feb 8, 2021

For merging you should ask @phimuemue or @jswrenn

Copy link
Member

@phimuemue phimuemue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi there. I think there's really progress been made on this. Thank you for taking the time.

Unfortunately, I do not have enough capacity to review pull requests that ample right now.

However, one thing that might simplify review: As far as I can see, this somewhat resembles FlattenCompat - as suggested here (btw thanks @SkiFire13). Maybe we could document this somewhere and even adapt FlattenCompats structure (i.e. naming of fields, match vs if let) so that a reviewer can basically diff your implementation against std's FlattenCompat.

Comment on lines +151 to +157
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("FlattenOk")
.field("iter", &self.iter)
.field("inner_front", &self.inner_front)
.field("inner_back", &self.inner_back)
.finish()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could use debug_fmt_fields for consistency here.

@jswrenn jswrenn added this to the next milestone Apr 21, 2021
Copy link
Member

@jswrenn jswrenn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! (And sorry for taking so long to merge it!)

bors r+

@bors
Copy link
Contributor

bors bot commented Apr 21, 2021

Build succeeded:

@bors bors bot merged commit 1255025 into rust-itertools:master Apr 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

flatten_ok
4 participants