Add `next_array` and `collect_array` #560

orlp · 2021-07-15T15:00:01Z

With this pull request I add two new functions to the Itertools trait:

fn next_array<T, const N: usize>(&mut self) -> Option<[T; N]>
where Self: Sized + Iterator<Item = T>;

fn collect_array<T, const N: usize>(mut self) -> Option<[T; N]>
where Self: Sized + Iterator<Item = T>;

These behave exactly like next_tuple and collect_tuple, except that they return arrays instead of tuples. Since these functions require min_const_generics, I added a tiny build script that checks whether the Rust version is 1.51 or higher and, if so, sets the has_min_const_generics config variable. This means that Itertools does not suddenly require 1.51 or higher; only these two functions do.
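To make the intended behavior concrete, here is a minimal sketch of the two functions as free functions using only std. It is not the PR's implementation (which builds the array in place with MaybeUninit); it goes through a heap-allocated Vec, and the free-function signatures are an assumption made so that the example stands alone.

```rust
use std::convert::TryInto;

/// Sketch only: takes up to `N` items and succeeds if exactly `N` were available.
fn next_array<I, const N: usize>(iter: &mut I) -> Option<[I::Item; N]>
where
    I: Iterator,
{
    // `take(N)` ensures that no more than `N` items are consumed.
    let items: Vec<I::Item> = iter.by_ref().take(N).collect();
    // `Vec<T> -> [T; N]` fails when the length is not exactly `N`.
    items.try_into().ok()
}

/// Sketch only: like `next_array`, but the iterator must then be exhausted.
fn collect_array<I, const N: usize>(mut iter: I) -> Option<[I::Item; N]>
where
    I: Iterator,
{
    next_array::<I, N>(&mut iter).filter(|_| iter.next().is_none())
}
```

With these definitions, next_array leaves the remaining items in the iterator, while collect_array returns None if anything is left over.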

In order to facilitate this I did have to bump the minimum required Rust version to 1.34 from the (documented) 1.32, since Rust 1.32 and 1.33 have trouble parsing the file even if the new code is conditionally compiled. However, this should not result in any (new) breakage, because Itertools has actually required Rust 1.34 for 9+ months already: 83c0f04 uses saturating_pow, which wasn't stabilized until 1.34.

As for rationale, I think these functions are useful, especially for pattern matching and parsing. I don't think there's a high probability they get added to the standard library either, which is why I am making a pull request here directly. When/if TryFromIterator stabilizes we can simplify the implementation, but even then I believe these functions remain a good addition, similar to how collect_vec is nice to have despite .collect::<Vec<_>>() existing.


orlp · 2021-07-15T16:02:54Z

A possible enhancement might be to return Option<A> where A: FromArray<Self::Item, N> instead, and adding the FromArray trait, something similar to this:

trait FromArray<T, const N: usize> {
    fn from_array(array: [T; N]) -> Self;
}

impl<T, const N: usize> FromArray<T, N> for [T; N] { /* .. */ }
impl<T, const N: usize> FromArray<Option<T>, N> for Option<[T; N]> { /* .. */ }
impl<T, E, const N: usize> FromArray<Result<T, E>, N> for Result<[T; N], E> { /* .. */ }

In fact, I think this is highly useful because it allows things like

let ints = line.split_whitespace().map(|n| n.parse());
if let Ok([x, y, z]) = ints.collect_array() {
    ...
}

This would be completely in line with FromIterator.
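For illustration, the elided impl bodies could be filled in along the following lines. This is a sketch under assumptions, not the code from the PR: it uses a temporary Vec rather than in-place construction, and it relies on by-value IntoIterator for arrays (Rust 1.53 or later).

```rust
use std::convert::TryInto;

trait FromArray<T, const N: usize> {
    fn from_array(array: [T; N]) -> Self;
}

impl<T, const N: usize> FromArray<T, N> for [T; N] {
    fn from_array(array: [T; N]) -> Self {
        array
    }
}

impl<T, const N: usize> FromArray<Option<T>, N> for Option<[T; N]> {
    fn from_array(array: [Option<T>; N]) -> Self {
        // Stops at the first `None`, mirroring `FromIterator` for `Option`.
        let items = IntoIterator::into_iter(array).collect::<Option<Vec<T>>>()?;
        items.try_into().ok()
    }
}

impl<T, E, const N: usize> FromArray<Result<T, E>, N> for Result<[T; N], E> {
    fn from_array(array: [Result<T, E>; N]) -> Self {
        // Stops at the first `Err`, mirroring `FromIterator` for `Result`.
        let items = IntoIterator::into_iter(array).collect::<Result<Vec<T>, E>>()?;
        let array: Result<[T; N], Vec<T>> = items.try_into();
        Ok(array.unwrap_or_else(|_| unreachable!("an N-element array yields N items")))
    }
}
```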

orlp · 2021-07-16T10:30:12Z

So I have a working implementation of the above idea here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9dba690b0dfc362971635e21647a4c19.

It makes this compile:

fn main() {
    let line = "32 -12 24";
    let nums = line.split_whitespace().map(|n| n.parse::<i32>());
    if let Some(Ok([x, y, z])) = nums.collect_array() {
        println!("x: {} y: {} z: {}", x, y, z);
    }
}

It would change the interface to:

trait ArrayCollectible<T>: Sized {
    fn array_from_iter<I: IntoIterator<Item = T>>(iterable: I) -> Option<Self>;
}

trait Itertools: Iterator {
    fn collect_array<A>(self) -> Option<A>
    where
        Self: Sized,
        A: ArrayCollectible<Self::Item>;
}

where

ArrayCollectible<T> is implemented for [T; N];
ArrayCollectible<Option<T>> is implemented for Option<[T; N]>;
ArrayCollectible<Result<T, E>> is implemented for Result<[T; N], E>.
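A std-only sketch of how this interface could fit together is shown below. It is not the linked playground code. The CollectArray extension trait is a stand-in for the Itertools method, the Option impl is omitted for brevity, and the choice that an Err takes priority over a wrong element count is an assumption.

```rust
use std::convert::TryInto;

trait ArrayCollectible<T>: Sized {
    fn array_from_iter<I: IntoIterator<Item = T>>(iterable: I) -> Option<Self>;
}

impl<T, const N: usize> ArrayCollectible<T> for [T; N] {
    fn array_from_iter<I: IntoIterator<Item = T>>(iterable: I) -> Option<Self> {
        let mut iter = iterable.into_iter();
        let items: Vec<T> = iter.by_ref().take(N).collect();
        // Leftover items mean the iterator was too long.
        if iter.next().is_some() {
            return None;
        }
        // Fails (and yields `None`) when fewer than `N` items were produced.
        items.try_into().ok()
    }
}

impl<T, E, const N: usize> ArrayCollectible<Result<T, E>> for Result<[T; N], E> {
    fn array_from_iter<I: IntoIterator<Item = Result<T, E>>>(iterable: I) -> Option<Self> {
        let mut items = Vec::with_capacity(N);
        for item in iterable {
            match item {
                Ok(value) => items.push(value),
                // Assumption: the first error is reported immediately.
                Err(error) => return Some(Err(error)),
            }
        }
        let array: [T; N] = items.try_into().ok()?;
        Some(Ok(array))
    }
}

/// Hypothetical stand-in for the `Itertools` method.
trait CollectArray: Iterator + Sized {
    fn collect_array<A: ArrayCollectible<Self::Item>>(self) -> Option<A> {
        A::array_from_iter(self)
    }
}

impl<I: Iterator> CollectArray for I {}
```

The outer Option reports a wrong number of elements, and the inner Result carries the first parse error, which matches the shape of the Some(Ok([x, y, z])) pattern above.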

phimuemue

Hi there! Thanks for this. I particularly like that you thought about a way of enabling const-generic stuff without raising the minimum required Rust version (even though I had imagined something different, since I am averse to depending on other crates too much).

There has been some discussion recently about basically supporting not only tuples, but also arrays. I just want to make sure that we do not lose input from these discussions when actually settling on your solution.

On top of that, I think there are some changes in there that are not directly related to this issue. If you'd like to have them merged, could you possibly factor them out into separate PRs/commits?

phimuemue · 2021-07-16T14:30:21Z

build.rs

+fn main() {
+    let is_nightly = version_check::is_feature_flaggable() == Some(true);
+    let is_at_least_1_34 = version_check::is_min_version("1.34.0").unwrap_or(false);
+    let is_at_least_1_51 = version_check::is_min_version("1.51.0").unwrap_or(false);
+
+    if !is_at_least_1_34 && !is_nightly {
+        println!("cargo:warning=itertools requires rustc => 1.34.0");
+    }
+
+    if is_at_least_1_51 || is_nightly {
+        println!("cargo:rustc-cfg=has_min_const_generics");
+    }
+}


Usually, I like the idea of having everything automated, but I am not sure if we should go with a build.rs and an additional dependency. My first idea was to use a feature flag (that would probably be off by default) that the user can enable if desired.

> My first idea was to use a feature flag (that would probably be off by default) that the user can enable if desired.

I think feature flags to enable things that are already available in the latest stable Rust and have no further compile time or dependency drawbacks make no sense.

You could end up with dozens of feature flags for minor features, or hold back progress because a feature that would otherwise be merged would now be too minor to accept due to introducing a new feature flag.

You end up with useless features that you can't remove without a breaking change as your MSRV goes up.

It's un-ergonomic as it adds an extra step for the user.

In my opinion these drawbacks are unacceptable when the detection could be done completely and correctly automatically. If you are hesitant regarding the version-check dependency, I'd just like to note that it's tiny, has no further dependencies of its own, and is already relied on by crates such as time, nom, rocket, and fd-find, among others.

@jswrenn I sure don't mind increasing the MSRV but I would suggest we release the 0.13.0 first, and then increase the MSRV in 0.14.0 to not require the build script (in which case orlp will have enough time to work on this).
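For reference, the version detection discussed in this thread can also be sketched without any dependency. The following is an illustration only, not code from the PR: the helper names are hypothetical, and it assumes that `rustc --version` prints the version as the second whitespace-separated token and that Cargo provides the RUSTC environment variable to build scripts.

```rust
use std::env;
use std::process::Command;

/// Parses output such as "rustc 1.51.0 (<hash> <date>)" into (major, minor).
fn parse_rustc_version(output: &str) -> Option<(u32, u32)> {
    let version = output.split_whitespace().nth(1)?;
    let mut parts = version.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Asks the compiler that Cargo is using for its version.
fn rustc_version() -> Option<(u32, u32)> {
    let rustc = env::var_os("RUSTC").unwrap_or_else(|| "rustc".into());
    let output = Command::new(rustc).arg("--version").output().ok()?;
    parse_rustc_version(&String::from_utf8(output.stdout).ok()?)
}

/// In a real build.rs this body would live in `fn main`.
fn emit_cfg() {
    if let Some(version) = rustc_version() {
        // Tuples compare lexicographically: major first, then minor.
        if version >= (1, 51) {
            println!("cargo:rustc-cfg=has_min_const_generics");
        }
    }
}
```

This trades the extra dependency for a small amount of parsing code that would have to be maintained and tested in CI.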

phimuemue · 2021-07-16T14:30:52Z

src/lib.rs

-//! This version of itertools requires Rust 1.32 or later.
+//! This version of itertools requires Rust 1.34 or later.


If your assessment is correct, we could possibly increment the minimum rust version in a separate commit?

Do you mean a pull request? It's already a separate commit.

phimuemue · 2021-07-16T14:32:29Z

src/lib.rs

-        match self.next_tuple() {
-            elt @ Some(_) => match self.next() {
-                Some(_) => None,
-                None => elt,
-            },
-            _ => None
-        }
+        self.next_tuple().filter(|_| self.next().is_none())


Is this really relevant to this PR? If not, could we separate it into another PR?

It was mostly to be consistent with the other implementation. As it's just a stylistic change I don't think it's worth a pull request by itself to be honest.

phimuemue · 2021-07-16T14:34:45Z

src/next_array.rs

@@ -0,0 +1,80 @@
+use core::mem::MaybeUninit;


I think there was some discussion about building arrays:

implement arrays and next_array #549 (comment)

Array combinations #546

Const generics iterator candidates #547

orlp · 2021-12-21T14:02:00Z

@phimuemue Any update on this?

phimuemue · 2021-12-29T20:48:58Z

> @phimuemue Any update on this?

I appreciate your effort, but unfortunately nothing substantial from my side: I changed my mind regarding version-check (so we could use it as a dev-dependency), but I do not have enough time right now to review and merge PRs of that size.

orlp · 2021-12-30T01:33:13Z

@phimuemue Just for posterity's sake, version-check would be a build-dependency, not dev-dependency.

orlp · 2022-10-04T22:54:24Z

@phimuemue Just checking in on the status. I feel very strongly about the usefulness of collect_array; I miss it very often in itertools.

scottmcm · 2022-10-04T23:00:42Z

Note that if you want collect_array, you can use https://lib.rs/crates/arrayvec, as the usual way to collect into an array.

I'll also mention Iterator::next_chunk (rust-lang/rust#98326) as a nightly API that'll be next_array.

Expurple · 2024-03-27T16:24:04Z

This is a very useful feature. Today there was a thread on Reddit where the author basically asks if there's a crate that provides collect_array(). IMO, itertools should be the crate to do it.

Philippe-Cholet · 2024-03-27T18:54:56Z

@Expurple
I sure would like to use const generics, and collect_array is one of the things I'd use them for.
Our MSRV is quite old (1.43.1 currently) while min-const-generics requires 1.51, but I do not think that's the main blocker.
The fact is that not much is available in recent stable Rust yet, which is sad. Iterator::next_chunk and core::array::try_from_fn would be nice to have.
Plus, we currently don't really use unsafe ourselves (only in EitherOrBoth::insert* with obviously infallible patterns). I guess we prefer that std does the heavy work.

phimuemue · 2024-03-27T20:55:14Z

I sometimes think about adding arrayvec as a dependency - and falling back to std as soon as it's possible. I think it might also solve some other issues (e.g. ExactlyOneError having a manual two-element-arrayvec). Would require Rust 1.51.

Another option I just saw: Crates can offer "nightly-only experimental API" (see https://docs.rs/arrayvec/latest/arrayvec/struct.ArrayVec.html#method.first_chunk for an example) - maybe this would help some users.

I personally would lean towards arrayvec. @jswrenn @Philippe-Cholet Opinions?

Philippe-Cholet · 2024-03-28T07:36:55Z

@phimuemue

> Another option I just saw: Crates can offer "nightly-only experimental API" (see https://docs.rs/arrayvec/latest/arrayvec/struct.ArrayVec.html#method.first_chunk for an example) - maybe this would help some users.

ArrayVec<T, CAP> implements Deref<Target = [T]> so (nightly-available) slice methods are directly accessible, that seems to be it.

> I sometimes think about adding arrayvec as a dependency - and falling back to std as soon as it's possible. I think it might also solve some other issues (e.g. ExactlyOneError having a manual two-element-arrayvec). Would require Rust 1.51.

I'm definitely not opposed to the idea but the ExactlyOneError use case is quite small.
I had not given it much thought before. Do you have other examples in mind? (with private usage, in order to fall back to std ASAP).

EDIT: ArrayVec has a maximal capacity of u32::MAX, could it be an issue?

EDIT: Well I have some. With tail and k_smallest (and its variants), I had thoughts of extending them to const where I dreamt of unstable Iterator::next_chunk but I guess we could use arrayvec in the meantime.

(My idea would be that .k_smallest(50) could also support .k_smallest(Const/*::<50> if not inferred elsewhere*/) so that we don't multiply method names too much but merely add a new zero-sized type struct Const<const N: usize>; at places we only gave usize before. Then no allocation.
It's not a magic bullet for every usage, but I see real uses for it, such as .combinations(Const): it would keep an internal Vec buffer but return arrays, so no repetitive slow allocations.)
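A rough sketch of the Const idea, to show how one function could accept either a runtime count or a type-level one. Everything here except the Const name is hypothetical (the SmallestCount trait, the free k_smallest function, and the choice to return None when fewer than N items exist), and the sketch sorts a Vec, so it shows only the API shape and not the allocation-free part.

```rust
use std::convert::TryInto;

/// Zero-sized marker carrying the count in its type.
pub struct Const<const N: usize>;

/// Hypothetical trait for anything usable as the `k` argument.
pub trait SmallestCount<T> {
    type Output;
    fn select(self, sorted: Vec<T>) -> Self::Output;
}

impl<T> SmallestCount<T> for usize {
    type Output = Vec<T>;
    fn select(self, mut sorted: Vec<T>) -> Vec<T> {
        sorted.truncate(self);
        sorted
    }
}

impl<T, const N: usize> SmallestCount<T> for Const<N> {
    // Assumption: `None` when fewer than `N` items are available.
    type Output = Option<[T; N]>;
    fn select(self, mut sorted: Vec<T>) -> Option<[T; N]> {
        sorted.truncate(N);
        sorted.try_into().ok()
    }
}

/// The return type depends on the kind of count passed in.
pub fn k_smallest<I, K>(iter: I, k: K) -> K::Output
where
    I: Iterator,
    I::Item: Ord,
    K: SmallestCount<I::Item>,
{
    let mut items: Vec<I::Item> = iter.collect();
    items.sort();
    k.select(items)
}
```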

@scottmcm Small discussion about temporarily adding arrayvec as dependency once we move to const-generics. I just saw a comment of yours related to this. Could you elaborate?

jswrenn · 2024-03-28T16:25:06Z

For collect_array, I think I'd prefer just taking the time myself to write the unsafe code. We can vendor the not-yet-stabilized helper functions from the standard library that we'll need.

I can allocate some time to this next week.

orlp · 2024-03-28T16:34:20Z

@jswrenn Please don't forget that we are discussing this on a PR that already has a working implementation without adding dependencies...

scottmcm · 2024-03-28T16:39:27Z

src/next_array.rs

+    fn drop(&mut self) {
+        unsafe {
+            // SAFETY: we only loop over the initialized portion.
+            for el in &mut self.arr[..self.i] {


Shouldn't need a loop here -- it's generally better to drop-in-place a whole slice rather than items individually.
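A sketch of what dropping the initialized prefix as one slice could look like. The struct mirrors the field names in the diff, but the invariant (the first `i` elements of `arr` are initialized) and the checked try_push are assumptions made to keep the example self-contained.

```rust
use core::mem::MaybeUninit;
use core::ptr;

struct ArrayBuilder<T, const N: usize> {
    arr: [MaybeUninit<T>; N],
    // Assumed invariant: `arr[..i]` is initialized and `i <= N`.
    i: usize,
}

impl<T, const N: usize> ArrayBuilder<T, N> {
    fn new() -> Self {
        // SAFETY: an array of `MaybeUninit` requires no initialization, so
        // calling `assume_init` on the outer `MaybeUninit` is allowed.
        let arr = unsafe { MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() };
        Self { arr, i: 0 }
    }

    /// Checked push: returns `None` (dropping `x`) when the builder is full.
    fn try_push(&mut self, x: T) -> Option<()> {
        let slot = self.arr.get_mut(self.i)?;
        *slot = MaybeUninit::new(x);
        self.i += 1;
        Some(())
    }
}

impl<T, const N: usize> Drop for ArrayBuilder<T, N> {
    fn drop(&mut self) {
        let init: *mut [MaybeUninit<T>] = &mut self.arr[..self.i];
        // SAFETY: by the invariant above, every element of `arr[..i]` is
        // initialized, and `MaybeUninit<T>` has the same size and alignment
        // as `T`, so the prefix can be dropped in place as a `[T]`.
        unsafe { ptr::drop_in_place(init as *mut [T]) };
    }
}
```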

scottmcm · 2024-03-28T16:45:20Z

src/next_array.rs

+            // SAFETY: the take(N) guarantees we never go out of bounds.
+            builder.push_unchecked(el);


I'm not sure this is sound -- there might be a way for me to override take (or one of the things it calls) in safe code such that this can return more than N things.

Maybe have it be something like

it.try_for_each(|x| builder.try_push(x));

with try_push returning an Option?

That's a nasty one, I think you're right.

There should still be a take in there though, when using try_for_each.

It's right on the edge of soundness. There's no easy demo that I can come up with -- if you try to override take you'll find that that doesn't actually work, for example, because you can't make something of the right type without unsafe.

So it's possible that it's actually sound today, but there are so many nuances to that argument that I think it's probably better to consider it unsound. For example, if Rust one day added a way to "call super" -- which seems like an entirely plausible feature -- then it'd immediately be obviously-unsound as someone could implement take as super.take(N+1).

> There should still be a take in there though, when using try_for_each.

Oh, right, because otherwise you'll consume an extra element. Good catch.
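The combination discussed in this thread (a checked push plus take(N)) can be illustrated entirely in safe code. The sketch below is not the PR's implementation: it uses one Option slot per element in place of MaybeUninit, the function name is hypothetical, and it needs array `map` (Rust 1.55 or later).

```rust
/// Sketch: like `next_array`, but correctness does not depend on `take`
/// behaving well, because every write is bounds-checked.
fn next_array_checked<I, const N: usize>(iter: &mut I) -> Option<[I::Item; N]>
where
    I: Iterator,
{
    let mut slots: [Option<I::Item>; N] = [(); N].map(|_| None);
    let mut filled = 0;

    // `take(N)` keeps us from consuming an extra item; the checked `get_mut`
    // turns a misbehaving iterator into `None` instead of an invalid write.
    iter.by_ref().take(N).try_for_each(|x| {
        let slot = slots.get_mut(filled)?;
        *slot = Some(x);
        filled += 1;
        Some(())
    })?;

    if filled < N {
        return None;
    }
    Some(slots.map(|slot| slot.expect("every slot was filled above")))
}
```

Without the take(N), try_for_each would pull one more item from the iterator before the checked push could reject it.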


jswrenn · 2024-03-28T17:09:43Z

@orlp, thanks, I had forgotten that this was a PR and not an issue when I made my reply. Still, we're talking about adding some extremely subtle unsafe code to Itertools. I'd like us to take extreme care to avoid accidentally introducing UB.

A PR adding unsafe to itertools should:

- rigorously document the safety and panicking conditions of every unsafe function it introduces
- prove that every invocation of an unsafe function (even invocations occurring within other unsafe functions) satisfies the safety precondition of that invocation, with citations to official Rust documentation
- rigorously document why any potentially panicking function within an unsafe function does not create invalid state that would cause UB upon panicking unwinds
- intensively test its API with miri

If you can update this PR to do those things, I can see a path forward to merging it.

jswrenn

Thanks for this PR! I like the ArrayBuilder abstraction quite a bit. As I mentioned, this will need additional documentation and testing before it can be merged. See the recent safety comments in my other project, zerocopy, for a sense of the ~~paranoia~~ rigor I'd like these safety comments to take.

jswrenn · 2024-03-28T17:12:44Z

src/next_array.rs

+/// Helper struct to build up an array element by element.
+struct ArrayBuilder<T, const N: usize> {
+    arr: [MaybeUninit<T>; N],
+    i: usize


What is the safety invariant of i with relation to arr?

jswrenn · 2024-03-28T17:14:27Z

src/next_array.rs

+        Self { arr: maybe_uninit::uninit_array(), i: 0 }
+    }
+
+    pub unsafe fn push_unchecked(&mut self, x: T) {


Needs a safety comment, in the format of:

Suggested change

-    pub unsafe fn push_unchecked(&mut self, x: T) {
+    /// Does XYZ.
+    ///
+    /// # Safety
+    ///
+    /// Callers promises that blah blah blah.
+    ///
+    /// # Panics
+    ///
+    /// This method does (or does not) panic.
+    pub unsafe fn push_unchecked(&mut self, x: T) {

jswrenn · 2024-03-28T17:16:06Z

src/next_array.rs

+
+    pub unsafe fn push_unchecked(&mut self, x: T) {
+        debug_assert!(self.i < N);
+        *self.arr.get_unchecked_mut(self.i) = MaybeUninit::new(x);


Needs a safety comment in the form:

Suggested change

-        *self.arr.get_unchecked_mut(self.i) = MaybeUninit::new(x);
+        // SAFETY: By contract on the caller, the safety condition on `get_unchecked_mut` that BLAH BLAH BLAH is satisfied.
+        *self.arr.get_unchecked_mut(self.i) = MaybeUninit::new(x);

jswrenn · 2024-03-28T17:18:25Z

src/next_array.rs

+            unsafe {
+                // SAFETY: prevent double drop.
+                self.i = 0;
+                // SAFETY: [MaybeUninit<T>; N] and [T; N] have the same layout.


Could this safety comment cite the standard library documentation? While it's true that these two types have the same size and alignment, it's not true that they have the same bit validity.

jswrenn · 2024-03-28T17:18:48Z

src/next_array.rs

+
+    pub fn take(mut self) -> Option<[T; N]> {
+        if self.i == N {
+            unsafe {


Could you scope this unsafe { ... } block down to just the ptr::read?

jswrenn · 2024-03-28T17:19:25Z

src/next_array.rs

+                self.i = 0;
+                // SAFETY: [MaybeUninit<T>; N] and [T; N] have the same layout.
+                let init_arr_ptr = &self.arr as *const _ as *const [T; N];
+                Some(core::ptr::read(init_arr_ptr))


This needs a SAFETY comment citing what the preconditions of ptr::read are, and proving why they are satisfied.
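As an illustration of the kind of comment being requested, here is a standalone sketch of the array conversion with the ptr::read preconditions spelled out. The helper name is hypothetical and this is not the PR's code; the listed preconditions are those given in the ptr::read documentation.

```rust
use core::mem::MaybeUninit;
use core::ptr;

/// Converts a fully initialized array of `MaybeUninit<T>` into `[T; N]`.
///
/// # Safety
///
/// Every element of `arr` must be initialized.
unsafe fn array_assume_init<T, const N: usize>(arr: [MaybeUninit<T>; N]) -> [T; N] {
    let src = &arr as *const [MaybeUninit<T>; N] as *const [T; N];
    // SAFETY: `ptr::read` requires that `src` is valid for reads, properly
    // aligned, and points to a properly initialized value of the target type.
    // - `src` is derived from a reference to the live local `arr`, so it is
    //   valid for reads.
    // - `MaybeUninit<T>` is documented to have the same size and alignment as
    //   `T`, so `[MaybeUninit<T>; N]` and `[T; N]` agree in size and alignment.
    // - The caller promises that every element is initialized.
    // `MaybeUninit` never drops its contents, so no double drop occurs when
    // `arr` goes out of scope.
    unsafe { ptr::read(src) }
}
```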

jswrenn · 2024-03-28T17:22:20Z

src/next_array.rs

+        unsafe { MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() }
+    }
+
+    pub unsafe fn assume_init_drop<T>(u: &mut MaybeUninit<T>) {


You don't need to replicate the entire stdlib doc comment here, but could you document the safety preconditions of assume_init_drop?

jswrenn · 2024-03-28T17:25:34Z

src/next_array.rs

+            // SAFETY: the take(N) guarantees we never go out of bounds.
+            builder.push_unchecked(el);


Seconding @scottmcm's comment: For our MVP, does push_unchecked really need to be _unchecked?

jswrenn · 2024-03-28T17:32:54Z

build.rs

+fn main() {
+    let is_nightly = version_check::is_feature_flaggable() == Some(true);
+    let is_at_least_1_34 = version_check::is_min_version("1.34.0").unwrap_or(false);
+    let is_at_least_1_51 = version_check::is_min_version("1.51.0").unwrap_or(false);
+
+    if !is_at_least_1_34 && !is_nightly {
+        println!("cargo:warning=itertools requires rustc => 1.34.0");
+    }
+
+    if is_at_least_1_51 || is_nightly {
+        println!("cargo:rustc-cfg=has_min_const_generics");
+    }
+}


@Philippe-Cholet, @phimuemue, perhaps it's time we updated our MSRV to 1.51 (which is two years old at this point).

While I don't mind the version-detection approach, I would like us to adopt it in tandem with changes to our CI that ensure we are testing on all detected versions. I'd also like to perhaps avoid taking the dependency on rust_version. This would all be a substantial change, and outside the scope of this PR.

My vote is that we increase our MSRV. We can always decrease it in the future.

orlp · 2024-03-28T17:35:05Z

@jswrenn I will be busy the upcoming week but I'm willing to bring this up to standards after that. If before then you could decide on whether or not to bump the MSRV to 1.51 I could include that in the rewrite.

orlp added 4 commits July 15, 2021 16:32

Bump minimum required Rust version to 1.34.

430fa49

Note that this was already the case since 83c0f04 since it uses saturating_pow which was only stabilized in 1.34.

Check for Rust version with build script.

4a54576

This also allows us to automatically detect support for min const generics.

Simplify next_tuple impl.

7e95955

Added next_array and collect_array.

30ef273

phimuemue reviewed Jul 16, 2021


phimuemue added the const-generics label (Require Rust 1.51 or newer) Aug 20, 2021

ejmount mentioned this pull request Oct 20, 2022

Adding _by, by_key, largest variants of k_smallest #654

Merged

scottmcm reviewed Mar 28, 2024


jswrenn requested changes Mar 28, 2024


Philippe-Cholet mentioned this pull request Apr 27, 2024

Add k_smallest_relaxed and variants #925

Open
