Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add {min,max}_set(_by{_key)?)? functions #323

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
42 changes: 42 additions & 0 deletions src/extrema_set.rs
@@ -0,0 +1,42 @@
/// Implementation guts for `min_set`, `min_set_by`, and `min_set_by_key`.
pub fn min_set_impl<I, K, F, L>(mut it: I,
mut key_for: F,
mut lt: L) -> Option<Vec<I::Item>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about returning an empty Vec instead of None? I think it would make things a bit more clear, especially since the types do not indicate that a Some always contains a non-empty vector.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could/should be a bit more generic than Vec here? I.e. should we try to be like collect (which returns something FromIterator<Self::Item> and lets the caller choose)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think that returning an Option is a good choice. In application code that I write, when using max or min functionality of sequences, an empty sequence is often an indication that something has gone awry and should be forced to be handled in another way. I imagine that in most cases, the Option-ness can be handled by the ?-operator. Also, it matches the signature for max/min from the standard library, which returns an Option of an Item. I can naturally change it, but I do think that Option makes sense here.

As for why Vec and not some kind of iterator. The number of results is unknown before computing the result, and the result is unknown before going through the whole input iterator, which implies that some kind of temporary storage area is needed. Since the data is already in a Vec, I thought that the least convoluted thing was to just give the user back that. We can of course wrap the already constructed Vec in something that exposes it like a FromIterator, I have no real opinion here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the Vec: I naively thought about substituting the whole Vec even within the implementation (basically allowing to customize the kind of temporary storage), so that no conversion would be required upon returning it.

However, I see that FromIterator alone is possibly a bit too narrow to acchieve that. Maybe any container satisfying Default + Extend or FromIterator + Extend could be used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the code by neccesity (unless double iteration is used) provisionally accumulates an unbounded number of result items, and discards these when a new provisionally best results is found, something more than FromIterator + Extend would be needed AFAIK.

My guess would be that anything that also has the possibility to clear the results is no better than the interal Vec, and would also be less clear. I'd be glad to learn I'm wrong though!

where I: Iterator,
F: FnMut(&I::Item) -> K,
L: FnMut(&I::Item, &I::Item, &K, &K) -> bool,
{
let (mut result, mut current_key) = match it.next() {
None => return None,
Some(element) => {
let key = key_for(&element);
(vec![element], key)
}
};

for element in it {
let key = key_for(&element);
if lt(&element, &result[0], &key, &current_key) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to replace lt (returning bool) by cmp (returning Ordering) so that we could avoid the second call to lt in line 23?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should work nicely.

As it is, I essentially copied the structure from the minimax set of functions, where I see now that multiple comparisons are also made.

I'm happy to make the change, but will probably not have the time to do so in the next couple of weeks (currently on vacation).

Copy link
Member

@phimuemue phimuemue Jul 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. That didn't occur to me when I looked at minmax last time. I could, however, imagine that things are a bit more complicated there because it needs to look for minimum and maximum simultaneously.

Maybe I should go back and have a look at minmax.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at minmax and think that it needs to do more stuff since it searches for both the minimum and the maximum. Here, I think we can use less calls to compare elements.

result.clear();
phimuemue marked this conversation as resolved.
Show resolved Hide resolved
result.push(element);
current_key = key;
} else if !lt(&result[0], &element, &current_key, &key) {
result.push(element);
}
}

Some(result)
}

/// Implementation guts for `ax_set`, `max_set_by`, and `max_set_by_key`.
pub fn max_set_impl<I, K, F, L>(it: I,
phimuemue marked this conversation as resolved.
Show resolved Hide resolved
key_for: F,
mut lt: L) -> Option<Vec<I::Item>>
where I: Iterator,
F: FnMut(&I::Item) -> K,
L: FnMut(&I::Item, &I::Item, &K, &K) -> bool,
{
min_set_impl(it, key_for, |it1, it2, key1, key2| lt(it2, it1, key2, key1))
}


190 changes: 190 additions & 0 deletions src/lib.rs
Expand Up @@ -168,6 +168,8 @@ mod combinations;
mod combinations_with_replacement;
mod exactly_one_err;
mod diff;
#[cfg(feature = "use_std")]
mod extrema_set;
mod format;
#[cfg(feature = "use_std")]
mod group_map;
Expand Down Expand Up @@ -2142,6 +2144,194 @@ pub trait Itertools : Iterator {
group_map::into_group_map(self)
}

/// Return all minimum elements of an iterator.
///
/// # Examples
///
/// ```
/// use itertools::Itertools;
///
/// let a: [i32; 0] = [];
/// assert_eq!(a.iter().min_set(), None);
///
/// let a = [1];
/// assert_eq!(a.iter().min_set(), Some(vec![&1]));
///
/// let a = [1, 2, 3, 4, 5];
/// assert_eq!(a.iter().min_set(), Some(vec![&1]));
///
/// let a = [1, 1, 1, 1];
/// assert_eq!(a.iter().min_set(), Some(vec![&1, &1, &1, &1]));
/// ```
///
/// The elements can be floats but no particular result is guaranteed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I often run into situations where I like to simply compare floats. But most of the time, I am happy that Rust tells me that f64 is not Ord.

Iterator::min also requires Ord. Do you think we should stick to that requirement and require the caller to resolve the float issues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, floats not being Ord is annoying, but also great.

In this case, I again copied the style and wording from minmax.

/// if an element is NaN.
#[cfg(feature = "use_std")]
fn min_set(self) -> Option<Vec<Self::Item>>
where Self: Sized, Self::Item: PartialOrd
{
extrema_set::min_set_impl(self, |_| (), |x, y, _, _| x < y)
}

/// Return all minimum elements of an iterator, as determined by
/// the specified function.
///
/// # Examples
///
/// ```
/// # use std::cmp::Ordering;
/// use itertools::Itertools;
///
/// let a: [(i32, i32); 0] = [];
/// assert_eq!(a.iter().min_set_by(|_, _| Ordering::Equal), None);
///
/// let a = [(1, 2)];
/// assert_eq!(a.iter().min_set_by(|&&(k1,_), &&(k2, _)| k1.cmp(&k2)), Some(vec![&(1, 2)]));
///
/// let a = [(1, 2), (2, 2), (3, 9), (4, 8), (5, 9)];
/// assert_eq!(a.iter().min_set_by(|&&(_,k1), &&(_,k2)| k1.cmp(&k2)), Some(vec![&(1, 2), &(2, 2)]));
///
/// let a = [(1, 2), (1, 3), (1, 4), (1, 5)];
/// assert_eq!(a.iter().min_set_by(|&&(k1,_), &&(k2, _)| k1.cmp(&k2)), Some(vec![&(1, 2), &(1, 3), &(1, 4), &(1, 5)]));
/// ```
///
/// The elements can be floats but no particular result is guaranteed
/// if an element is NaN.
#[cfg(feature = "use_std")]
fn min_set_by<F>(self, mut compare: F) -> Option<Vec<Self::Item>>
where Self: Sized, F: FnMut(&Self::Item, &Self::Item) -> Ordering
{
extrema_set::min_set_impl(
self,
|_| (),
|x, y, _, _| Ordering::Less == compare(x, y)
)
}

/// Return all minimum elements of an iterator, as determined by
/// the specified function.
///
/// # Examples
///
/// ```
/// use itertools::Itertools;
///
/// let a: [(i32, i32); 0] = [];
/// assert_eq!(a.iter().min_set_by_key(|_| ()), None);
///
/// let a = [(1, 2)];
/// assert_eq!(a.iter().min_set_by_key(|&&(k,_)| k), Some(vec![&(1, 2)]));
///
/// let a = [(1, 2), (2, 2), (3, 9), (4, 8), (5, 9)];
/// assert_eq!(a.iter().min_set_by_key(|&&(_, k)| k), Some(vec![&(1, 2), &(2, 2)]));
///
/// let a = [(1, 2), (1, 3), (1, 4), (1, 5)];
/// assert_eq!(a.iter().min_set_by_key(|&&(k, _)| k), Some(vec![&(1, 2), &(1, 3), &(1, 4), &(1, 5)]));
/// ```
///
/// The elements can be floats but no particular result is guaranteed
/// if an element is NaN.
#[cfg(feature = "use_std")]
fn min_set_by_key<K, F>(self, key: F) -> Option<Vec<Self::Item>>
where Self: Sized, K: PartialOrd, F: FnMut(&Self::Item) -> K
{
extrema_set::min_set_impl(self, key, |_, _, kx, ky| kx < ky)
}

/// Return all maximum elements of an iterator.
///
/// # Examples
///
/// ```
/// use itertools::Itertools;
///
/// let a: [i32; 0] = [];
/// assert_eq!(a.iter().max_set(), None);
///
/// let a = [1];
/// assert_eq!(a.iter().max_set(), Some(vec![&1]));
///
/// let a = [1, 2, 3, 4, 5];
/// assert_eq!(a.iter().max_set(), Some(vec![&5]));
///
/// let a = [1, 1, 1, 1];
/// assert_eq!(a.iter().max_set(), Some(vec![&1, &1, &1, &1]));
/// ```
///
/// The elements can be floats but no particular result is guaranteed
/// if an element is NaN.
#[cfg(feature = "use_std")]
fn max_set(self) -> Option<Vec<Self::Item>>
where Self: Sized, Self::Item: PartialOrd
{
extrema_set::max_set_impl(self, |_| (), |x, y, _, _| x < y)
}

/// Return all maximum elements of an iterator, as determined by
/// the specified function.
///
/// # Examples
///
/// ```
/// # use std::cmp::Ordering;
/// use itertools::Itertools;
///
/// let a: [(i32, i32); 0] = [];
/// assert_eq!(a.iter().max_set_by(|_, _| Ordering::Equal), None);
///
/// let a = [(1, 2)];
/// assert_eq!(a.iter().max_set_by(|&&(k1,_), &&(k2, _)| k1.cmp(&k2)), Some(vec![&(1, 2)]));
///
/// let a = [(1, 2), (2, 2), (3, 9), (4, 8), (5, 9)];
/// assert_eq!(a.iter().max_set_by(|&&(_,k1), &&(_,k2)| k1.cmp(&k2)), Some(vec![&(3, 9), &(5, 9)]));
///
/// let a = [(1, 2), (1, 3), (1, 4), (1, 5)];
/// assert_eq!(a.iter().max_set_by(|&&(k1,_), &&(k2, _)| k1.cmp(&k2)), Some(vec![&(1, 2), &(1, 3), &(1, 4), &(1, 5)]));
/// ```
///
/// The elements can be floats but no particular result is guaranteed
/// if an element is NaN.
#[cfg(feature = "use_std")]
fn max_set_by<F>(self, mut compare: F) -> Option<Vec<Self::Item>>
where Self: Sized, F: FnMut(&Self::Item, &Self::Item) -> Ordering
{
extrema_set::max_set_impl(
self,
|_| (),
|x, y, _, _| Ordering::Less == compare(x, y)
)
}

/// Return all minimum elements of an iterator, as determined by
/// the specified function.
///
/// # Examples
///
/// ```
/// use itertools::Itertools;
///
/// let a: [(i32, i32); 0] = [];
/// assert_eq!(a.iter().max_set_by_key(|_| ()), None);
///
/// let a = [(1, 2)];
/// assert_eq!(a.iter().max_set_by_key(|&&(k,_)| k), Some(vec![&(1, 2)]));
///
/// let a = [(1, 2), (2, 2), (3, 9), (4, 8), (5, 9)];
/// assert_eq!(a.iter().max_set_by_key(|&&(_, k)| k), Some(vec![&(3, 9), &(5, 9)]));
///
/// let a = [(1, 2), (1, 3), (1, 4), (1, 5)];
/// assert_eq!(a.iter().max_set_by_key(|&&(k, _)| k), Some(vec![&(1, 2), &(1, 3), &(1, 4), &(1, 5)]));
/// ```
///
/// The elements can be floats but no particular result is guaranteed
/// if an element is NaN.
#[cfg(feature = "use_std")]
fn max_set_by_key<K, F>(self, key: F) -> Option<Vec<Self::Item>>
where Self: Sized, K: PartialOrd, F: FnMut(&Self::Item) -> K
{
extrema_set::max_set_impl(self, key, |_, _, kx, ky| kx < ky)
}

/// Return the minimum and maximum elements in the iterator.
///
/// The return type `MinMaxResult` is an enum of three variants:
Expand Down
48 changes: 48 additions & 0 deletions tests/test_std.rs
Expand Up @@ -670,6 +670,54 @@ fn diff_shorter() {
});
}

#[test]
fn extrema_set() {
use std::cmp::Ordering;

// A peculiar type: Equality compares both tuple items, but ordering only the
// first item. Used to distinguish equal elements.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Val(u32, u32);

impl PartialOrd<Val> for Val {
fn partial_cmp(&self, other: &Val) -> Option<Ordering> {
self.0.partial_cmp(&other.0)
}
}

impl Ord for Val {
fn cmp(&self, other: &Val) -> Ordering {
self.0.cmp(&other.0)
}
}

assert_eq!(None::<Option<u32>>.iter().min_set(), None);
assert_eq!(None::<Option<u32>>.iter().max_set(), None);

assert_eq!(Some(1u32).iter().min_set(), Some(vec![&1]));
assert_eq!(Some(1u32).iter().max_set(), Some(vec![&1]));

let data = vec![Val(0, 1), Val(2, 0), Val(0, 2), Val(1, 0), Val(2, 1)];

let min_set = data.iter().min_set().unwrap();
assert_eq!(min_set, vec![&Val(0, 1), &Val(0, 2)]);

let min_set_by_key = data.iter().min_set_by_key(|v| v.1).unwrap();
assert_eq!(min_set_by_key, vec![&Val(2, 0), &Val(1, 0)]);

let min_set_by = data.iter().min_set_by(|x, y| x.1.cmp(&y.1)).unwrap();
assert_eq!(min_set_by, vec![&Val(2, 0), &Val(1, 0)]);

let max_set = data.iter().max_set().unwrap();
assert_eq!(max_set, vec![&Val(2, 0), &Val(2, 1)]);

let max_set_by_key = data.iter().max_set_by_key(|v| v.1).unwrap();
assert_eq!(max_set_by_key, vec![&Val(0, 2)]);

let max_set_by = data.iter().max_set_by(|x, y| x.1.cmp(&y.1)).unwrap();
assert_eq!(max_set_by, vec![&Val(0, 2)]);
}

#[test]
fn minmax() {
use std::cmp::Ordering;
Expand Down