Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New function: Count number of occurences of each item in an iterator #467

Closed
JarredAllen opened this issue Aug 10, 2020 · 8 comments
Closed

Comments

@JarredAllen
Copy link
Contributor

It'd be nice if there could be a function in the Itertools trait which counts the number of times each element appears in the iterator and returns a HashMap of that information (maybe named count_each? I'm not too sure on the name).

I recently had to do this in code that I was writing and was surprised that I couldn't find this functionality in here. I could add this myself pretty easily, if this is wanted (not sure how that process works as I'm new to this project), as I've just written it in my own code.

@phimuemue
Copy link
Member

If we decide to merge #465, this would be solved by into_grouping_map(...).count().

@JarredAllen
Copy link
Contributor Author

I fail to see how into_grouping_map is applicable here. I want something like the following behavior:

let values = [0, 0, 0, 1, 1, 2];
let counts = values.into_iter().count_each();
assert_eq!(3, counts[0]);
assert_eq!(2, counts[1]);
assert_eq!(1, counts[2]);

The into_grouping_map function can't be called here, because the items of the iterator aren't (K, V) tuples.

@phimuemue
Copy link
Member

You may have to use into_grouping_map_by (or something similar)... In any case, I think we should not both introduce a count_each and have the into_grouping_map infrastructure - but instead rely on and re-use one of the alternatives.

@JarredAllen
Copy link
Contributor Author

I'd say that your inability to figure out how to use the GroupingMap to implement this functionality is evidence that a method should exist to just do this in the Itertools trait. If you, a contributor to this project, can't figure out how to correctly implement the solution (or even which method to call), then I fail to see how regular users of the API could be expected to figure it out for themselves.

It would be easy enough to write this count_each function to be a thin wrapper around the functionality implemented in the GroupingMap, so that it reuses the other one and there's no code duplication.

@phimuemue
Copy link
Member

phimuemue commented Aug 10, 2020

That's an interesting take. Actually, when I saw the first proposal of into_grouping_map, I suggested to flatten the method call and go with something closer to your idea. So, just to make that clear: I do not want to advocate one alternative over the other, but I really believe that having both variants in there does not really provide any value.

Still, we should possibly investigate whether we can make into_grouping_map more convenient to use for the (possibly prevalent) simple case where the iterator element is used as the key.

Would you agree that - if hypothetically possible - values.into_iter().count_each() would be similar to values.into_iter().into_grouping_map().count() in terms of user complexity? Would you consider it worthwhile to have both in Itertools?

@JarredAllen
Copy link
Contributor Author

That other idea would be similar in complexity, so long as the documentation makes it clear that this method can be used (I'm not sure if I'd personally be able to figure out from the documentation in the PR that those methods are what I want to use to count distinct things in an iterator, if it had been already merged and I was staring at the documentation).

Although, I'm not sure that approach would be any better for people's uses. I can't think of any use cases where you'd want into_grouping_map_by(|x| x) (or whatever name we end up giving it) except for immediately calling count on it. Assuming the item type implements Eq in such a way that distinct items compare as unequal, anything you pass into the accumulate function would be equivalent to calling count and then mapping the resulting counts in some way (and I can't think of such a way to map it that makes sense).

@jswrenn
Copy link
Member

jswrenn commented Aug 10, 2020

Yes, I'd very much like to see a method like this in itertools! Something like this would do the trick:

pub trait Itertools: Iterator {
    ...

    fn foo(self) -> HashMap<Self::Item, usize>
    where
        Self: Sized,
        Self::Item: Eq + Hash,
    {
        let mut counts = HashMap::new();
        self.for_each(|elt| *counts.entry(elt).or_insert(0) += 1);
        counts
    }

    ...
}

I don't know what the name ought to be; perhaps:

  • count(s)
  • unique_counts
  • frequency
  • item_frequency
  • item_count(s)

I think counts might be my favorite. It's brief and I can't think of anything else a method named "counts" would reasonably do.

@SkiFire13
Copy link
Contributor

Author of #465 here. I agree with you that a separate function would be better, IMO because:

  • Using my PR it would translate to .map(|v| (v, ())).into_grouping_map().count() or .into_grouping_map_by(|&v| v).count() (shorter version for Copy types), which is more verbose

  • Your solution is a bit more efficient (for now) because my API does 2 lookups in the HashMap per item while your only does 1. Mine also has an additional branch. Note that this could be changed (see this comment).

If your PR lands I'm in favour of removing count from mine.

bors bot added a commit that referenced this issue Aug 20, 2020
468: Create counts method and tests for it r=jswrenn a=JarredAllen

Creates the function described in #467 (I went with the name `counts`).

It currently uses `for_each` and makes a `HashMap` manually because the PR for the `into_grouping_map` function is still being worked on, but this function could easily be changed to call it once that PR is merged, if desired.

Co-authored-by: JarredAllen <jarredallen73@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants