Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Itertools::dedup_with_count() and Itertools::dedup_by_with_count() #423

Merged
merged 2 commits into from May 18, 2020

Conversation

orium
Copy link
Contributor

@orium orium commented Mar 9, 2020

Fixes #393.

@orium
Copy link
Contributor Author

orium commented Mar 17, 2020

@jswrenn Can you take a look when you have some time?

@jswrenn
Copy link
Member

jswrenn commented May 12, 2020

Sorry for the delay and for the contribution! My only nit is the name—it suggests that count will be the second component of the tuple. Perhaps count_dedup and count_dedup_by?

(This is yet another case where having rust-lang/rfcs#2584 would be helpful!)

@phimuemue
Copy link
Member

phimuemue commented May 12, 2020

Hi there! Should we try to implement this in terms of (a generalized) Dedup/DedupBy?

As far as I can see, it (naturally) shares quite a bit of implementation with Dedup, and we could possibly express these similarities in code. (This way, all the specializations for Dedup (such as fold) would automatically be inherited by DedupWithCount. In addition, it would unify the implementations. (E.g. as it stands now, does Dedup uses CoalesceIter internally, while DedupWithCount goes with Peekable - could this lead to subtle differences in behavior?))

If we actually want to implement it in terms of Dedup, one way I could imagine is to equip DedupBy with an additional customization point (read: type parameter) that can be used to do a side-computation in next_with resp. fold. In the plain old Dedup/DedupBy-case, these side-computations could just be no-ops, while for DedupWithCount/DedupByWithCount it could do the counting.

@orium
Copy link
Contributor Author

orium commented May 16, 2020

@jswrenn

My only nit is the name—it suggests that count will be the second component of the tuple. Perhaps count_dedup and count_dedup_by?

I think it makes sense to start the function name with the fundamental thing that is going on, which is deduplication. The count is the extra metadata. I also like how an alphabetical order of the names of the functions naturally group these functions together (and also that autocompletion on dedup will list all these).

@phimuemue

Yeah, it makes sense to try to re-use code. I've re-wrote it in terms of CoalesceCore, which I generalize slightly. If I try to change DedupBy to accommodate a dedup with count it will either look like CoalesceCore or it feels a bit forced. Also, the fold specialization for DedupBy doesn't fit very well with a dedup with count.

@phimuemue
Copy link
Member

Kudos for going with CoalesceCore. Imho it looks better.

Imho, we should still strive to unify the implementations further, but your observation

If I try to change DedupBy to accommodate a dedup with count it will either look like CoalesceCore or it feels a bit forced. Also, the fold specialization for DedupBy doesn't fit very well with a dedup with count.

made me think why we even have CoalesceCore as an extra struct in the first place. I think we could even try to unify the implementations of Coalesce and Dedup(By), simplifying the implementations quite a bit. This would essentially mean that Dedup(By) is just a special case of Coalesce (which - intuitively - seems reasonable to me), and (I think) Dedup(By)WithCount could then be specializations, too.

Doing so, however, may be a bit too much for this ticket.

@orium
Copy link
Contributor Author

orium commented May 17, 2020

Yes, I think that's perfectly possible. I can look into it, but, as you suggest, I prefer to do that in a later PR.

@phimuemue
Copy link
Member

Yes, I think that's perfectly possible. I can look into it, but, as you suggest, I prefer to do that in a later PR.

I sketched an implementation for this in #440. As stated there, I will rebase it if this PR is accepted.

@orium orium requested a review from jswrenn May 18, 2020 23:24
@jswrenn
Copy link
Member

jswrenn commented May 18, 2020

Looks good to me! Thanks!

bors r+

@jswrenn jswrenn added this to the next milestone May 18, 2020
@bors
Copy link
Contributor

bors bot commented May 18, 2020

Build succeeded:

@bors bors bot merged commit 2d22ff9 into rust-itertools:master May 18, 2020
bors bot added a commit that referenced this pull request Jun 18, 2020
440: Implement DedupBy in terms of Coalesce r=jswrenn a=phimuemue

During the discussion of #423 (comment), we found that `DedupBy`/`DedupByWithCount` could actually be implemented in terms of `Coalesce`. This simplifies the implementation quite a bit and lets derived adaptors inherit specializations.

* Implement `fold` on `Coalesce` so that all the adaptors derived from it inherit the specialization (currently, only `DedupBy` has this specialization).
* Introduce `CoalescePredicate` (similar to `DedupPredicate`) to allow for different customizations of `Coalesce`.
* Introduce parametrizable `CoalesceBy`: Base for `Coalesce`, `DedupBy`, `DedupByWithCount`.
* Implement `DedupBy`/`DedupByWithCount` in terms of `CoalesceBy` (using particular `impl`s).
* At last, the indirection through `CoalesceCore` is not needed anymore.

Co-authored-by: philipp <descpl@yahoo.de>
bors bot added a commit that referenced this pull request Jul 17, 2020
450: Introduce module coalesce r=jswrenn a=phimuemue

Hi! As #440 (thanks for reviewing and merging, btw) presumably conflicts with any existing PRs touching `coalesce` et al, I thought we could just as well seize the opportunity and move these types into an own module (so to speak as a starting point for the problem mentioned in #423 (comment)).

I cut-pasted `Coalesce` et al into an own module, rearranged the code so that specialized types follow their generic variants, and ran `cargo fmt` on the new module.

I suggest to import the `coalesce` types into `adaptors` similarly to what we already have for `multi_product`.

Co-authored-by: philipp <descpl@yahoo.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC] dedup()/dedup_by() with count
3 participants