
Implement DuplicatesBy::fold #921

Open
wants to merge 1 commit into master
Conversation

@kinto-b kinto-b commented Apr 18, 2024

Relates to #755. I didn't grab the output earlier (is there a way to print without re-benchmarking?), but this makes no difference to the performance. Submitting a PR for record-keeping's sake.

{
  "mean": {
    "confidence_interval": {
      "confidence_level": 0.95,
      "lower_bound": -0.0081767782683812,
      "upper_bound": 0.0038785180687818605
    },
    "point_estimate": -0.00312222985963162,
    "standard_error": 0.003180882676320771
  },
  "median": {
    "confidence_interval": {
      "confidence_level": 0.95,
      "lower_bound": -0.009367663685134739,
      "upper_bound": -0.003824443197890548
    },
    "point_estimate": -0.006249808601540896,
    "standard_error": 0.0014046028043885176
  }
}
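For the record, the general shape of such a `fold` specialization can be sketched on a simplified stand-in adaptor. Everything below (the struct fields, the `used` map tracking whether a duplicate was already yielded, the key closure) is a hypothetical reconstruction for illustration, not itertools' actual internals:

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Simplified, hypothetical stand-in for itertools' DuplicatesBy adaptor;
// field names and internals are assumptions for illustration.
struct DuplicatesBy<I, K, F> {
    iter: I,
    // Tracks each key seen so far; the bool records whether the
    // duplicate for that key was already yielded.
    used: HashMap<K, bool>,
    key: F,
}

impl<I, K, F> Iterator for DuplicatesBy<I, K, F>
where
    I: Iterator,
    K: Eq + Hash,
    F: FnMut(&I::Item) -> K,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        let Self { iter, used, key } = self;
        iter.find_map(|item| {
            let k = key(&item);
            match used.get_mut(&k) {
                None => {
                    used.insert(k, false);
                    None
                }
                Some(yielded) if !*yielded => {
                    *yielded = true;
                    Some(item)
                }
                Some(_) => None,
            }
        })
    }

    // The specialization: delegate to the inner iterator's `fold`
    // instead of driving it through repeated `next` calls.
    fn fold<B, G>(self, init: B, mut g: G) -> B
    where
        G: FnMut(B, Self::Item) -> B,
    {
        let Self { iter, mut used, mut key } = self;
        iter.fold(init, |acc, item| {
            let k = key(&item);
            match used.get_mut(&k) {
                None => {
                    used.insert(k, false);
                    acc
                }
                Some(yielded) if !*yielded => {
                    *yielded = true;
                    g(acc, item)
                }
                Some(_) => acc,
            }
        })
    }
}

fn main() {
    let it = DuplicatesBy {
        iter: [1, 2, 1, 3, 2, 1].into_iter(),
        used: HashMap::new(),
        key: |x: &i32| *x,
    };
    // Each duplicated value shows up exactly once, at its second occurrence.
    let out = it.fold(Vec::new(), |mut v, x| {
        v.push(x);
        v
    });
    assert_eq!(out, vec![1, 2]);
}
```

Delegating to the inner iterator's `fold` avoids the per-item `next` call overhead and lets any specialized `fold` of the inner iterator kick in, but as the benchmark above shows, the HashMap work dominates here.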


codecov bot commented Apr 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.44%. Comparing base (6814180) to head (73d78c5).
Report is 54 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #921      +/-   ##
==========================================
+ Coverage   94.38%   94.44%   +0.06%     
==========================================
  Files          48       48              
  Lines        6665     6882     +217     
==========================================
+ Hits         6291     6500     +209     
- Misses        374      382       +8     

☔ View full report in Codecov by Sentry.


Philippe-Cholet commented Apr 18, 2024

Same for #918 and #919: it uses a HashMap, which heap-allocates and (securely but slowly) hashes things; those two things are the bottleneck. There is not much to expect from such a specialization without improving that front:

With a faster hasher, maybe that specialized fold would be beneficial enough for the benchmark to show some improvement. We might want to revisit those PRs once I make a more general DuplicatesByIn.
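To make the hasher point concrete: std's HashMap is generic over its hash-builder, so a faster algorithm can be plugged in without changing the map's API. The toy hasher below is purely illustrative (a naive FNV-like mix), fast but not HashDoS-resistant, which is exactly the security/speed trade-off in question:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Toy multiplicative hasher, for illustration only: fast, but NOT
// resistant to hash-flooding attacks, unlike std's default SipHash.
#[derive(Default)]
struct FastHasher(u64);

impl Hasher for FastHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = self.0.wrapping_mul(0x100000001b3).wrapping_add(b as u64);
        }
    }
}

fn main() {
    // Same HashMap API, different hash algorithm.
    let mut seen: HashMap<u32, bool, BuildHasherDefault<FastHasher>> =
        HashMap::default();
    for x in [1u32, 2, 1, 3] {
        seen.entry(x).or_insert(false);
    }
    assert_eq!(seen.len(), 3);
}
```

In practice one would reach for an established crate (e.g. a FxHash- or aHash-style hasher) rather than hand-rolling one; the point is only that the map's hasher is a type parameter the adaptor could expose.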

I referenced your PRs in #755 (comment) to keep people from making duplicate PRs.

There would also be rfold to consider alongside fold.


Side note:
I'm aware specialization benchmarks are slow to build (2 minutes for me) and take 8 seconds per benchmark to run (which is okay).
I usually comment out (or simply remove) benchmarks from the bench_specializations macro declaration (without committing the changes) to keep benchmark runs fast. With that, I don't mind re-running benchmarks.
It's a pain, but it's a huge file with a lot of iterators and methods to bench; there is no simple fix to avoid building benchmarks we are not going to use.
