
Optimisation and benchmarking of backing data structure for exponential histogram #3848

Closed
jamesmoessis opened this issue Nov 9, 2021 · 7 comments
Labels
Feature Request, metrics

Comments

@jamesmoessis
Contributor

For the exponential histogram, we decided to start with the simplest possible implementation for the backing data structure, and then beat it from there. Some context here: #3724 (comment). This is the ticket to beat it.

This ticket is to track the work for the optimisation, testing, and benchmarking of these data structures. There are notable reference implementations such as NrSketch and DynaHist that the author may draw from.

Any new backing data structure should implement the ExponentialCounter interface.
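
For reference, the contract involved looks roughly like the sketch below. Method names here are illustrative, not necessarily the SDK's exact internal interface:

```java
/**
 * Sketch of a bucket-counter abstraction for the exponential histogram.
 * Names are illustrative; see the SDK's internal ExponentialCounter for
 * the authoritative contract.
 */
public interface ExponentialCounter {

  /** Lowest bucket index that currently holds a recording. */
  int getIndexStart();

  /** Highest bucket index that currently holds a recording. */
  int getIndexEnd();

  /**
   * Adds delta to the bucket at the given index. Returns false if the
   * counter cannot grow to cover the index, signalling the caller to
   * rescale (merge buckets) and retry.
   */
  boolean increment(int index, long delta);

  /** Current count in the bucket at the given index. */
  long get(int index);
}
```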

@jamesmoessis added the Feature Request label Nov 9, 2021
@jamesmoessis
Contributor Author

benchmarking PR: #3986

@jsuereth
Contributor

A few things I found when benchmarking and investigating the initial prototype:

  • MapCounter uses both a ConcurrentMap and AtomicLongs. However, the current Handle puts both of these behind a strict lock/synchronize (as is done for the explicit bucket histogram), so you could probably use a more efficient non-concurrent IntMap implementation in the current impl.
  • MapCounter uses a LOT of memory as you take samples. The original NrSketch algorithm that's outlined as the baseline for the exponential histogram strikes a good balance with "usually a byte is enough to count measurements", because measurements are spread over a lot of buckets. I've been running some tests myself on memory usage and performance, and the current MapCounter seems to really miss the expected use case. I'll need some time to get these into PR form, but I recommend running some benchmarks here.
  • Copying of data is ONLY done on the collection path or when re-scaling.
    • The first we can optimise later; note it's usually done at a 1-minute interval, so the performance impact is a lot lower and usually out-of-band.
    • For the second (re-scaling), we should probably look into one of two things: (1) preserve the last scale value per-stream, or (2) verify whether our assumptions on bucket size + scale match common use cases (e.g. do we "fast fall" into a scale that accounts for most data, or do we see uniform scale increases throughout a collection cycle?). See the sketch after this list for what a downscale pass involves.
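
To make the re-scaling cost concrete: reducing the scale by one merges each pair of adjacent buckets (a bucket at index i moves to index i >> 1), so every downscale is a copy pass over the populated range. A minimal sketch over a plain long[] (illustrative only; the real aggregator works against the counter abstraction, and this helper is hypothetical):

```java
// Sketch: merge buckets when reducing scale by `by` levels. Each level
// halves resolution, so a bucket at index i lands at index i >> by.
// Assumes a plain long[] for illustration, indexed from `indexStart`.
static long[] downscale(long[] counts, int indexStart, int by) {
  long[] merged = new long[counts.length];
  int newStart = indexStart >> by; // >> floors, so negative indexes map correctly
  for (int i = 0; i < counts.length; i++) {
    if (counts[i] != 0) {
      int newIndex = (indexStart + i) >> by;
      merged[newIndex - newStart] += counts[i];
    }
  }
  return merged;
}
```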

@jsuereth
Contributor

Here's a quick comparison of using a Circular-Buffer (with preallocated array of integers) vs. MapCounter: main...jsuereth:wip-exponential-counter-perf

It's a 2x performance differential. I think (in practice) the # of measurements per histogram likely means we'll be better off pre-allocating a bounded array for buckets vs. using a Map in almost all cases. Going to add in benchmarks for the byte->short->int->long conversions after I have a chance to push some better-shaped input data. (Ideally we'd use incoming measurements from a live server we've recorded somewhere, but for now I'm just going to synthesize similarly shaped data with assumed distributions from what I see in histograms.)
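
For anyone skimming, the shape of the circular-buffer idea is roughly the following (a sketch, not the code in the branch above): a preallocated array over which the logical window of bucket indexes can slide, with increment refusing any index that would widen the window past the array, which is the caller's cue to rescale.

```java
// Sketch of a circular-buffer counter: a preallocated long[] addressed by
// logical bucket index, wrapping via floorMod. Illustrative only.
final class CircularBufferCounter {
  private final long[] counts;
  private int indexStart;
  private int indexEnd;
  private boolean empty = true;

  CircularBufferCounter(int maxBuckets) {
    this.counts = new long[maxBuckets];
  }

  /** Returns false when the index can't fit the window: caller must rescale. */
  boolean increment(int index, long delta) {
    if (empty) {
      indexStart = indexEnd = index;
      empty = false;
    }
    int newStart = Math.min(indexStart, index);
    int newEnd = Math.max(indexEnd, index);
    if (newEnd - newStart + 1 > counts.length) {
      return false; // window would exceed the preallocated capacity
    }
    indexStart = newStart;
    indexEnd = newEnd;
    // The window is at most counts.length wide, so floorMod is injective here.
    counts[Math.floorMod(index, counts.length)] += delta;
    return true;
  }

  long get(int index) {
    return counts[Math.floorMod(index, counts.length)];
  }
}
```

No map lookups, boxing, or allocation on the hot path, which is presumably where the 2x differential comes from.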

@jamesmoessis
Contributor Author

@jsuereth

Yeah, the MapCounter has unnecessary safeguards for thread safety. Since it's purely behind Handle, some of those can be taken away. In any case, I doubt the most efficient solution uses a map; as we can see from your benchmarks, an array seems more efficient. The map was just a baseline initial implementation that we knew would be inefficient.

I initially used the variable-sized counter + circular array that NrSketch had in the aggregator, but took it out due to review feedback and to reduce the scope of the initial aggregation PR. I do have some doubts about whether the extra code to convert byte->short->int->long is worth it, though. It's a CPU/memory trade-off, I guess.
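
For concreteness, that trade-off looks something like this sketch (illustrative, not NrSketch's actual code): counts live in a byte[] until some bucket overflows, then everything is copied into a wider array, with the same pattern repeating for short -> int -> long.

```java
// Sketch of variable-width bucket storage: start with 1-byte counts and
// widen only on overflow. Only the byte -> short step is shown; the
// short -> int and int -> long steps follow the same pattern.
final class VarWidthCounts {
  private byte[] narrow;  // 1 byte per bucket while counts stay small
  private short[] wide;   // allocated lazily on first overflow

  VarWidthCounts(int size) {
    narrow = new byte[size];
  }

  void increment(int i) {
    if (wide != null) {
      wide[i]++;
      return;
    }
    if (narrow[i] == Byte.MAX_VALUE) {
      // Promote: one-time copy into the wider representation.
      wide = new short[narrow.length];
      for (int j = 0; j < narrow.length; j++) {
        wide[j] = narrow[j]; // byte widens to short implicitly
      }
      narrow = null;
      wide[i]++;
      return;
    }
    narrow[i]++;
  }

  long get(int i) {
    return wide != null ? wide[i] : narrow[i];
  }
}
```

The promotion copy is a one-time cost per width, so the question is whether the extra branch on every increment is worth the up-to-8x memory saving in the common case.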

@jsuereth
Contributor

jsuereth commented Dec 14, 2021

320*8*2 = 5120 bytes per histogram metric stream vs. 320*1*2 = 640 bytes. It's a big memory differential for histograms...

But yeah, I saw the comments and understand the current state.

@jamesmoessis
Contributor Author

jamesmoessis commented Dec 15, 2021

True, the worst-case memory difference is quite high. It's actually double that again, since there are 320 positive buckets and 320 negative buckets. I imagine most users won't be using the maximum 320 buckets, so the memory difference won't ordinarily be the worst case as you stated. But I do tend to agree that saving memory here is probably worth the CPU time to convert between types.

@jack-berg
Member

I've done a fair amount of work on this recently, resulting in additional benchmarks (#4989 and #4912) and a variety of enhancements (#4700, #5023, #5020, #5047).

I'm going to close this. If there are additional specific enhancements to be made, we can open new issues to discuss and track.
