Stabilize exponential histograms #2832

jack-berg · 2022-09-23T22:08:42Z

Exponential histograms have implementations in Java, Dotnet, Go, and Python.

There's still the idea of the zero threshold, which needs to be defined in the proto and likely needs to be configurable in the SDK in some fashion. However, this can be added in a backwards compatible way.

If there are no other issues, we should stabilize the exponential histogram data model and aggregation.

reyang · 2022-09-30T21:25:31Z

+1 we should push this forward.

There are some potential improvements / issues that we can discuss in the PR.
https://cloud-native.slack.com/archives/C01NP3BV26R/p1663690523370849

jack-berg · 2022-12-14T21:07:18Z

I've been analyzing the exponential histogram java implementation to determine if the design has any inherent performance issues. Some takeaways:

Computing exponential bucket indexes is not materially different than explicit bucket histogram. The log function needed at scales > 10 is slower than bit shifting strategies, but only the most demanding applications will take issue with a single log call being on the hot path. We can add a MaxScale to accommodate these users, but don't necessarily need it initially.
Exponential histogram memory allocation is not materially different than explicit bucket histograms. This makes sense given that the circular array data structure used to track exponential bucket counts is not too different than the array used to track explicit bucket counts. The java implementation was doing some naive things that caused excess allocation which were addressed here and here.

Here are a some benchmarks to support these takeaways:

Computing bucket index

HistogramBenchmark (source): characterizes bucket indexing times while trying to control against things unlikely to occur recording each measurement. The exponential histograms in this example use a max scale of 0 to control against different scales impacting results. Summary of most relevant results:

Benchmark                                                                       (aggregation)                (valueGen)  Mode  Cnt      Score      Error   Units
HistogramBenchmark.aggregate_5Threads                                 EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10  38317.681 ±  333.509   ns/op
HistogramBenchmark.aggregate_5Threads                                  EXPLICIT_SINGLE_BUCKET          GAUSSIAN_LATENCY  avgt   10  27149.226 ±  412.018   ns/op
HistogramBenchmark.aggregate_5Threads                       EXPONENTIAL_SMALL_CIRCULAR_BUFFER          GAUSSIAN_LATENCY  avgt   10  27005.325 ± 1958.334   ns/op
HistogramBenchmark.aggregate_5Threads                             EXPONENTIAL_CIRCULAR_BUFFER          GAUSSIAN_LATENCY  avgt   10  26878.814 ±  306.520   ns/op

ExponentialHistogramIndexerBenchmark (source): characterizes the difference in time to compute exponential bucket indexes when scale is positive, negative, or zero. Real applications perform much more work when recording a measurement (e.g. build attributes, find relevant timeseries for attributes, aggregate parts of the histogram like sum, min, max, etc) so the difference will be much less pronounced. This benchmark zooms in on the difference between computing bucket index with a logarithm vs. bitshifting. Summary of most relevant results:

Benchmark                                                              (scale)  Mode  Cnt       Score       Error   Units
ExponentialHistogramIndexerBenchmark.computeIndex                            1  avgt    5  137272.227 ± 12231.299   ns/op
ExponentialHistogramIndexerBenchmark.computeIndex                            0  avgt    5   15225.018 ±   448.921   ns/op
ExponentialHistogramIndexerBenchmark.computeIndex                           -1  avgt    5   15376.533 ±   527.138   ns/op

Memory allocation

HistogramCollectBenchmark (source): characterizes the memory allocation associated with collecting from different histogram aggregations, showing differences between exponential bucket vs. explicit bucket and cumulative vs. delta. Summary of most relevant results:

Benchmark                                                                                     (aggregationGenerator)  (aggregationTemporality)  Mode  Cnt           Score           Error   Units
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm                             EXPLICIT_BUCKET_HISTOGRAM                     DELTA    ss    5    26925732.800 ±    433897.702    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm                             EXPLICIT_BUCKET_HISTOGRAM                CUMULATIVE    ss    5    31928064.000 ±    214185.867    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm                  DEFAULT_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5    30989452.800 ±    784163.837    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm                  DEFAULT_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5    30526232.000 ±    847271.704    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm           ZERO_MAX_SCALE_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5    20794800.000 ±    450113.512    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm           ZERO_MAX_SCALE_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5    20323672.000 ±    515137.009    B/op

In summary, I can't find any reason to recommend using explicit bucket histograms over exponential histograms on the basis of performance in the java implementation. I don't think there are any performance reasons to hold stabilizing exponential histograms.

jmacd · 2022-12-14T23:51:56Z

I support the call to stabilize the exponential histogram as we have it. I agree there are probably useful additions that we can find, but we should be able to make them from where we are. @gouthamve any comment?

oertl · 2022-12-15T06:48:03Z

@jack-berg

Real applications perform much more work when recording a measurement (e.g. build attributes, find relevant timeseries for attributes, aggregate parts of the histogram like sum, min, max, etc) so the difference will be much less pronounced.

Your results showed a computation time of 7-8ns (results divided by the number of repetitions = 2000) for indexers which are not based on the logarithm (scale = 0, -1). Our benchmarks (see #2987 (comment)) showed that fast histogram implementations like DDSketch or DynaHist achieve insertion times (including updating min/max, checking for NaNs, and resizing arrays for counting) that are below 10ns. You got 68ns for just the log-based index computation (index = 1). I am not sure if this overhead can really be neglected in practice.

jmacd · 2022-12-15T15:12:41Z

I would add a similar comment. The benchmarks that I did in Go show the logarithm costing in the 10-20ns range, while a table lookup could save half of that cost. That's why the other costs associated with the histogram data structure outweigh the benefits of table lookup, to me.

jack-berg · 2022-12-15T15:55:09Z

@oertl are you suggesting the design of exponential histograms be adjusted in some way before stabilizing?

Here's how I see it:

The vast majority of applications won't care about the performance of the log approach. As slow as the log approach is relative to the bit shifting and table lookup approaches, it's still just one log call and 68ns.
SDKs can choose to cater to the small slice of users that care by offering better performing indexing implementation when the scale allows it. I.e. log method when scale > 10; table lookup method when 10 >= scale > 0; bit shifting method when scale <= 0.
Users that care about performance can use the proposed MaxScale to force an optimal indexing implementation to be used if available in the SDK.

What more can we do?

This feature is incredibly useful and continuing to hold off on stabilizing it is a disservice to users. Unless there's a change that we can't make incrementally, we should stabilize.

oertl · 2022-12-15T17:26:17Z

@jack-berg please do not get stopped. I just realized that your benchmark is testing an inefficient implementation. If the difference in indexing is really 7ns vs 68ns, that could be problematic from a performance standpoint. But probably the difference is smaller if the implementation gets optimized. In my opinion, the benchmarks presented are not suitable for assessing the runtime contribution of indexing to the overall histogram insertion.

jack-berg · 2022-12-15T18:01:02Z

In my opinion, the benchmarks presented are not suitable for assessing the runtime contribution of indexing to the overall histogram insertion.

Ok, I understand your comment now and agree. Thanks for clarifying!

jack-berg added the spec:metrics Related to the specification/metrics directory label Sep 23, 2022

github-actions bot assigned yurishkuro Sep 23, 2022

rbailey7210 added the triaged-accepted The issue is triaged and accepted by the OTel community, one can proceed with creating a PR proposal label Sep 30, 2022

reyang assigned jsuereth, jmacd and reyang and unassigned yurishkuro Sep 30, 2022

reyang added the area:sdk Related to the SDK label Sep 30, 2022

philkuz mentioned this issue Oct 18, 2022

Support exporting OpenTelemetry Exponential Histograms from Pixie pixie-io/pixie#627

Open

jack-berg mentioned this issue Dec 15, 2022

Mark base-2 exponential histogram as stable #3041

Merged

reyang closed this as completed in #3041 Jan 13, 2023

lenin-jaganathan mentioned this issue May 27, 2023

Add support to export ExponentialHistogram in OTLP micrometer-metrics/micrometer#3861

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stabilize exponential histograms #2832

Stabilize exponential histograms #2832

jack-berg commented Sep 23, 2022

reyang commented Sep 30, 2022

jack-berg commented Dec 14, 2022

jmacd commented Dec 14, 2022

oertl commented Dec 15, 2022

jmacd commented Dec 15, 2022

jack-berg commented Dec 15, 2022

oertl commented Dec 15, 2022

jack-berg commented Dec 15, 2022

Stabilize exponential histograms #2832

Stabilize exponential histograms #2832

Comments

jack-berg commented Sep 23, 2022

reyang commented Sep 30, 2022

jack-berg commented Dec 14, 2022

jmacd commented Dec 14, 2022

oertl commented Dec 15, 2022

jmacd commented Dec 15, 2022

jack-berg commented Dec 15, 2022

oertl commented Dec 15, 2022

jack-berg commented Dec 15, 2022