Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stabilize exponential histograms #2832

Closed
jack-berg opened this issue Sep 23, 2022 · 8 comments · Fixed by #3041
Closed

Stabilize exponential histograms #2832

jack-berg opened this issue Sep 23, 2022 · 8 comments · Fixed by #3041
Assignees
Labels
area:sdk Related to the SDK spec:metrics Related to the specification/metrics directory triaged-accepted The issue is triaged and accepted by the OTel community, one can proceed with creating a PR proposal

Comments

@jack-berg
Copy link
Member

Exponential histograms have implementations in Java, Dotnet, Go, and Python.

There's still the idea of the zero threshold, which needs to be defined in the proto and likely needs to be configurable in the SDK in some fashion. However, this can be added in a backwards compatible way.

If there are no other issues, we should stabilize the exponential histogram data model and aggregation.

@jack-berg jack-berg added the spec:metrics Related to the specification/metrics directory label Sep 23, 2022
@rbailey7210 rbailey7210 added the triaged-accepted The issue is triaged and accepted by the OTel community, one can proceed with creating a PR proposal label Sep 30, 2022
@reyang
Copy link
Member

reyang commented Sep 30, 2022

+1 we should push this forward.

There are some potential improvements / issues that we can discuss in the PR.
https://cloud-native.slack.com/archives/C01NP3BV26R/p1663690523370849

image

@jack-berg
Copy link
Member Author

I've been analyzing the exponential histogram java implementation to determine if the design has any inherent performance issues. Some takeaways:

  1. Computing exponential bucket indexes is not materially different than explicit bucket histogram. The log function needed at scales > 10 is slower than bit shifting strategies, but only the most demanding applications will take issue with a single log call being on the hot path. We can add a MaxScale to accommodate these users, but don't necessarily need it initially.
  2. Exponential histogram memory allocation is not materially different than explicit bucket histograms. This makes sense given that the circular array data structure used to track exponential bucket counts is not too different than the array used to track explicit bucket counts. The java implementation was doing some naive things that caused excess allocation which were addressed here and here.

Here are a some benchmarks to support these takeaways:

Computing bucket index

HistogramBenchmark (source): characterizes bucket indexing times while trying to control against things unlikely to occur recording each measurement. The exponential histograms in this example use a max scale of 0 to control against different scales impacting results. Summary of most relevant results:

Benchmark                                                                       (aggregation)                (valueGen)  Mode  Cnt      Score      Error   Units
HistogramBenchmark.aggregate_5Threads                                 EXPLICIT_DEFAULT_BUCKET          GAUSSIAN_LATENCY  avgt   10  38317.681 ±  333.509   ns/op
HistogramBenchmark.aggregate_5Threads                                  EXPLICIT_SINGLE_BUCKET          GAUSSIAN_LATENCY  avgt   10  27149.226 ±  412.018   ns/op
HistogramBenchmark.aggregate_5Threads                       EXPONENTIAL_SMALL_CIRCULAR_BUFFER          GAUSSIAN_LATENCY  avgt   10  27005.325 ± 1958.334   ns/op
HistogramBenchmark.aggregate_5Threads                             EXPONENTIAL_CIRCULAR_BUFFER          GAUSSIAN_LATENCY  avgt   10  26878.814 ±  306.520   ns/op

ExponentialHistogramIndexerBenchmark (source): characterizes the difference in time to compute exponential bucket indexes when scale is positive, negative, or zero. Real applications perform much more work when recording a measurement (e.g. build attributes, find relevant timeseries for attributes, aggregate parts of the histogram like sum, min, max, etc) so the difference will be much less pronounced. This benchmark zooms in on the difference between computing bucket index with a logarithm vs. bitshifting. Summary of most relevant results:

Benchmark                                                              (scale)  Mode  Cnt       Score       Error   Units
ExponentialHistogramIndexerBenchmark.computeIndex                            1  avgt    5  137272.227 ± 12231.299   ns/op
ExponentialHistogramIndexerBenchmark.computeIndex                            0  avgt    5   15225.018 ±   448.921   ns/op
ExponentialHistogramIndexerBenchmark.computeIndex                           -1  avgt    5   15376.533 ±   527.138   ns/op

Memory allocation

HistogramCollectBenchmark (source): characterizes the memory allocation associated with collecting from different histogram aggregations, showing differences between exponential bucket vs. explicit bucket and cumulative vs. delta. Summary of most relevant results:

Benchmark                                                                                     (aggregationGenerator)  (aggregationTemporality)  Mode  Cnt           Score           Error   Units
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm                             EXPLICIT_BUCKET_HISTOGRAM                     DELTA    ss    5    26925732.800 ±    433897.702    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm                             EXPLICIT_BUCKET_HISTOGRAM                CUMULATIVE    ss    5    31928064.000 ±    214185.867    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm                  DEFAULT_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5    30989452.800 ±    784163.837    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm                  DEFAULT_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5    30526232.000 ±    847271.704    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm           ZERO_MAX_SCALE_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5    20794800.000 ±    450113.512    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm           ZERO_MAX_SCALE_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5    20323672.000 ±    515137.009    B/op

In summary, I can't find any reason to recommend using explicit bucket histograms over exponential histograms on the basis of performance in the java implementation. I don't think there are any performance reasons to hold stabilizing exponential histograms.

@jmacd
Copy link
Contributor

jmacd commented Dec 14, 2022

I support the call to stabilize the exponential histogram as we have it. I agree there are probably useful additions that we can find, but we should be able to make them from where we are. @gouthamve any comment?

@oertl
Copy link

oertl commented Dec 15, 2022

@jack-berg

Real applications perform much more work when recording a measurement (e.g. build attributes, find relevant timeseries for attributes, aggregate parts of the histogram like sum, min, max, etc) so the difference will be much less pronounced.

Your results showed a computation time of 7-8ns (results divided by the number of repetitions = 2000) for indexers which are not based on the logarithm (scale = 0, -1). Our benchmarks (see #2987 (comment)) showed that fast histogram implementations like DDSketch or DynaHist achieve insertion times (including updating min/max, checking for NaNs, and resizing arrays for counting) that are below 10ns. You got 68ns for just the log-based index computation (index = 1). I am not sure if this overhead can really be neglected in practice.

@jmacd
Copy link
Contributor

jmacd commented Dec 15, 2022

I would add a similar comment. The benchmarks that I did in Go show the logarithm costing in the 10-20ns range, while a table lookup could save half of that cost. That's why the other costs associated with the histogram data structure outweigh the benefits of table lookup, to me.

@jack-berg
Copy link
Member Author

@oertl are you suggesting the design of exponential histograms be adjusted in some way before stabilizing?

Here's how I see it:

  • The vast majority of applications won't care about the performance of the log approach. As slow as the log approach is relative to the bit shifting and table lookup approaches, it's still just one log call and 68ns.
  • SDKs can choose to cater to the small slice of users that care by offering better performing indexing implementation when the scale allows it. I.e. log method when scale > 10; table lookup method when 10 >= scale > 0; bit shifting method when scale <= 0.
  • Users that care about performance can use the proposed MaxScale to force an optimal indexing implementation to be used if available in the SDK.

What more can we do?

This feature is incredibly useful and continuing to hold off on stabilizing it is a disservice to users. Unless there's a change that we can't make incrementally, we should stabilize.

@oertl
Copy link

oertl commented Dec 15, 2022

@jack-berg please do not get stopped. I just realized that your benchmark is testing an inefficient implementation. If the difference in indexing is really 7ns vs 68ns, that could be problematic from a performance standpoint. But probably the difference is smaller if the implementation gets optimized. In my opinion, the benchmarks presented are not suitable for assessing the runtime contribution of indexing to the overall histogram insertion.

@jack-berg
Copy link
Member Author

In my opinion, the benchmarks presented are not suitable for assessing the runtime contribution of indexing to the overall histogram insertion.

Ok, I understand your comment now and agree. Thanks for clarifying!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK spec:metrics Related to the specification/metrics directory triaged-accepted The issue is triaged and accepted by the OTel community, one can proceed with creating a PR proposal
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants