
Add ExponentialHistogramIndexerBenchmark #4989

Merged

Conversation

jack-berg (Member)

I was analyzing the HistogramBenchmark and struggling to make sense of the results. Somewhat randomly, certain exponential histogram trials are 5-10x slower than not only explicit bucket histogram trials, but also other exponential histogram trials. For example, here's a group of exponential results:

HistogramBenchmark.aggregate_10Threads                               EXPONENTIAL_SMALL_CIRCULAR_BUFFER   FIXED_BUCKET_BOUNDARIES  avgt   10   29181.673 ±   667.898   ns/op
HistogramBenchmark.aggregate_10Threads                               EXPONENTIAL_SMALL_CIRCULAR_BUFFER  UNIFORM_RANDOM_WITHIN_2K  avgt   10   31704.614 ±   556.680   ns/op
HistogramBenchmark.aggregate_10Threads                               EXPONENTIAL_SMALL_CIRCULAR_BUFFER          GAUSSIAN_LATENCY  avgt   10  195109.475 ±  2487.755   ns/op
HistogramBenchmark.aggregate_10Threads                                     EXPONENTIAL_CIRCULAR_BUFFER   FIXED_BUCKET_BOUNDARIES  avgt   10   29654.776 ±   289.403   ns/op
HistogramBenchmark.aggregate_10Threads                                     EXPONENTIAL_CIRCULAR_BUFFER  UNIFORM_RANDOM_WITHIN_2K  avgt   10  196956.611 ±  2838.807   ns/op
HistogramBenchmark.aggregate_10Threads                                     EXPONENTIAL_CIRCULAR_BUFFER          GAUSSIAN_LATENCY  avgt   10  193478.876 ±  2165.116   ns/op

The GAUSSIAN_LATENCY takes ~6x longer than UNIFORM_RANDOM_WITHIN_2K and FIXED_BUCKET_BOUNDARIES for the EXPONENTIAL_SMALL_CIRCULAR_BUFFER aggregation, but both GAUSSIAN_LATENCY and UNIFORM_RANDOM_WITHIN_2K are ~6x slower for the EXPONENTIAL_CIRCULAR_BUFFER aggregation. What's going on here? Why does the distribution sometimes have such a big impact and other times not?

Well, I got to the bottom of it: the code path is different for positive scales than for zero and negative scales. Positive scales compute the bucket index using Math.ceil(Math.log(value) * scaleFactor) - 1, while zero and negative scales use bit manipulation. It turns out that Math.log is really slow compared to bit manipulation. Going back to the benchmark, subtle differences in the distribution of values cause the scale to stabilize at different points, sometimes positive, sometimes zero or negative. The scale isn't consistent across runs of the benchmark, but certain distributions (e.g. Gaussian) seem to be more prone to resulting in positive scales than others.
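To make the difference concrete, here is a minimal sketch of the two code paths, assuming base-2 exponential buckets where bucket i at scale s covers (2^(i/2^s), 2^((i+1)/2^s)]. This is illustrative only, not the SDK's actual indexer: the bit handling is simplified and subnormal values are ignored.

```java
// Illustrative sketch only: simplified bucket indexing for base-2 exponential histograms.
final class IndexerSketch {

  /** Positive scales: the cost is dominated by the Math.log call per value. */
  static int indexPositiveScale(double value, int scale) {
    double scaleFactor = Math.scalb(1D / Math.log(2), scale); // 2^scale / ln(2)
    return (int) Math.ceil(Math.log(value) * scaleFactor) - 1;
  }

  /** Scale zero: pure bit manipulation on the IEEE 754 representation. */
  static int indexScaleZero(double value) {
    long bits = Double.doubleToRawLongBits(value);
    int exponent = (int) ((bits & 0x7FF0000000000000L) >> 52) - 1023;
    long mantissa = bits & 0x000FFFFFFFFFFFFFL;
    // Exact powers of two sit on a bucket boundary and belong to the lower bucket.
    return mantissa == 0 ? exponent - 1 : exponent;
  }

  /** Negative scales: the scale-zero index shifted right by -scale. */
  static int indexNegativeScale(double value, int scale) {
    return indexScaleZero(value) >> -scale;
  }
}
```

The positive-scale path pays for a transcendental function call on every recorded value, while the other paths only read exponent bits, which is why distributions that push the scale positive look so much slower.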

What do we do about this?

  • We need to update the benchmark to be more consistent. I've done this by making the starting scale for the exponential histograms zero. The scale can be adjusted lower if needed to accommodate the range of measurements, but not higher. Starting it at zero means that the performance hit from Math.log bucketing isn't a factor in these benchmarks.
  • Make the starting scale configurable. Performance-sensitive applications may want to avoid the Math.log code path altogether. Positive scales correspond to smaller buckets. If you know your app doesn't need small buckets, you should be able to set your starting scale to zero, avoiding Math.log and also reducing the number of rescales.
  • Add a benchmark for bucket indexing. I've added a new ExponentialHistogramIndexerBenchmark in this PR which specifically illustrates the performance difference between positive, negative, and zero scales. If we think it's possible to make performance improvements to bucket indexing for positive scales, this will provide a more focused place to analyze the results. The baseline results below show very clearly the ~9x performance reduction when the scale is positive (a rough sketch of what such a benchmark looks like follows the results).
Benchmark                                                              (scale)  Mode  Cnt       Score       Error   Units
ExponentialHistogramIndexerBenchmark.computeIndex                            1  avgt    5  140595.875 ± 21489.356   ns/op
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.alloc.rate             1  avgt    5      ≈ 10⁻⁴              MB/sec
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.alloc.rate.norm        1  avgt    5       0.057 ±     0.008    B/op
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.count                  1  avgt    5         ≈ 0              counts
ExponentialHistogramIndexerBenchmark.computeIndex                            0  avgt    5   15333.799 ±   225.434   ns/op
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.alloc.rate             0  avgt    5      ≈ 10⁻⁴              MB/sec
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.alloc.rate.norm        0  avgt    5       0.006 ±     0.001    B/op
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.count                  0  avgt    5         ≈ 0              counts
ExponentialHistogramIndexerBenchmark.computeIndex                           -1  avgt    5   15263.419 ±   676.713   ns/op
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.alloc.rate            -1  avgt    5      ≈ 10⁻⁴              MB/sec
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.alloc.rate.norm       -1  avgt    5       0.006 ±     0.001    B/op
ExponentialHistogramIndexerBenchmark.computeIndex:·gc.count                 -1  avgt    5         ≈ 0              counts
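For anyone who wants to reproduce a similar comparison locally, a rough JMH sketch along these lines could look like the following. The class name, parameters, and value distribution are illustrative assumptions rather than the PR's actual benchmark code, and it reuses the IndexerSketch helpers from the snippet above.

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

// Hypothetical benchmark shape, not the PR's ExponentialHistogramIndexerBenchmark itself.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@Fork(1)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 5, time = 1)
public class IndexerBenchmarkSketch {

  @Param({"1", "0", "-1"})
  int scale;

  double[] values;

  @Setup
  public void setup() {
    Random random = new Random(0);
    values = new double[2000];
    for (int i = 0; i < values.length; i++) {
      // Keep values strictly positive so Math.log is well defined.
      values[i] = 1 + random.nextDouble() * 999;
    }
  }

  @Benchmark
  public void computeIndex(Blackhole bh) {
    for (double value : values) {
      int index =
          scale > 0
              ? IndexerSketch.indexPositiveScale(value, scale)
              : scale == 0
                  ? IndexerSketch.indexScaleZero(value)
                  : IndexerSketch.indexNegativeScale(value, scale);
      bh.consume(index);
    }
  }
}
```

Running it with the three scale parameters should surface the same order-of-magnitude gap between the Math.log path (scale 1) and the bit-manipulation paths (scales 0 and -1).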

cc @jsuereth

@jack-berg jack-berg requested a review from a team as a code owner November 25, 2022 14:31
jack-berg (Member, Author)

Forgot to mention, but here are the results for HistogramBenchmark after making the starting scale 0. Notice that the results no longer vary wildly for different value distributions, and most of the performance difference between explicit and exponential histograms has gone away.

codecov bot commented Nov 26, 2022

Codecov Report

Base: 91.26% // Head: 91.26% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (9d30887) compared to base (568bdb4).
Patch coverage: 87.50% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #4989   +/-   ##
=========================================
  Coverage     91.26%   91.26%           
  Complexity     4886     4886           
=========================================
  Files           552      552           
  Lines         14431    14433    +2     
  Branches       1373     1373           
=========================================
+ Hits          13170    13172    +2     
  Misses          874      874           
  Partials        387      387           
Impacted Files Coverage Δ
.../autoconfigure/LogRecordExporterConfiguration.java 98.50% <ø> (ø)
.../sdk/autoconfigure/MeterProviderConfiguration.java 77.27% <ø> (ø)
...y/sdk/autoconfigure/SpanExporterConfiguration.java 100.00% <ø> (ø)
...internal/aggregator/ExponentialBucketStrategy.java 87.50% <80.00%> (-12.50%) ⬇️
...onfigure/spi/internal/DefaultConfigProperties.java 94.11% <100.00%> (ø)
...nfigure/AutoConfiguredOpenTelemetrySdkBuilder.java 97.31% <100.00%> (ø)
.../opentelemetry/exporter/prometheus/Serializer.java 86.01% <0.00%> (-0.43%) ⬇️
...metry/sdk/metrics/export/PeriodicMetricReader.java 90.00% <0.00%> (+2.85%) ⬆️



@jack-berg jack-berg merged commit e732328 into open-telemetry:main Dec 1, 2022
dmarkwat pushed a commit to dmarkwat/opentelemetry-java that referenced this pull request Dec 30, 2022