
Reuse MetricData #5178

Closed · wants to merge 9 commits
Conversation

jack-berg (Member) commented Feb 5, 2023

This is a POC that pushes the effort to reduce memory allocation to its limit by reusing all data carrier classes on repeated collections (i.e. MetricData, PointData, supporting arrays, etc.). I've prototyped this on the exponential histogram aggregation, which is the most complicated.
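
Roughly, the reuse pattern looks like the following minimal sketch (illustrative names, not the exact classes touched in this PR): the aggregator handle keeps one mutable point per series and refills it on every collection instead of allocating a fresh immutable point each time.

```java
// Illustrative sketch only; synchronization and the real SDK types are elided.
class ReusableLongPointData {
  long startEpochNanos;
  long epochNanos;
  long value;

  ReusableLongPointData set(long startEpochNanos, long epochNanos, long value) {
    this.startEpochNanos = startEpochNanos;
    this.epochNanos = epochNanos;
    this.value = value;
    return this;
  }
}

class ReusingAggregatorHandle {
  // One mutable carrier, allocated once and refilled on every collect.
  private final ReusableLongPointData reusablePoint = new ReusableLongPointData();
  private long current;

  void record(long value) {
    current += value; // hot path keeps writing to the handle's own state
  }

  // Returns the same instance each time; the caller must finish consuming it
  // before the next collection overwrites it.
  ReusableLongPointData collect(long startEpochNanos, long epochNanos) {
    return reusablePoint.set(startEpochNanos, epochNanos, current);
  }
}
```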

We can arguably do this safely because readers aren't allowed to perform concurrent reads, so if they synchronously consume all the data during collection and export, there's no risk of the data being updated out from under them. We could also make this explicit by adding a method to MetricReader / MetricExporter that indicates the desired memory behavior, where the default is to build immutable data carriers as we do today, while allowing readers to opt in to this improved alternative.
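
As a sketch of what that opt-in could look like (the enum and interface below are hypothetical, not an API added by this PR):

```java
// Hypothetical opt-in surface; names are illustrative.
enum MemoryBehavior {
  // Allocate fresh, immutable MetricData / PointData on every collection (today's default).
  IMMUTABLE_DATA,
  // Reuse mutable data carriers; the reader must not hold references across collections.
  REUSABLE_DATA
}

interface MemoryBehaviorAware {
  // Readers / exporters override this to opt in to reused carriers.
  default MemoryBehavior getMemoryBehavior() {
    return MemoryBehavior.IMMUTABLE_DATA;
  }
}
```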

The memory allocation is about as low as possible with this change. The only remaining allocations I see when profiling are allocations for iterators like this, which would be hard to get rid of.

Performance results before:

Benchmark                                                                                  (aggregationGenerator)  (aggregationTemporality)  Mode  Cnt           Score           Error   Units
HistogramCollectBenchmark.recordAndCollect                                              EXPLICIT_BUCKET_HISTOGRAM                     DELTA    ss    5  3520507291.800 ± 120427365.649   ns/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate                               EXPLICIT_BUCKET_HISTOGRAM                     DELTA    ss    5           2.612 ±         0.099  MB/sec
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm                          EXPLICIT_BUCKET_HISTOGRAM                     DELTA    ss    5     9642284.800 ±     38984.668    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.count                                    EXPLICIT_BUCKET_HISTOGRAM                     DELTA    ss    5           1.000                  counts
HistogramCollectBenchmark.recordAndCollect:·gc.time                                     EXPLICIT_BUCKET_HISTOGRAM                     DELTA    ss    5           2.000                      ms
HistogramCollectBenchmark.recordAndCollect                                              EXPLICIT_BUCKET_HISTOGRAM                CUMULATIVE    ss    5  3587937850.200 ±  44279700.645   ns/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate                               EXPLICIT_BUCKET_HISTOGRAM                CUMULATIVE    ss    5           2.789 ±         0.146  MB/sec
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm                          EXPLICIT_BUCKET_HISTOGRAM                CUMULATIVE    ss    5    10495027.200 ±    508130.695    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.count                                    EXPLICIT_BUCKET_HISTOGRAM                CUMULATIVE    ss    5           1.000                  counts
HistogramCollectBenchmark.recordAndCollect:·gc.time                                     EXPLICIT_BUCKET_HISTOGRAM                CUMULATIVE    ss    5           1.000                      ms
HistogramCollectBenchmark.recordAndCollect                             DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5  9513074758.000 ± 595008484.127   ns/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate              DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5           0.608 ±         0.039  MB/sec
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm         DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5     6060252.800 ±     37585.400    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.count                   DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5             ≈ 0                  counts
HistogramCollectBenchmark.recordAndCollect                             DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5  9063879408.400 ± 194774415.736   ns/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate              DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5           0.897 ±         0.089  MB/sec
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm         DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5     8527204.800 ±    992718.242    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.count                   DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5           1.000                  counts
HistogramCollectBenchmark.recordAndCollect:·gc.time                    DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5           2.000                      ms
HistogramCollectBenchmark.recordAndCollect                      ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5  2609093750.200 ±  27543850.838   ns/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate       ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5           2.214 ±         0.024  MB/sec
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm  ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5     6057052.800 ±     37585.400    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.count            ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5             ≈ 0                  counts
HistogramCollectBenchmark.recordAndCollect                      ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5  2584224741.400 ±  66418902.656   ns/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate       ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5           3.186 ±         0.109  MB/sec
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm  ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5     8633201.600 ±    167908.737    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.count            ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5           1.000                  counts
HistogramCollectBenchmark.recordAndCollect:·gc.time             ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5           1.000                      ms

And after:

Benchmark                                                                                  (aggregationGenerator)  (aggregationTemporality)  Mode  Cnt           Score           Error   Units
HistogramCollectBenchmark.recordAndCollect                                              EXPLICIT_BUCKET_HISTOGRAM                     DELTA    ss    5  3473301742.000 ± 136836882.403   ns/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate                               EXPLICIT_BUCKET_HISTOGRAM                     DELTA    ss    5           2.588 ±         0.111  MB/sec
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm                          EXPLICIT_BUCKET_HISTOGRAM                     DELTA    ss    5     9424992.000 ±     38035.843    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.count                                    EXPLICIT_BUCKET_HISTOGRAM                     DELTA    ss    5           1.000                  counts
HistogramCollectBenchmark.recordAndCollect:·gc.time                                     EXPLICIT_BUCKET_HISTOGRAM                     DELTA    ss    5           2.000                      ms
HistogramCollectBenchmark.recordAndCollect                                              EXPLICIT_BUCKET_HISTOGRAM                CUMULATIVE    ss    5  3415919933.600 ±  17967140.478   ns/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate                               EXPLICIT_BUCKET_HISTOGRAM                CUMULATIVE    ss    5           2.979 ±         0.155  MB/sec
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm                          EXPLICIT_BUCKET_HISTOGRAM                CUMULATIVE    ss    5    10671091.200 ±    507025.668    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.count                                    EXPLICIT_BUCKET_HISTOGRAM                CUMULATIVE    ss    5           1.000                  counts
HistogramCollectBenchmark.recordAndCollect:·gc.time                                     EXPLICIT_BUCKET_HISTOGRAM                CUMULATIVE    ss    5           2.000                      ms
HistogramCollectBenchmark.recordAndCollect                             DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5  9312673500.000 ±  18521603.321   ns/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate              DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5           0.066 ±         0.004  MB/sec
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm         DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5      640233.600 ±     37420.249    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.count                   DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5             ≈ 0                  counts
HistogramCollectBenchmark.recordAndCollect                             DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5  9148826575.000 ± 140145663.125   ns/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate              DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5           0.011 ±         0.010  MB/sec
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm         DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5      110198.400 ±    100198.845    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.count                   DEFAULT_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5             ≈ 0                  counts
HistogramCollectBenchmark.recordAndCollect                      ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5  2666786225.000 ±  22839718.733   ns/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate       ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5           0.228 ±         0.011  MB/sec
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm  ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5      637870.400 ±     37356.180    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.count            ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                     DELTA    ss    5             ≈ 0                  counts
HistogramCollectBenchmark.recordAndCollect                      ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5  2590788883.400 ± 129472502.049   ns/op
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate       ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5           0.036 ±         0.009  MB/sec
HistogramCollectBenchmark.recordAndCollect:·gc.alloc.rate.norm  ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5       98403.200 ±     21687.413    B/op
HistogramCollectBenchmark.recordAndCollect:·gc.count            ZERO_MAX_SCALE_BASE2_EXPONENTIAL_BUCKET_HISTOGRAM                CUMULATIVE    ss    5             ≈ 0                  counts

The aggregate reduction in memory allocation between this and the other recent changes is quite impressive. The default exponential histogram aggregation with cumulative temporality has dropped from an original 46_466_259 bytes/op to 110_198 bytes/op with this PR: a 99.8% reduction and a roughly 420x improvement!

I've run this locally with an app that produces 1_000_000 unique series, and it's pretty impressive how little memory is allocated on collect. Something like 25 MB per collection, or 25 bytes per series. Immutability is great, but it's hard to ignore these performance gains!

codecov bot commented Feb 5, 2023

Codecov Report

Patch coverage: 92.80% and project coverage change: -0.05% ⚠️

Comparison is base (3d5424a) 90.97% compared to head (3db1e81) 90.93%.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5178      +/-   ##
============================================
- Coverage     90.97%   90.93%   -0.05%     
- Complexity     4907     4941      +34     
============================================
  Files           552      556       +4     
  Lines         14489    14593     +104     
  Branches       1372     1374       +2     
============================================
+ Hits          13182    13270      +88     
- Misses          907      919      +12     
- Partials        400      404       +4     
Impacted Files Coverage Δ
...rnal/aggregator/AdaptingCircularBufferCounter.java 86.79% <ø> (ø)
...nal/data/MutableExponentialHistogramPointData.java 78.37% <78.37%> (ø)
.../opentelemetry/sdk/internal/PrimitiveLongList.java 95.83% <88.88%> (-4.17%) ⬇️
...io/opentelemetry/sdk/metrics/SdkMeterProvider.java 95.58% <100.00%> (+0.20%) ⬆️
...tor/DoubleBase2ExponentialHistogramAggregator.java 98.71% <100.00%> (+0.08%) ⬆️
...egator/DoubleBase2ExponentialHistogramBuckets.java 63.30% <100.00%> (-6.15%) ⬇️
...ernal/data/MutableExponentialHistogramBuckets.java 100.00% <100.00%> (ø)
...internal/data/MutableExponentialHistogramData.java 100.00% <100.00%> (ø)
...y/sdk/metrics/internal/data/MutableMetricData.java 100.00% <100.00%> (ø)
...nternal/state/DefaultSynchronousMetricStorage.java 93.10% <100.00%> (+0.12%) ⬆️
... and 1 more

@github-actions (Contributor)

This PR was marked stale due to lack of activity. It will be closed in 14 days.

  if (reset) {
    buckets.clear();
  }
- return copy;
+ return mutableBuckets;
Contributor:

This was one idea I entertained for a while as a performance gain. How are you avoiding multiple threads touching this data?

Is it because you're only returning this to ONE metric-reader at a time and the "hot path" of writes is still writing to the underlying data allocated in this handle?

If so, VERY clever. We should document in the handle class how this works and why it's safe.

jack-berg (Member, Author):

> Is it because you're only returning this to ONE metric-reader at a time and the "hot path" of writes is still writing to the underlying data allocated in this handle?

Yes, exactly. While we support multiple readers, we don't support concurrent reads. As long as readers don't hold on to references to MetricData and try to read them after they've finished consuming a collection, they shouldn't see any weird behavior. Right now this won't work with multiple readers, since once PeriodicMetricReader calls MetricProducer#collectAllMetrics(), another reader can start reading and the MetricData will be mutated out from under the PeriodicMetricReader. Ouch. But this is solvable by providing readers a way to communicate to MetricProducer that they're done consuming the data, for example by adjusting collectAllMetrics to accept a CompletableResultCode which the reader completes when finished consuming the data, i.e. MetricProducer#collectAllMetrics(CompletableResultCode).

As you noticed, this also relies on different objects for writes vs. reads (writes use AggregatorHandle, reads use some mutable MetricData).
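
For illustration, a minimal sketch of that split, using made-up names rather than the PR's actual types: records go through the handle's own accumulator, and collect() copies the current state into a single reused read view that exactly one reader consumes before the next collection overwrites it.

```java
// Illustrative sketch of the write/read split; not the SDK's real classes.
class ExampleAggregatorHandle {
  private final Object lock = new Object();
  private double sum; // written on the hot path
  private final MutableSumPoint readView = new MutableSumPoint(); // reused read-side carrier

  void record(double value) {
    synchronized (lock) {
      sum += value;
    }
  }

  // Copies current state into the reused read view. Safe only because a single
  // reader consumes the result synchronously before the next collect() call.
  MutableSumPoint collect() {
    synchronized (lock) {
      readView.sum = sum;
      return readView;
    }
  }

  static final class MutableSumPoint {
    double sum;
  }
}
```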

jack-berg (Member, Author):

Scratch that part about readers needing to communicate when they're finished consuming the data. Each reader has its own copies of the metric storage and the mutable MetricData, so it's much simpler: it should be safe as long as a MetricReader doesn't hold on to the MetricData references and try to consume them during a subsequent collect.

jack-berg (Member, Author):

Closing since #5709 has been merged.

jack-berg closed this Sep 28, 2023