Add API support for bound instruments #1444

KallDrexx · 2023-12-14T22:06:32Z

Fixes #
Design discussion issue (if applicable) #1374

Changes

This PR adds API support for bound instruments for Counter and UpDownCounter. Bound instruments are counters which always change the value for a specific and predefined attribute set. The focus on this change is creating the initial API surface for bound instruments, and the mechanisms to allow non-bound instruments to become bound (since there is no direct path from aggregates to counters, it's all through closures).

To facilitate counters creating bound instruments that can refer back to their source aggregates, aggregates return a BoundMeasureGenerator<T> closure that is provided alongside the Measure<T> for the aggregate. This closure can then be invoked in order to return a BoundMeasure<T>, which provides the mechanism to invoke a measure for the cached attribute set.

This PR is not performance focused. It should not cause regressions but using bound instruments from this PR may not cause increased throughput. Performance enhancements that take advantage of bound instruments will come in a follow-up PR once we have an agreed upon API and generation path.

Benchmarks/Stress Tests

Main

Counter_Add_Sorted      time:   [717.94 ns 723.06 ns 729.17 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

Counter_Add_Unsorted    time:   [711.21 ns 714.89 ns 719.15 ns]
Found 12 outliers among 100 measurements (12.00%)
  8 (8.00%) high mild
  4 (4.00%) high severe

stress test: metrics
Number of threads: 6
Throughput: 6,367,800 iterations/sec
Throughput: 6,581,800 iterations/sec
Throughput: 6,736,800 iterations/sec
Throughput: 6,720,200 iterations/sec
Throughput: 6,804,600 iterations/sec

PR

Counter_Add_Sorted      time:   [712.98 ns 715.33 ns 717.99 ns]
                        change: [-0.7779% +0.3569% +1.5923%] (p = 0.60 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

Counter_Add_Unsorted    time:   [705.13 ns 707.71 ns 710.82 ns]
                        change: [-2.0962% -0.9670% +0.0233%] (p = 0.09 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe

Counter_Bounded         time:   [133.96 ns 134.82 ns 135.86 ns]
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe

stress test: metrics
Number of threads: 6
Throughput: 6,173,200 iterations/sec
Throughput: 6,381,600 iterations/sec
Throughput: 6,250,000 iterations/sec
Throughput: 6,434,000 iterations/sec

stress test: metrics_bounded
Number of threads: 6
Throughput: 6,473,400 iterations/sec
Throughput: 6,459,600 iterations/sec
Throughput: 6,339,200 iterations/sec
Throughput: 6,369,200 iterations/sec

Merge requirement checklist

CONTRIBUTING guidelines followed
Unit tests added/updated (if applicable)
Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
Changes in public API reviewed (if applicable)

# Conflicts: # opentelemetry-sdk/src/metrics/internal/exponential_histogram.rs

KallDrexx · 2023-12-14T22:08:13Z

The api will change slightly when/if #1421 get in to take an attribute set instead of a slice of key value pairs, but I didn't want to couple both together. Bounded performance might end up getting a bit better since cloning() becomes cheaper, but performance isn't the point of this PR :)

codecov · 2023-12-14T22:15:36Z

Codecov Report

Attention: 73 lines in your changes are missing coverage. Please review.

Comparison is base (8688e7e) 61.8% compared to head (cd3be86) 62.2%.
Report is 4 commits behind head on main.

Files	Patch %	Lines
opentelemetry/src/metrics/noop.rs	0.0%	12 Missing ⚠️
...-sdk/src/metrics/internal/exponential_histogram.rs	52.6%	9 Missing ⚠️
...pentelemetry-sdk/src/metrics/internal/histogram.rs	0.0%	9 Missing ⚠️
...entelemetry-sdk/src/metrics/internal/last_value.rs	0.0%	9 Missing ⚠️
opentelemetry-sdk/src/metrics/internal/sum.rs	50.0%	9 Missing ⚠️
opentelemetry-sdk/src/metrics/instrument.rs	80.5%	7 Missing ⚠️
...lemetry/src/metrics/instruments/up_down_counter.rs	46.1%	7 Missing ⚠️
opentelemetry/src/metrics/instruments/counter.rs	60.0%	4 Missing ⚠️
...pentelemetry-sdk/src/metrics/internal/aggregate.rs	95.3%	3 Missing ⚠️
opentelemetry-sdk/src/metrics/pipeline.rs	89.4%	2 Missing ⚠️
... and 2 more

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #1444     +/-   ##
=======================================
+ Coverage   61.8%   62.2%   +0.3%     
=======================================
  Files        144     144             
  Lines      19810   20152    +342     
=======================================
+ Hits       12256   12538    +282     
- Misses      7554    7614     +60

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

lalitb · 2023-12-18T19:37:10Z

opentelemetry/src/metrics/instruments/counter.rs

@@ -10,6 +10,15 @@ use std::{any::Any, convert::TryFrom};
 pub trait SyncCounter<T> {
    /// Records an increment to the counter.
    fn add(&self, value: T, attributes: &[KeyValue]);
+
+    /// Creates a child counter that's bound to a specific attribute set
+    fn bind(&self, attributes: &[KeyValue]) -> Arc<dyn BoundSyncCounter<T> + Send + Sync>;


As Bound Instrument is not in specs (yet), should we have them under the feature flag (otel_unstable)? Not sure if this is going to add more noise with cfg macros across.

I think as long as on the API side we mark it then we should be ok ?
I don't believe we need to add everywhere. Mainly on the pub interfaces of API and SDK.

This is where having specific feature flags is useful.

Mainly on the pub interfaces of API and SDK.

Agree, that's what we followed for sync gauge support in #1410 - only the pub interfaces are encapsulated within the feature flag.

lalitb · 2023-12-18T19:39:00Z

opentelemetry/src/metrics/instruments/counter.rs

+
+    /// Creates a child counter that's bound to a specific attribute set
+    fn bind(&self, attributes: &[KeyValue]) -> Arc<dyn BoundSyncCounter<T> + Send + Sync>;
+}


Not sure how relevant is this for the current implementation, but the earlier bound instrument specs also had an unbind() method to release (any) allocated resources - https://github.com/open-telemetry/opentelemetry-specification/blob/v1.1.0/specification/metrics/api.md#bound-instrument-calling-convention

As a consequence of their performance advantage, bound instruments also consume resources in the SDK. Bound instruments MUST support an Unbind() method for users to indicate they are finished with the binding and release the associated resources. Note that Unbind() does

I saw that in the spec, but figured we didn't need it in Rust because UnBind() is no different than drop.

How does this affect the Bounded Measure Generator being 'static ?
If at all.
Maybe a test is worthwhile.

'static is a lifetime, it's not the static keyword. A lifetime only defines the scope of how long a value can last for. In other words it defines how long a value can last, not how long it will last. Almost all owned values have a 'static lifetime because they can be passed to other threads and can be kept alive for the rest of the program's lifetime if you want. However, when they go out of scope they get drop() called, just like any other owned value.

fn do_stuff(counter: Counter<u64>) { // pre_do_stuff { let bound_counter = counter.Bind(some_attributes); bound_counter.add(5); } // <-------------------------------------------------------- Drop is here // other stuff }

Since the bound_counter variable goes out of scope on the Drop is here line, the compiler calls drop() for it. Since drop() doesn't have an explicit implementation, it just calls drop() on all of it's fields, which will drop the Arc, which will either modify the reference count, or drop the underlying bound measure if it's the last instance of it.

I'm not really sure what you would test for. Not only is this all built-in and is fundamental to the whole Rust lifetime system, there's no observable behavior in this PR (there is in the performance oriented PR I have ready to go, but not in this PR).

The only reason 'static is defined in this code is because the Arc is holding a trait, not a concrete object. That means someone could have implemented the BoundSyncCounter<T> trait on a shared reference, in which case it's not valid to pass around because the compiler would not be able to guarantee the shared reference doesn't outlive it's owner. Thus 'static is required to tell the compiler "this is safe to pass around in any context, as this can live up to the end of the application if the consumer wants".

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=618322dc1237665746a99dd934347ffb

This fully utilizes the trait to make sure it's not getting dropped due to non-use: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=87650c8e735c22878046af4c5d567e69

hdost

Like the general direction, kinda surprised not to see much of a performance gain over standard counters. I feel like (not related to this change) we have a lot of Arc spread through the code that maybe we could avoid. A bit of locking our locks so we can lock more locks.

KallDrexx · 2023-12-19T14:00:38Z

Like the general direction, kinda surprised not to see much of a performance gain over standard counters. I feel like (not related to this change) we have a lot of Arc spread through the code that maybe we could avoid. A bit of locking our locks so we can lock more locks.

Arc contains no locking. It's an atomic number that tracks how many references there are. When you clone() it increases the atomic count, and when you drop() it decrements the atomic count (and if it's the last reference it drops the underlying value). There's no locking involved in cloning, dropping, or accessing an Arc, and a lock is only involved if you access an internal function of the held type that invokes a lock (e.g. a mutex held by the arc).

The reason this doesn't provide any performance benefits is partially because #1421 is not yet merged. Since we don't have that change in, we must do a hard clone of the attribute set every time a bound measure is called (and for each measure in the ResolvedMeasure. Without #1421 it's cloning the whole vector and doing extra allocations.

There's no performance benefit too because this PR has bound instruments as a thin wrapper around normal measurements. Once this PR is merged, the next PR fully takes advantage of bound instruments by creating an atomic that's incremented by the bound measure function. This allows the BoundCounter.add() calls to increment with zero mutex locking and zero hashmap lookups. That's what this PR is setting the stage for.

KallDrexx added 10 commits December 13, 2023 14:45

Creation of bounded measure generators

e478fe9

Pass bounded measure generators down the stack

bbf71ea

Succinct measure wrapper type

0a26523

fmt

ea14e2b

Merge branch 'main' into bounded_instruments

d911cb5

# Conflicts: # opentelemetry-sdk/src/metrics/internal/exponential_histogram.rs

Clippy complexity warning

4ae0c0b

Bind() api calls

19d3be9

Added bounded counter test

d350de7

Integratoin tests and benchmark

070d65e

Bound up down counter support

ae98ade

KallDrexx requested a review from a team as a code owner December 14, 2023 22:06

lalitb reviewed Dec 18, 2023

View reviewed changes

hdost reviewed Dec 19, 2023

View reviewed changes

KallDrexx added 2 commits January 5, 2024 11:08

Merge branch 'main' into bounded_instruments

09a4ed8

Clippy lint fix

09039f6

KallDrexx force-pushed the bounded_instruments branch from 748d5ff to e88a102 Compare January 29, 2024 16:14

Merge branch 'main' into bounded_instruments

cd3be86

KallDrexx force-pushed the bounded_instruments branch from e88a102 to cd3be86 Compare February 8, 2024 21:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add API support for bound instruments #1444

Add API support for bound instruments #1444

KallDrexx commented Dec 14, 2023

KallDrexx commented Dec 14, 2023

codecov bot commented Dec 14, 2023 •

edited

lalitb Dec 18, 2023

hdost Dec 19, 2023

lalitb Dec 19, 2023

lalitb Dec 18, 2023

KallDrexx Dec 18, 2023

hdost Dec 19, 2023 •

edited

KallDrexx Dec 19, 2023

cijothomas Dec 19, 2023

KallDrexx Dec 19, 2023

hdost left a comment

KallDrexx commented Dec 19, 2023 •

edited

Add API support for bound instruments #1444

Are you sure you want to change the base?

Add API support for bound instruments #1444

Conversation

KallDrexx commented Dec 14, 2023

Changes

Benchmarks/Stress Tests

Main

PR

Merge requirement checklist

KallDrexx commented Dec 14, 2023

codecov bot commented Dec 14, 2023 • edited

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hdost Dec 19, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hdost left a comment

Choose a reason for hiding this comment

KallDrexx commented Dec 19, 2023 • edited

codecov bot commented Dec 14, 2023 •

edited

hdost Dec 19, 2023 •

edited

KallDrexx commented Dec 19, 2023 •

edited