Probability sampling in tracestate specification #2047

Merged
merged 67 commits into from Jan 26, 2022
Changes from 2 commits
c17d841
draft tracestate probability sampling spec
jmacd Oct 19, 2021
37616be
14 res
jmacd Oct 20, 2021
a6318b9
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Oct 21, 2021
882c33d
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Oct 25, 2021
780a7a2
require samplers
jmacd Oct 25, 2021
64146a6
move requirements
jmacd Oct 26, 2021
68fb444
note
jmacd Oct 26, 2021
541e2f4
Merge branch 'main' into jmacd/sampling_spec2
carlosalberto Nov 1, 2021
f02863e
add test spec
jmacd Nov 2, 2021
d9b97a6
Merge branch 'jmacd/sampling_spec2' of github.com:jmacd/opentelemetry…
jmacd Nov 2, 2021
93fdd0c
update test spec with more clarity
jmacd Nov 2, 2021
18c2904
use r:62 in examples (which is in-range)
jmacd Nov 2, 2021
29e3430
reformat test spec table
jmacd Nov 2, 2021
269bdc5
toc
jmacd Nov 2, 2021
ed02c37
give each requirement a name
jmacd Nov 2, 2021
14dac18
add test guidance
jmacd Nov 2, 2021
cc2e858
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Nov 4, 2021
30b4f34
table origin=1
jmacd Nov 4, 2021
67dd0d6
remove the probability-of-r-value column, it causes confusion
jmacd Nov 4, 2021
1c510d7
Reformat probability of r-value; add r-value requirement
jmacd Nov 4, 2021
b04c56d
reorganize (partly) as suggested by yuri
jmacd Nov 4, 2021
cf6c2af
reorder sections
jmacd Nov 4, 2021
11fdaae
typeo
jmacd Nov 4, 2021
961fbef
revise power-of-two sampling intro
jmacd Nov 4, 2021
e3f9f0b
toc
jmacd Nov 4, 2021
8a01692
clarify producer recommendations re: use of non-powers-of-two
jmacd Nov 5, 2021
a38e321
Rephrase consistency guarantees
jmacd Nov 8, 2021
514545a
Apply suggestions from code review
jmacd Nov 16, 2021
93a9b37
one keyword
jmacd Nov 16, 2021
c497172
minor rephrasing
jmacd Nov 16, 2021
85c8939
one paragraph from Peter F and correct the definition of complete sub…
jmacd Nov 16, 2021
9e2c71c
Merge branch 'main' into jmacd/sampling_spec2
jmacd Nov 16, 2021
cbc3852
Apply suggestions from code review
jmacd Nov 17, 2021
fb2827d
Merge branch 'main' into jmacd/sampling_spec2
jmacd Nov 17, 2021
7bef7a4
Merge branch 'main' into jmacd/sampling_spec2
carlosalberto Nov 23, 2021
b46d058
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Nov 29, 2021
58d8981
Merge branch 'jmacd/sampling_spec2' of github.com:jmacd/opentelemetry…
jmacd Nov 29, 2021
2d9d7ff
more producer/consumer recommendations
jmacd Nov 30, 2021
3c0dfd8
lint
jmacd Nov 30, 2021
ddf4fdf
lint
jmacd Nov 30, 2021
86f50dd
eliminate API compatibility with ParentBased
jmacd Nov 30, 2021
f7414d0
Merge branch 'main' into jmacd/sampling_spec2
jmacd Dec 1, 2021
0c2b1cd
add examples
jmacd Dec 7, 2021
318bdd2
categorize and summarize adjusted count
jmacd Dec 7, 2021
5879cfa
move table of r-value probabilities to an appendix
jmacd Dec 7, 2021
73622fe
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Dec 7, 2021
299d2ac
Merge branch 'jmacd/sampling_spec2' of github.com:jmacd/opentelemetry…
jmacd Dec 7, 2021
0da323d
add composition examples
jmacd Dec 8, 2021
a3534ec
Merge branch 'main' into jmacd/sampling_spec2
jmacd Dec 8, 2021
bdf2b03
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Dec 9, 2021
54209de
Merge branch 'jmacd/sampling_spec2' of github.com:jmacd/opentelemetry…
jmacd Dec 9, 2021
7c374c2
clarify the zero special case
jmacd Dec 9, 2021
81a6b07
sampler examples
jmacd Dec 9, 2021
9f951de
TOC
jmacd Dec 9, 2021
eb645ef
Changelog
jmacd Dec 9, 2021
d18cdd2
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Jan 4, 2022
5a78017
rewrite top matter using input from @bdarfler
jmacd Jan 4, 2022
7b90384
sampling->sampled
jmacd Jan 4, 2022
6de220e
typo
jmacd Jan 4, 2022
f568ed4
rephrase
jmacd Jan 4, 2022
0fc9a3e
Update specification/trace/tracestate-probability-sampling.md
jmacd Jan 5, 2022
71c4ce9
Merge branch 'main' of github.com:open-telemetry/opentelemetry-specif…
jmacd Jan 18, 2022
055b921
conform to syntax of https://github.com/open-telemetry/opentelemetry-…
jmacd Jan 18, 2022
7d91729
Merge branch 'jmacd/sampling_spec2' of github.com:jmacd/opentelemetry…
jmacd Jan 18, 2022
4b32c69
Merge branch 'main' into jmacd/sampling_spec2
jmacd Jan 24, 2022
fa83099
Merge branch 'main' into jmacd/sampling_spec2
carlosalberto Jan 26, 2022
b349090
Merge branch 'main' into jmacd/sampling_spec2
jmacd Jan 26, 2022
2 changes: 1 addition & 1 deletion specification/trace/tracestate-handling.md
@@ -10,7 +10,7 @@ When setting [TraceState](api.md#tracestate) values that are part of the OTel ec
they MUST all be contained in a single entry using the `ot` key, with the value being
a semicolon separated list of key-value pairs such as:

-* `ot=p:8;r:64`
+* `ot=p:8;r:62`
* `ot=foo:bar;k1:13`

The [TraceContext](https://www.w3.org/TR/trace-context/) specification defines support for multiple "tenants" each to use their own `tracestate` entry by prefixing `tenant@` to tenant-specific values in a mixed tracing environment. OpenTelemetry recognizes this syntax but does not specify an interpretation for multi-tenant `tracestate`.
352 changes: 352 additions & 0 deletions specification/trace/tracestate-probability-sampling.md
@@ -0,0 +1,352 @@
# TraceState: Probability Sampling

<!-- toc -->

- [Definitions used in this document](#definitions-used-in-this-document)
  * [Sampling](#sampling)
    + [Adjusted count](#adjusted-count)
    + [Power-of-two random sampling](#power-of-two-random-sampling)
  * [Sampler](#sampler)
    + [Parent-based sampler](#parent-based-sampler)
    + [Probability sampler](#probability-sampler)
    + [Consistent probability sampler](#consistent-probability-sampler)
    + [Always-on sampler](#always-on-sampler)
    + [Always-off sampler](#always-off-sampler)
    + [Non-probability sampler](#non-probability-sampler)
- [Probability sampling](#probability-sampling)
  * [Context: traceparent](#context-traceparent)
    + [Sampled flag](#sampled-flag)
      - [Requirement 1](#requirement-1)
      - [Requirement 2](#requirement-2)
  * [Context: tracestate](#context-tracestate)
    + [P-value](#p-value)
      - [Requirement 1](#requirement-1-1)
      - [Requirement 2](#requirement-2-1)
      - [Requirement 3](#requirement-3)
      - [Requirement 4](#requirement-4)
      - [Requirement 5](#requirement-5)
      - [Requirement 6](#requirement-6)
    + [R-value](#r-value)
      - [Requirement 1](#requirement-1-2)
      - [Requirement 2](#requirement-2-2)
    + [Composition rules](#composition-rules)
      - [Requirement 1](#requirement-1-3)
      - [Requirement 2](#requirement-2-3)
      - [Requirement 3](#requirement-3-1)
      - [Requirement 4](#requirement-4-1)

<!-- tocstop -->

**Status**: [Experimental](../document-status.md)

Probability sampling allows OpenTelemetry tracing users to lower their
collection costs with the use of randomized sampling techniques.
OpenTelemetry specifies how to convey and record the results of
probability sampling using the W3C `tracestate` in a way that allows
Span-to-Metrics pipelines to be built that accurately count sampled
spans.

The specification in this document is semantic in nature. Two
`tracestate` fields, known as "r-value" and "p-value", are defined to
enable the development of interoperable probability Sampler
implementations. OpenTelemetry is gathering experience with Samplers
based on this specification while the group considers how to add
probability sampling support to the default SDK specification.

## Definitions used in this document

### Sampling

Sampling is a family of techniques for collecting and analyzing only a
fraction of a complete data set. Individual items that are "sampled"
are taken to represent one or more spans when collected and counted.
The representivity of each span is used in a Span-to-Metrics pipeline
to accurately count spans.

Sampling terminology uses "population" to refer to the complete set of
data being sampled from. In OpenTelemetry tracing, "population"
refers to all spans.

In probability sampling, the representivity of individual sample items
is generally known, whereas OpenTelemetry also recognizes
"non-probability" sampling approaches, in which representivity is not
explicitly quantified.

#### Adjusted count

Adjusted count is a measure of representivity, the number of spans in
the population that are represented by the individually sampled span.
Span-to-metrics pipelines can be built by adding the adjusted count of
each sample span to a counter of matching spans.

For probability sampling, adjusted count is defined as the reciprocal
(i.e., mathematical inverse) of sampling probability.

For non-probability sampling, adjusted count is unknown.

Zero adjusted count is defined in a way that supports composition of
probability and non-probability sampling. Zero is assigned as the
adjusted count when a probability sampler does not select a span.
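
As a minimal sketch of the counting step described above, the following Python shows how a Span-to-Metrics pipeline might sum adjusted counts. The function names, the span names, and the `(name, adjusted_count)` pair representation are hypothetical and chosen only for illustration.

```python
from collections import Counter


def adjusted_count(sampling_probability):
    """Return the reciprocal of a sampling probability in (0, 1]."""
    if not 0 < sampling_probability <= 1:
        raise ValueError("sampling probability must satisfy 0 < p <= 1")
    return 1 / sampling_probability


def count_spans(sampled_spans):
    """Estimate population counts per span name from (name, adjusted_count) pairs."""
    totals = Counter()
    for name, count in sampled_spans:
        totals[name] += count
    return totals


# Hypothetical sample: two spans kept at 25% and one span kept at 50%.
spans = [
    ("GET /users", adjusted_count(0.25)),
    ("GET /users", adjusted_count(0.25)),
    ("POST /orders", adjusted_count(0.5)),
]
estimates = count_spans(spans)
```

Here the two sampled `GET /users` spans stand for an estimated eight spans in the population, and the single `POST /orders` span stands for two.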

#### Power-of-two random sampling

A simple sampling scheme can be implemented using a random bit string
as the input. This scheme is limited to power-of-two sampling
probabilities, as follows.

1. Express the sampling probability as `2**-s`. For example, 25%
equals `2**-2` with `s=2`.
2. Count `r`, the number of consecutive zero bits in the input string.
3. If `s <= r`, select the item with adjusted count `2**s`.

This algorithm is the basis of the consistent probability sampling
approach used in OpenTelemetry, defined in greater detail below.
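
The three steps above can be sketched in Python as follows. This is an illustration only: it assumes the input is a string of `0` and `1` characters, that the zero bits are counted from the start of the string, and that an unselected item is reported with adjusted count zero. The function names are hypothetical.

```python
def leading_zero_bits(bits):
    """Count consecutive '0' characters at the start of a bit string."""
    return len(bits) - len(bits.lstrip("0"))


def power_of_two_sample(bits, s):
    """Return the adjusted count 2**s if the item is selected, otherwise 0."""
    r = leading_zero_bits(bits)
    return 2 ** s if s <= r else 0


# With s=2 (25% sampling), an input with two leading zeros is selected
# and an input with only one leading zero is not.
selected = power_of_two_sample("0010", 2)
rejected = power_of_two_sample("0100", 2)
```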

### Sampler

A Sampler provides configurable logic, used by the SDK, for selecting
which Spans are "recorded" and/or "sampled" in a tracing client
library. To "record" a span means to build a representation of it in
the client's memory, which makes it eligible for being exported. To
"sample" a span implies setting the W3C `sampled` flag, recording the
span, and exporting the span when it is finished.

OpenTelemetry supports spans that are "recorded" and not "sampled"
for in-process observability of live spans (e.g., z-pages).

The Sampler interface and the built-in Samplers defined by
OpenTelemetry decide immediately whether to sample a span, and the
child context immediately propagates the decision.

#### Parent-based sampler

A Sampler that makes its decision to sample based on the W3C `sampled`
flag from the context is said to use parent-based sampling.

#### Probability sampler

A probability Sampler is a Sampler that knows immediately, for each
of its decisions, the probability that the span had of being selected.

Sampling probability is defined as a number less than or equal to 1
and greater than 0 (i.e., `0 < probability <= 1`). The case of 0
probability is treated as a special, non-probabilistic case.

#### Consistent probability sampler

A consistent probability sampler is a Sampler that supports independent
sampling decisions at each span in a trace while maintaining that
traces will be complete with probability equal to the minimum sampling
probability across the trace. Consistent probability sampling requires that
for any span in a given trace, if a Sampler with lesser sampling probability
selects the span for sampling, then the span would also be selected by a
Sampler configured with greater sampling probability.
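
The completeness claim can be checked with exact arithmetic. The sketch below assumes the r-value distribution given in the R-value table later in this document and the power-of-two rule `s <= r` described earlier; the function names are hypothetical.

```python
from fractions import Fraction


def r_probability(r):
    """Probability of an r-value in [0, 62], following the r-value table."""
    return Fraction(1, 2 ** 62) if r == 62 else Fraction(1, 2 ** (r + 1))


def complete_trace_probability(s_values):
    """Probability that every span is selected when span i samples at 2**-s_i."""
    return sum(
        (r_probability(r) for r in range(63) if all(s <= r for s in s_values)),
        Fraction(0),
    )


# Three spans sampling at 1/2, 1/8, and 1/4: the trace is complete
# exactly when r >= 3, which happens with probability 1/8.
prob = complete_trace_probability([1, 3, 2])
```

The result equals the minimum sampling probability in the trace, because every r-value that satisfies the smallest probability also satisfies the larger ones.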

#### Always-on sampler

An always-on sampler is another name for a consistent probability
sampler with probability equal to one.

#### Always-off sampler

An always-off Sampler has the effect of disabling a span completely,
effectively excluding it from the population. This is not defined as
a probability sampler with zero probability, because these spans are
effectively unrepresented.

#### Non-probability sampler

A non-probability sampler is a Sampler that makes its decisions not
based on chance, but instead uses arbitrary logic and internal state.

## Probability sampling

The consistent sampling scheme adopted by OpenTelemetry propagates two
values via the context, termed "p-value" and "r-value":

1. p-value: the "parent probability" value can be set independently by any span in the trace, for its children, and informs child parent-based Samplers of their adjusted count
2. r-value: the "randomness" value is determined and propagated from the root to all spans in the trace and serves to make sampling decisions consistent

Both fields are propagated via the OpenTelemetry `tracestate` under
the `ot` vendor tag using the rules for [tracestate
handling](tracestate-handling.md). Both fields are represented as
unsigned integers requiring at most 6 bits of information. An
invariant will be stated that connects the `sampled` trace flag found
in `traceparent` context to the r-value and p-value found in
`tracestate` context.

### Context: traceparent

The W3C `traceparent` (version 0) contains three fields of
information: the TraceId, the SpanId, and the trace flags. The
`sampled` trace flag has been defined by W3C to signal an intent to
sample the context.

The [Sampler API](sdk.md#sampler) is responsible for setting the
`sampled` flag.

#### Sampled flag

Probability sampling uses additional information to enable consistent
decision making and to record the adjusted count of sampled spans.
When both values are defined and in the specified range, the invariant
between r-value and p-value and the `sampled` trace flag states that
`sampled` is equivalent to the expression `p <= r || p == 63`.

When the invariant is violated, the `sampled` flag takes precedence
and `p` is unset from `tracestate` in order to signal unknown adjusted count.

##### Requirement 1

If `sampled` is set, the `r` and `p` values are valid, `p < 63`, and
`p > r`, then the invariant is violated. In this case, Samplers
SHOULD honor the `sampled` flag and unset `p` from the OpenTelemetry
`tracestate`.

##### Requirement 2

If `sampled` is not set, the `r` and `p` values are valid, and `p <=
r` or `p == 63`, the invariant is violated. In this case,
implementations SHOULD honor the `sampled` flag and unset `p` from the
OpenTelemetry `tracestate`.
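
A minimal sketch of the invariant check and the two requirements above, in Python. It assumes the `p` and `r` fields have already been parsed into a dictionary of integers; that representation and the function names are hypothetical.

```python
def invariant_holds(sampled, p, r):
    """True when `sampled` agrees with the expression `p <= r || p == 63`."""
    return sampled == (p <= r or p == 63)


def enforce_invariant(sampled, fields):
    """Return a copy of `fields` with `p` removed if the invariant is violated.

    `fields` may contain integer entries "p" and "r". The invariant only
    applies when both are present and in their valid ranges.
    """
    result = dict(fields)
    p, r = result.get("p"), result.get("r")
    if p is None or r is None:
        return result
    if not (0 <= p <= 63 and 0 <= r <= 62):
        return result  # range validation is a separate concern
    if not invariant_holds(sampled, p, r):
        del result["p"]  # honor `sampled`; adjusted count becomes unknown
    return result


# Requirement 1: sampled is set but p > r, so p is unset.
case_1 = enforce_invariant(True, {"p": 5, "r": 3})
# Requirement 2: sampled is not set but p == 63, so p is unset.
case_2 = enforce_invariant(False, {"p": 63, "r": 3})
```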

### Context: tracestate

P-value and r-value are set in the OpenTelemetry `tracestate` using
the identifiers `p` and `r`, each an unsigned decimal integer.

P-value is valid in the inclusive range `[0, 63]` (i.e., there are 64
valid values).

R-value is valid in the inclusive range `[0, 62]` (i.e., there are 63
valid values).

P-value and r-value are independent settings; each can be meaningfully
set without the other present. The invariant between `sampled`, `p`,
and `r` only applies when both `p` and `r` are present.
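
For illustration, the following Python sketch extracts `p` and `r` from the value of an `ot` entry such as `p:8;r:62`. It assumes decimal encoding, consistent with that example (62 is the largest valid r-value), and it treats a non-numeric or out-of-range field as absent. The function names are hypothetical, and dropping `p` when `r` is invalid follows R-value Requirement 1 as read here.

```python
def parse_ot_value(value):
    """Split an `ot` value such as 'p:8;r:62' into a dict of string fields."""
    fields = {}
    for item in value.split(";"):
        key, sep, val = item.partition(":")
        if sep and key:
            fields[key] = val
    return fields


def _as_int(text, upper):
    """Return text as an int in [0, upper], or None if absent or invalid."""
    if text is None or not (text.isascii() and text.isdigit()):
        return None
    number = int(text)
    return number if number <= upper else None


def sampling_fields(value):
    """Return (p, r); each is None when the field is absent or invalid."""
    fields = parse_ot_value(value)
    p = _as_int(fields.get("p"), 63)
    r = _as_int(fields.get("r"), 62)
    if "r" in fields and r is None:
        p = None  # assumption: an invalid r unsets both r and p
    return p, r
```

For example, `sampling_fields("p:8;r:62")` yields both values, while `sampling_fields("p:8;r:64")` yields neither because the r-value is out of range.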

#### P-value

Zero adjusted count is represented by the special p-value 63,
otherwise the p-value is set to the negative base-2 logarithm of
sampling probability:

| p-value | Parent Probability | Adjusted count |
| ----- | ----------- | -- |
| 0 | 1 | 1 |
| 1 | 1/2 | 2 |
| 2 | 1/4 | 4 |
| ... | ... | ... |
| N | 2**-N | 2**N |
| ... | ... | ... |
| 61 | 2**-61 | 2**61 |
| 62 | 2**-62 | 2**62 |
| 63 | 0 | 0 |
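
The table can be expressed as a pair of conversions. This is a sketch with hypothetical function names; it accepts only exact power-of-two probabilities, since those are the only ones the table defines.

```python
import math


def adjusted_count_from_p(p):
    """Map a valid p-value to its adjusted count; 63 means zero."""
    if not 0 <= p <= 63:
        raise ValueError("p-value must be in [0, 63]")
    return 0 if p == 63 else 2 ** p


def p_from_probability(probability):
    """Map an exact power-of-two probability 2**-p to its p-value."""
    mantissa, exponent = math.frexp(probability)
    # frexp returns probability == mantissa * 2**exponent with
    # 0.5 <= mantissa < 1, so exact powers of two have mantissa 0.5.
    p = 1 - exponent
    if mantissa != 0.5 or not 0 <= p <= 62:
        raise ValueError("probability must be 2**-p with 0 <= p <= 62")
    return p
```

For instance, a probability of 1/4 maps to p-value 2, and p-value 2 maps back to adjusted count 4.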

##### Requirement 1

Samplers SHOULD unset `p` from the tracestate if the unsigned value is
greater than 63.

##### Requirement 2

Parent-based Samplers SHOULD NOT modify a valid `tracestate`.

##### Requirement 3

Non-probability samplers, having unknown adjusted count, SHOULD unset
`p` from the `tracestate`.

##### Requirement 4

If p-value is set without r-value, the consumer SHOULD interpret the
adjusted count from the context, which is provided without the ability
to make new consistent sampling decisions.

##### Requirement 5

Consistent probability samplers, when they decide not to sample a span,
MUST unset `p`.

##### Requirement 6

Consistent probability samplers, when they decide to sample a span,
MUST set `p` to the base-2 logarithm of the adjusted count.
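
Requirements 5 and 6 can be sketched as a single decision function. The Python below assumes the sampler is configured with a power-of-two probability `2**-sampler_p`, that the decision rule is `sampler_p <= r` as in the power-of-two scheme, and that the `tracestate` fields are held in a dictionary of integers. The function name and the choice to raise an error when `r` is missing are hypothetical simplifications.

```python
def consistent_sample(sampler_p, fields):
    """Decide whether to sample and return (sampled, updated_fields).

    `sampler_p` is the sampler's own p-value in [0, 62]; `fields` holds
    the incoming integer fields and must contain "r".
    """
    updated = dict(fields)
    r = updated.get("r")
    if r is None:
        raise ValueError("an r-value is required for a consistent decision")
    sampled = sampler_p <= r
    if sampled:
        updated["p"] = sampler_p  # log2 of the adjusted count 2**sampler_p
    else:
        updated.pop("p", None)  # not sampled: unset p
    return sampled, updated


kept = consistent_sample(2, {"r": 5})
dropped = consistent_sample(6, {"r": 5, "p": 1})
```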

#### R-value

The r-value SHOULD be set in the `tracestate` by the Sampler at the
root of the trace in order to support consistent probability sampling.
When the value is omitted or not present, child spans in the trace are
not able to participate in consistent probability sampling.

R-value determines which sampling probabilities will and will not
sample spans of a given trace, as follows:

| r-value | Probability of r-value | Implied sampling probabilities |
| ---------------- | ------------------------ | ---------------------- |
| 0 | 1/2 | 1 |
| 1 | 1/4 | 1/2 and above |
| 2 | 1/8 | 1/4 and above |
| 3 | 1/16 | 1/8 and above |
| ... | ... | ... |
| 0 <= r <= 61 | 2**(-r-1) | 2**(-r) and above |
| ... | ... | ... |
| 59 | 2**-60 | 2**-59 and above |
| 60 | 2**-61 | 2**-60 and above |
| 61 | 2**-62 | 2**-61 and above |
| 62 | 2**-62 | 2**-62 and above |
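
One way to produce r-values with the distribution in this table is to count the leading zero bits of 62 uniformly random bits. The text above does not prescribe a generation method, so the Python below is an assumption offered as a sketch, with hypothetical function names.

```python
import random


def r_value_from_bits(bits62):
    """Count leading zeros of an integer viewed as a 62-bit string.

    `bits62` must be in [0, 2**62). A value with the top bit set gives 0
    (probability 1/2); the all-zero value gives 62 (probability 2**-62).
    """
    if not 0 <= bits62 < 2 ** 62:
        raise ValueError("expected a 62-bit unsigned integer")
    return 62 - bits62.bit_length()


def new_r_value(rng=random):
    """Draw an r-value in [0, 62] for the root of a trace."""
    return r_value_from_bits(rng.getrandbits(62))
```

Because exactly half of all 62-bit values have the top bit set, a quarter have one leading zero, and so on, each r-value occurs with the probability listed in the table.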

##### Requirement 1

Samplers SHOULD unset both `r` and `p` if the unsigned value of `r` is
greater than 62.

##### Requirement 2

Samplers SHOULD NOT modify `r` when it is already set in the `tracestate`.

#### Composition rules

When more than one Sampler participates in the decision to sample a
context, their decisions can be combined using composition rules. In
all cases, the combined decision to sample is the logical-OR of the
Samplers' decisions (i.e., sample if at least one of the composite
Samplers decides to sample).

To combine p-values from two consistent probability Sampler decisions,
the Sampler with the greater probability takes effect. The output
p-value becomes the minimum of the two values for `p`.

To combine a consistent probability Sampler decision with a
non-probability Sampler decision, p-value 63 is used to signify zero
adjusted count. If the probability Sampler decides to sample, its
p-value takes effect. If the probability Sampler decides not to
sample when the non-probability Sampler does sample, p-value 63 takes
effect signifying zero adjusted count.

##### Requirement 1

When combining Sampler decisions for multiple consistent probability
Samplers and at least one decides to sample, the minimum of the "yes"
decision `p` values MUST be set in the `tracestate`.

##### Requirement 2

When combining Sampler decisions for multiple consistent probability
Samplers and none decides to sample, p-value MUST be unset in the
`tracestate`.

##### Requirement 3

When combining Sampler decisions for a consistent probability Sampler
and a non-probability Sampler, and the probability Sampler decides to
sample, its p-value MUST be set in the `tracestate` regardless of the
non-probability Sampler decision.

##### Requirement 4

When combining Sampler decisions for a consistent probability Sampler
and a non-probability Sampler, and the probability Sampler decides not
to sample but the non-probability Sampler does sample, p-value 63 MUST be set
in the `tracestate`.
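
The four composition requirements can be sketched as two small Python functions. The names and the use of `None` to mean "p is unset" are hypothetical. The case where neither the probability Sampler nor the non-probability Sampler samples is not covered by the requirements above; leaving `p` unset in that case is an assumption.

```python
def compose_consistent(yes_p_values):
    """Combine consistent probability Samplers.

    `yes_p_values` lists the p-values of the Samplers that decided to
    sample. Returns the minimum, or None (unset) if none sampled.
    """
    return min(yes_p_values) if yes_p_values else None


def compose_with_non_probability(prob_sampled, prob_p, non_prob_sampled):
    """Combine one consistent probability Sampler with one non-probability Sampler.

    Returns (sampled, p), where p is None when it should be unset.
    """
    sampled = prob_sampled or non_prob_sampled  # logical OR of decisions
    if prob_sampled:
        return sampled, prob_p  # Requirement 3
    if non_prob_sampled:
        return sampled, 63  # Requirement 4: zero adjusted count
    return sampled, None  # assumption: nothing sampled, p unset
```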