Add exporter data classes for experimental profiling signal type. #6374

jhalliday · 2024-04-10T14:29:37Z

First small step towards supporting the new experimental signal type for profiling!

This PR introduces data interfaces corresponding to the OTLP model as defined in
open-telemetry/oteps#239
open-telemetry/opentelemetry-proto#534

Until the latter is merged the wire format is still something of a moving target, but no major changes are expected so it should be safe to start reviewing this...

The code structure corresponds pretty much 1:1 with the .proto files, modulo some tidying up of field names for better readability. The easiest way to review is probably split screen the .java and .proto files - fields appear in the same order in both. The bulk of the work will likely be getting familiar with the new data model, the code itself its not complicated.

The javadocs are more or less copies from the .proto comments, many of which are in turn inherited from google's pprof.proto The OTel profiling SIG has an open task to better document the data model, after which another pass on the code will update the javadocs to match. For now they should be considered placeholders.

Some elements are @deprecated. This is unusual for fresh code! The situation arises because the new OTel model builds on google's existing pprof one, adding some fields and deprecating others. In order to allow 'legacy' style data to be written for compatibility, the data classes have methods for the older fields. An alternative design choice is not to support these, which should in theory allow all necessary data to be written for sending to OTel receivers, but would not allow for older style data expected by existing pprof tools. I've opted for more flexibility since there is little cost.

The profiling format is I think the first OTLP spec in which byte[] type fields appear as first class citizens, not merely as carriers for strings. How to model this in the Java API is an open question, given the design preferences around nullability and immutability. ByteBuffer isn't ideal, but may be preferred to raw byte[]. Protobuf uses ByteString, but that's more relevant to the Marshalers than the API - we don't really want a dependency on protobuf runtime classes at this level.

This foundation code leads on to the Immutable data classes, then the Marshalers. Those are mostly ready, but I prefer smaller pieces for review, so they will follow in later PRs. The Marshalers in particular will require the new proto PR above to be merged first anyhow.

linux-foundation-easycla · 2024-04-10T14:29:42Z

The committers listed above are authorized under a signed CLA.

✅ login: jhalliday / name: Jonathan Halliday (2fa0060, 112796b, ffb73b9, 3044610, ae94404, f84df98, a7af9cb, 64da240, 4d31472)

codecov · 2024-04-11T14:53:42Z

Codecov Report

Attention: Patch coverage is 0% with 10 lines in your changes are missing coverage. Please review.

Project coverage is 90.87%. Comparing base (66cf1b6) to head (ffb73b9).
Report is 55 commits behind head on main.

Files	Patch %	Lines
...try/exporter/otlp/data/AggregationTemporality.java	0.00%	4 Missing ⚠️
.../opentelemetry/exporter/otlp/data/BuildIdKind.java	0.00%	3 Missing ⚠️
.../io/opentelemetry/exporter/otlp/data/LinkData.java	0.00%	2 Missing ⚠️
...metry/exporter/otlp/data/ProfileContainerData.java	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #6374      +/-   ##
============================================
- Coverage     91.09%   90.87%   -0.23%     
- Complexity     5772     6169     +397     
============================================
  Files           627      682      +55     
  Lines         16852    18494    +1642     
  Branches       1720     1813      +93     
============================================
+ Hits          15351    16806    +1455     
- Misses         1006     1155     +149     
- Partials        495      533      +38

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

jkwatson · 2024-04-18T23:36:59Z

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/MappingData.java

+  @Deprecated
+  long getId();


Putting a deprecated method on a brand new data interface seems strange. If this is needed for pprof compatibility, it should be used for that, and not be deprecated. Deprecation means that it would be eventually deleted, and should never be used by any new code. Clearly, there will be new code that should use this, or you wouldn't have added it to the interface. Recommend just adding javadoc around usage, rather than deprecating.

It is indeed strange, a point I made explicitly in the text of the PR above, along with the reasons I did it. I think including the method but marking it deprecated hits a good balance between making the SDK flexible enough to be used for generating pprof style output whilst also signalling to the user that doing so is discouraged. There isn't an annotation with that exact semantic, so Deprecated is as close as I can get whilst still having wide tool support - merely using the javadoc won't be as effective in signalling intent to users as the tools won't parse it.

I fully expect this one to generate some lively discussion - it's even possible to challenge the assumption the methods are needed at all and argue that the OpenTelemetry SDK should be opinionated about only generating OTel style data and not allowing the older pprof style, which would fit with Postel's Law better. https://en.wikipedia.org/wiki/Robustness_principle. I'm happy roll with whatever the maintainers prefer, but being new to the project I don't yet have enough feel to know what that is.

I think we should absolutely leave these out until there is demand for them.

@jack-berg any preference on this one?

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/AggregationTemporality.java

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/LinkData.java

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/LocationData.java

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/ProfileContainerData.java

jkwatson · 2024-04-18T23:42:53Z

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/SampleData.java

+  /** Additional context for this sample. It can include thread id, allocation size, etc. */
+  @Deprecated
+  List<LabelData> getLabels();


same comment on deprecation on a brand new, never before used interface.

Same reply :-)

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/ValueTypeData.java

jkwatson · 2024-04-18T23:45:08Z

I'm not sure that mirroring the protos exactly is a great way to build out our SDK classes. We have the opportunity to build something better than raw protos that will apply specifically to java. I'd want to have a very strong reason to expose clearly mutable data (for example), and yet call it as Immutable.

jhalliday · 2024-04-19T09:53:32Z

I'm not sure that mirroring the protos exactly is a great way to build out our SDK classes. We have the opportunity to build something better than raw protos that will apply specifically to java. I'd want to have a very strong reason to expose clearly mutable data (for example), and yet call it as Immutable.

I think there are perhaps two separate issues here. I'll take them in reverse order:

I don't like byte[] in the API and don't expect it to be used in the final version, but before I can swap it for something better we'll need to agree on what that is. ByteBuffer would do, but isn't reliably immutable either. Then again nor is List, which is widely used in the API anyhow. At least it solves the Nullability problem which is distinct from the Immutability one.

The 1:1 mapping to the proto is absolutely horrible as a user API and I am sure it's not a great way to build such an API. That said, this isn't intended to be the user facing API. It's intended for feeding the Marshalers which encode the proto, and for that 1:1 produces the cleanest, least error prone code and maximises flexibility. As it gets more usage, we'll likely start finding patterns that are more user friendly and I expect the eventual public (as distinct from internal) API will look different. Designing that, I'd be concerned with masking the lookup table mechanism, so users can e.g. feed in a List<StackFrame[]> and have all the decomposing of that handled for them. That's likely to take the form of an SDK package that maps the user facing layer onto these Marshaler-facing internal data classes. It's possible that putting these interfaces where I have in the repository structure is misleading as to their intended use, but then again all other protos have more or less 1:1 structure like this, so from a uniformity and maintenance point of view it follows principle of least surprise. Profiling is perhaps the first proto where the wire structure is so significantly different from the optimal user API, so using the same classes for both purposes, as in other signal types, won't work so well. We'll just have to feel out what that means for the package structure as we go along I think.

jkwatson · 2024-04-22T15:01:14Z

If our goal is to provide a route into marshallers, maybe we should create a write-only API that facilitates this use-case, rather than constructing a proto-mirroring set of Java classes. 🤔

Let's discuss approaches at the SIG meeting this week.

jhalliday · 2024-04-22T17:03:20Z

If our goal is to provide a route into marshallers, maybe we should create a write-only API that facilitates this use-case

Marshallers read an object model and transform it into wire format data. This is that object model. It can't be write-only, because then the marshallers have nothing to read. It should be 1:1 with the wire format, so the marshallers don't have to handle complex transformations. This API is well designed for its intended use case. If you have a different use case, that's a different API.

jkwatson · 2024-04-22T20:15:50Z

If our goal is to provide a route into marshallers, maybe we should create a write-only API that facilitates this use-case

Marshallers read an object model and transform it into wire format data. This is that object model. It can't be write-only, because then the marshallers have nothing to read. It should be 1:1 with the wire format, so the marshallers don't have to handle complex transformations. This API is well designed for its intended use case. If you have a different use case, that's a different API.

If our Marshaller exposed a "write only API" to the producer of the profile, then the marshaller can do with the data as it will. There doesn't need to be an intermediate representation, necessarily.

jhalliday · 2024-04-23T10:15:02Z

If our Marshaller exposed a "write only API" to the producer of the profile, then the marshaller can do with the data as it will. There doesn't need to be an intermediate representation, necessarily.

True. However, all existing Marshallers, including the ones I already wrote for profiling, work the same way. This follows principle of least astonishment, ensuring that users/maintainers can reuse prior knowledge of the pattern when working with the new classes. It seems to me there would need to be a good reason for deviating from that. What is it?

jkwatson · 2024-04-24T02:11:06Z

If our Marshaller exposed a "write only API" to the producer of the profile, then the marshaller can do with the data as it will. There doesn't need to be an intermediate representation, necessarily.

True. However, all existing Marshallers, including the ones I already wrote for profiling, work the same way. This follows principle of least astonishment, ensuring that users/maintainers can reuse prior knowledge of the pattern when working with the new classes. It seems to me there would need to be a good reason for deviating from that. What is it?

The primary reason is highlighted by the proposal in this PR. Having an intermediate representation as ugly and non-ergonomic as this one seems like a very good reason to do things differently. The profiling signal is several orders of magnitude (at least) more complex than either metrics, logs or spans. I think it's worth considering a radically different approach.

A big question to answer: how many exporters are there going to be, realistically? If it's just OTLP, then I think skipping the intermediate representation would allow optimizations that would not be possible if it would be required to convert back and forth.

jack-berg

Key question is whether we should strictly reflect the proto representation or have a more user-friendly java representation, and leave the translation to the optimized proto representation to serialization layer. I think we should do the later, since its what we've done elsewhere.

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/AggregationTemporality.java

jack-berg · 2024-04-24T14:39:40Z

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/AggregationTemporality.java

+public enum AggregationTemporality {
+
+  /** The default AggregationTemporality, it MUST not be used. */
+  UNSPECIFIED,


We're inconsistent with how we represent the unspecified option of proto enums:

For metric AggregationTemporality, we omit it, and all the code forces aggregation temporality to be set.

For log SeverityNumber, we include it in the java enum.

For SpanKind, we omit it.

I lean towards omitting it. And since this mirrors metric AggregationTemporality which omits it, we should follow suit unless we have a good reason.

The profiling proto spec explicitly says it's fine to not set field aggregationTemporality in ValueType, so omitting an undefined value means allowing the aggregationTemporality field of ValueType to be null as an alternative way of indicating 'not set'? This is a case where copying comments from the .proto to javadoc is not so good - there is actually a potential use case for the enum value at Java level.

I'm also kind of fan of having the enum ordinals align with the protobuf ones, though to be fair not many users need to debug marshaling issues at that level.

@jack-berg do you still want UNSPECIFIED gone, or shall we keep it to use in place of null / Optional, pending a decision on if the two AggregationTemporality definitions will be merged into one in common?

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/AttributeUnitData.java

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/ProfileData.java

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/ProfileContainerData.java

jack-berg · 2024-04-24T15:01:10Z

sdk/profiles/src/main/java/io/opentelemetry/sdk/profiles/data/ProfileContainerData.java

+ * @see "profiles.proto::ProfileContainer"
+ */
+@Immutable
+public interface ProfileContainerData {


I wonder if we need to differentiate between ProfileContainer and ProfileData. The proto seems to do it purely because they're extending pprof. I'm inclined to flatten this to a single class called ProfileData.

I think perhaps this is another instance of figuring out how we want to position the SDK to users, much like the deprecated methods question.
I can appreciate the 'OpenTelemetry is the One True Way' approach of providing only functionality that does things the Otel way, but... There may also be use cases where it is desirable to e.g. generate pprof format output for feeding into pre-existing tools that expect it. For such use, being able to handle ProfileData (i.e. pprof/pprofextended) without thinking about the OTel wrapper (ProfileContainerData) makes sense. I don't really want to have to construct and pass a ProfileContainerData around when I'm only interested in the ProfileData field of it. I'm therefore slightly in favour of retaining the greater modularity and flexibility here, as I don't see it having a particularly high overhead in either runtime or maintenance and indeed the code is perhaps even simpler with it in place e.g. marshallers are more loosely coupled.

I'm inclined to leave this as is now that the revised plan is to move these classes into the serialization package for use just by marshallers. A 1:1 between data classes and marshallers helps modularity and testing.

Changes addressing code review comments.

jhalliday · 2024-05-07T13:17:12Z

In term of relocating this to the same place the marshallers will eventually be, how about:
exporters/otlp/profiles/src/main/java/io/opentelemetry/exporter/otlp/profiles/data
or
exporters/profiles-otlp/src/main/java/io/opentelemetry/exporter/profiles/otlp/data

jack-berg · 2024-05-09T19:59:13Z

We don't have any other signals that are structured purely as an exporter, with exporter and data classes bundled together without any sdk artifact.

Looking at the opentelemetry-exporter-otlp artifact, we see packages:

io.opentelemetry.exporter.otlp
- http
  - logs - otlp/http logs
  - metrics - otlp/http metrics
  - traces - otlp/http traces
- logs - otlp/grpc logs
- metrics - otlp/grpc metrics
- traces - otlp/grpc traces

And over in the sdk packages (say opentelemetry-sdk-traces for examples), we see packages:

io.opentelemetry.sdk.traces - top level trace sdk
- data - trace data interfaces
- export - trace exporter interfaces
- samplers - sampler interfaces

(Side note - we could have done better with the SDK package design. Not sure what we gain by having dedicated packages for data, exporting, and samplers. But what's done is done.)

Can we preserve parts of both of these package structures coherently in a single OTLP profile artifact? Not exactly. The two package hierarchies diverge so we couldn't have a single root package for the java module system. Since this is an exporter artifact and not an SDK artifact, let's prioritize symmetry with the existing OTLP exporter artifact:

io.opentelemetry.exporter.otlp
- http
  - profiles - otlp/http profile
- profiles - otlp/grpc profile
- data - profile data interfaces
- export - profile exporter interfaces

I don't anticipate this being permanent, since by bundling data / exporter interfaces with the OTLP exporter, we limit any future exporter to take a dependency on the OTLP profile dependency. Minimally, I think we'd also be interested in a logging exporter at some point. But let's wait on that. Since the OTLP profile exporter will be the only thing dependent on the profile data / exporter interfaces, breaking them out into a separate sdk module will be minimally disruptive.

Changes addressing code review comments.

Add SDK data classes for experimental profiling signal type.

a7af9cb

jhalliday requested a review from a team as a code owner April 10, 2024 14:29

jhalliday added 3 commits April 10, 2024 15:42

Add SDK data classes for experimental profiling signal type.

3044610

Add SDK data classes for experimental profiling signal type.

64da240

Add SDK data classes for experimental profiling signal type.

ae94404