Stabilize messaging semantic conventions for tracing #192

pyohannes · 2021-12-02T01:06:28Z

This OTEP aims to describe the necessary changes for bringing the existing semantic conventions for messaging from the current experimental to a stable state.

It is based on OTEP 0173, which defines basic terms and describes messaging scenarios that should be supported by the semantic conventions.

NOTE: This is an early draft document. It captures results of discussions currently going on in the messaging workgroup.

joaopgrassi

Thanks for putting this up! I left a few things.

text/trace/0192-messaging-semantic-conventions-spec.md

Co-authored-by: Joao Grassi <joao@joaograssi.com>

lmolkova · 2022-07-21T01:47:50Z

text/trace/0192-messaging-semantic-conventions-spec.md

+----------------------|--------|---------
+[`messaging.system`](#messagingsystem) | string | Yes
+[`messaging.operation`](#messagingoperation) | string | Required
+[`messaging.destination.name`](#messagingdestinationname) | string | For producer spans


I did some research on messaging.destination.name:

Kafka can publish a batch in which different messages are sent to different topics. The only way around this I see is a link per message, with the per-message attributes on the link.

Azure Event Grid either has no destination name at all (the destination is fully identified by the host) or has a topic name per message rather than per send call, i.e. the same issue as above.

Also, Pulsar, which has a tenant/namespace/topic structure, does not really care about it at the client API level. E.g., public/dlt-example/dlt-example-topic is passed as a single string to the producer. I believe the whole string should go into messaging.destination.name.

With this in mind, messaging.destination.name should be a conditionally required attribute (when available and when all messages in a batch are published to the same topic).
And the contract is that it uniquely identifies topic/queue/subject/entity within either

net.peer.name:net.peer.port (not available for GCP, AWS or vanilla JMS) or

cloud.account.id + cloud.region for AWS - check out AWS lambda otel samples

cloud.account.id for GCP

It's probably hard to build UI with so few guarantees.
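The conditional requirement proposed above can be sketched as follows. This is a minimal illustration, assuming a batch is modeled as a list with one destination name per message (None when the system exposes none); `batch_destination_name` is a hypothetical helper, not an OpenTelemetry API.

```python
from typing import Optional, Sequence


def batch_destination_name(topics: Sequence[Optional[str]]) -> Optional[str]:
    """Value for `messaging.destination.name` on a publish span, or None.

    Hypothetical helper: one entry per message in the batch, where None means
    the system exposes no destination name for that message. The attribute is
    only set when every message is sent to the same, known destination.
    """
    if not topics or any(topic is None for topic in topics):
        return None
    if len(set(topics)) != 1:
        return None  # e.g. a Kafka batch spanning several topics
    return topics[0]


# A Pulsar name is passed through as one opaque string.
print(batch_destination_name(["public/dlt-example/dlt-example-topic"] * 2))
print(batch_destination_name(["orders", "payments"]))
```

Leaving the attribute unset for mixed batches keeps it truthful; the per-message destinations would then have to travel on links, as suggested for Kafka.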

Link to the notes from the call: https://gist.github.com/lmolkova/1bdcb0cd56ef876f278c5d9ba8fa7b08

For cloud messaging systems, I think we should go for the cloud-native resource ID, like the full ARN in AWS (see faas.id spec in current specification, it has some general wording that could be extracted / reused here). For AWS SQS you will, however, often only know the Queue URL instead of the full ARN (messaging.url). So it could be a one-of requirement, or we allow both as messaging.destination.
I think you end up needing messaging.system anyways to fully interpret that name.
Alternatively, we could separate destination.name and destination.id, where the first is something suitable for display and the second is something suitable for identifying the destination e.g. to show which operations involve the same queue.
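To make the name/ID split concrete for SQS, here is a minimal sketch that derives both from a queue URL. It assumes the `https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>` URL form and the standard `aws` ARN partition; `messaging.destination.id` is the hypothetical attribute floated above, not an existing convention.

```python
from urllib.parse import urlparse


def sqs_destination_attributes(queue_url: str) -> dict:
    """Derive a display name and a unique ID from an SQS queue URL.

    Assumes the https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>
    form and the standard "aws" partition; anything else is rejected.
    "messaging.destination.id" is a hypothetical attribute.
    """
    parsed = urlparse(queue_url)
    host_parts = (parsed.hostname or "").split(".")
    path_parts = parsed.path.strip("/").split("/")
    if (
        len(host_parts) != 4
        or host_parts[0] != "sqs"
        or host_parts[2:] != ["amazonaws", "com"]
        or len(path_parts) != 2
    ):
        raise ValueError(f"unsupported queue URL: {queue_url}")
    region = host_parts[1]
    account_id, queue_name = path_parts
    return {
        "messaging.destination.name": queue_name,  # suitable for display
        "messaging.destination.id": f"arn:aws:sqs:{region}:{account_id}:{queue_name}",
    }


print(sqs_destination_attributes("https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue"))
```

The ARN identifies the queue across accounts and regions, while the bare queue name is what a UI would show.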

I wanted to see if we can take a step back with this. It seems we have the following clear facts:

This varies a lot between different systems

But it still seems like there is always something that can be added here that uniquely identifies the destination

Given those facts, couldn't we leave this still as Required? We then give some hints on what instrumentation should usually put in here, but ultimately it's on them to know exactly how this should be composed.

I feel we are running in circles with this attribute. Maybe we should make it more flexible and allow each individual messaging system SDK/instrumentation to do what's best for it. WDYT?

This definitely needs a per-system definition, much like faas.id, db.name, etc. But I don't think each instrumentation should decide on its own, at least not when it is likely that there will be different instrumentations for the same messaging system.

dpauls · 2022-07-22T13:24:07Z

text/trace/0192-messaging-semantic-conventions-spec.md

+> A producer SHOULD attach a creation context to each message. The creation context
+> SHOULD be attached in a way so that it is not possible to be changed by intermediaries.


As stated, the creation context is intended to allow correlation of a message's producer with its consumers. One of the criticisms of the concept of the creation span was that it is essentially a zero-duration span, that isn't really measuring anything. The counter to that criticism was that producer-consumer correlation is important and such a context is important for creating these links.

This makes me wonder why a message's send span couldn't serve this purpose? I believe the main concern with this was that a send span may represent a batch of messages. It should be possible to precisely correlate back to a unique context for a single message.

During yesterday's discussion, we were further puzzling over batch send spans and how to express per-message attributes such as message_id, correlation_id, and destination.name. A number of possibilities were discussed, but a couple that I'll highlight were:

1. Do not include per-message attributes in the send span directly. Instead, for each message, create a link to its creation context. Inside the link, include per-message attributes.

2. Create a span for each message in the batch as a child of the batch send. For some messaging systems, this may be a zero-duration span again. However, for other systems I believe a batch send may simply be a convenience function that loops over sending individual messages. In these cases the spans could measure sending the individual messages. Perhaps they are acknowledged as a batch and therefore they would all end at the same time. But you could measure from when you started the send of that message until the batch was acked.
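The two options differ mainly in where the per-message data ends up. A minimal sketch, using toy `Span`/`Link` classes (hypothetical stand-ins, not OpenTelemetry SDK types) and an illustrative `messaging.message.id` attribute key:

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def new_span_id() -> str:
    return secrets.token_hex(8)  # 16 hex characters, the size of a W3C span ID


@dataclass
class Link:
    span_id: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Span:
    name: str
    parent_id: Optional[str] = None
    attributes: Dict[str, str] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)
    span_id: str = field(default_factory=new_span_id)


def option_1(messages: List[dict]) -> List[Span]:
    """One batch span; per-message attributes live on links to creation contexts."""
    batch = Span("send")
    for msg in messages:
        msg["context"] = new_span_id()  # creation context without its own span
        batch.links.append(Link(msg["context"], {"messaging.message.id": msg["id"]}))
    return [batch]


def option_2(messages: List[dict]) -> List[Span]:
    """Batch span plus one child span per message, doubling as its creation context."""
    batch = Span("send")
    spans = [batch]
    for msg in messages:
        child = Span("send", parent_id=batch.span_id,
                     attributes={"messaging.message.id": msg["id"]})
        msg["context"] = child.span_id
        spans.append(child)
    return spans
```

Option 1 yields a unique context per message with no span behind it, which is exactly the "unique context without a span" gap in the OTel model; option 2 makes each per-message span the context that gets propagated.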

I believe we generally agreed option 2 was the nicest option. The concern was the cost of extra spans and the general dislike for zero-duration spans. However, what if this span also served as the creation context? This gets rid of a span that is almost certainly zero-duration with a span that:

Measures an actual send operation for single message sends

May measure an actual send operation for a single message in a batch; or

May be a zero-duration span for other messages. In these cases wouldn't it also be an option to have the span duration cover the batch send? I'm not sure it adds a lot of value, but I also don't see a problem with this approach.

I think if we took this approach, it completely gets rid of all zero-duration spans for single message sends. It may get rid of zero-duration spans for batch sends. I think in all cases the total number of spans is the same or less than we would have had otherwise, and the overall span structure makes a lot more sense to me.

If we did this, I think it makes sense for the "batch send" span (which is the parent of all individual message send spans) to be defined as its own operation, with its own set of attributes. It would likely be lighter-weight, as it would only include attributes common to all messages in the send. I think this span structure would be easier for back ends to implement against.

Is there anything I'm missing as to why a send span couldn't serve as the context used for producer-consumer correlation?

Does anyone have any feedback on this idea?

@dpauls thanks for putting the summary here!

However, for other systems I believe a batch send may simply be a convenience function that loops over sending individual messages.

This is interesting. In this case the span/context in each message is actually something that can be used to track time. Do you happen to know of a system/SDK that does this?

I believe we generally agreed option 2 was the nicest option. The concern was the cost of extra spans and the general dislike for zero-duration spans. However, what if this span also served as the creation context? This gets rid of a span that is almost certainly zero-duration with a span that:

As we discussed, my understanding is that a creation context is a span, created for each message as part of a batch send operation (thus being children of the "send" span). It's also what Johannes wrote in this OTEP here: https://github.com/pyohannes/oteps/blob/72e1215d698ad86473f874ef0f0282de07c66521/text/trace/0192-messaging-semantic-conventions-spec.md?plain=1#L152

What I'm a bit confused about now is this part: https://github.com/pyohannes/oteps/blob/72e1215d698ad86473f874ef0f0282de07c66521/text/trace/0192-messaging-semantic-conventions-spec.md?plain=1#L162

If a "Create" span exists for a message, its context SHOULD be attached to
the message. If no "Create" span exists, the context of the related "Publish"
span SHOULD be attached to the message.

I interpret this as:

Single message: The Publish span is attached to the message, essentially being copied to it. This makes sense, as it's not necessary to have an extra create span for this case. We can attach message-specific attributes here.

Batch: If the messages don't have the create context, then the Publish span context should be attached to the messages in the batch. Then here we run into the message-specific attributes problem.
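The quoted rule and the batch problem can be sketched like this, assuming messages are plain dicts with a hypothetical `id` key and contexts are opaque strings:

```python
from typing import Dict, List, Optional


def attach_contexts(
    messages: List[dict],
    publish_context: str,
    create_contexts: Optional[Dict[str, str]] = None,
) -> List[dict]:
    """Attach a context to each message following the quoted draft rule.

    create_contexts maps a (hypothetical) message id to the context of its
    "Create" span; messages without one fall back to the "Publish" context.
    """
    create_contexts = create_contexts or {}
    for msg in messages:
        msg["context"] = create_contexts.get(msg["id"], publish_context)
    return messages


single = attach_contexts([{"id": "m1"}], publish_context="publish-ctx")
batch = attach_contexts([{"id": "m1"}, {"id": "m2"}], publish_context="publish-ctx")
# In the batch case both messages share one context, so there is no
# per-message place left for message-specific attributes.
print({msg["context"] for msg in batch})
```

For a single message the fallback is harmless; for a batch every message ends up with the same context, which is the interpretation problem described above.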

I want to mention that a create span is not necessarily zero-duration. I think there is an argument for changing the concept of creation to make the stages more symmetrical: just as "publish" corresponds to "receive" or "deliver", "create" should correspond to "process". Meaning, "create" should track the logical, application-level business logic of creating the message (data). The following would then happen in order:

1. (n times) Create some logical message and store the creation context in it.

2. Publish one of these previously created messages, or a whole batch. Notice that this will typically not be a child of the create span, but might even be on a separate trace. Putting a link to all creation contexts on the publish span would seem sensible. If a creation context is not already on the message, you would still have to proceed as currently discussed, but if there is one, no artificial spans have to be created.

3. ... from here on as before ...
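Step 2 of this flow, reusing application-created contexts and only generating the missing ones, might look like the following minimal sketch. The `creation_context` key and the `new_context` callback are hypothetical names for illustration.

```python
import itertools
from typing import Callable, List


def publish_links(messages: List[dict], new_context: Callable[[], str]) -> List[str]:
    """Collect the creation contexts a publish span would link to.

    Contexts the application already stored on a message (step 1) are reused;
    only messages without one get a new, instrumentation-created context.
    """
    links = []
    for msg in messages:
        if "creation_context" not in msg:
            msg["creation_context"] = new_context()
        links.append(msg["creation_context"])
    return links


counter = itertools.count(1)
msgs = [{"creation_context": "app-create-1"}, {}]
print(publish_links(msgs, lambda: f"generated-{next(counter)}"))  # ['app-create-1', 'generated-1']
```

Because the context is stored on the message, publishing the same message again links to the same creation context instead of producing another artificial one.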

Just like for a "process" span, a natural "create" span will usually require some cooperation by the application developer and in many cases cannot be automatically created.

I guess we are all on the same page that having unique context per message is the best approach here. And setting artificial duration on such spans would be a way to hide the logical nature of this span - in most cases it does not track any work or operation except message creation. The only scenario where it'd provide value is when messages are sent separately by the client SDK and it seems like an edge case.

I suggest we don't try to hide this fact - we're lacking OTel concept such as unique context without a span or span without duration or event with unique context and we should discuss it with the broader OTel spec community before making any decisions here.

Summarized different options here https://docs.google.com/document/d/1OrHsepd6GjzXKll1ggZyx1jBQd0d_t8NZXT1ZOem7D0/edit?usp=sharing - let's see if we can discuss it tomorrow on the call.

Meaning, "create" should track the logical, application-level business logic of creating the message (data)

@Oberon00's take is indeed interesting; I don't think we ever thought about it this way. It highly depends on how the application works, but I can imagine something like this:

1. App receives a request to schedule a birthday notification for a user.

2. App needs to find the user's birth date to be able to create the message. Start the 'create' span here.

3. App fetches the necessary data.

4. App creates the message with the user data and adds the ambient/current create span to it.

5. App passes the message on to be sent.

6. Scheduler service receives the message and processes it.

This indeed is not a zero-duration span, but of course it highly depends on application owners to create such spans. In other cases, creating a message might really be just var myMessage = new Message(data), so it really is zero-duration.

The question is then if creation of data is already (part of) the logical message creation.

The question is then if creation of data is already (part of) the logical message creation.

This is very app-specific. Should the message span be the span in which the data was read, or the one in which it was created? Both seem to be controversial.

Auto-instrumentations can only control what happens within SDK API and at least my hopes of manual instrumentation precisely following conventions are low.

From my understanding, we are going to push for a message span that goes from producer send to consumer receive. Right?

ppatierno · 2022-07-26T10:32:36Z

text/trace/0192-messaging-semantic-conventions-spec.md

+#### Consumer
+
+A "consumer" receives the message and acts upon it. It uses the context and
+data to execute some logic, which might lead to the occurrence of new messages.


It uses the context

Which context is it referring to?

It's probably the "creation span context" that is used to link a producer with a consumer. But I think we can remove this from here, since it's just defining terminology and setting the stage. Also the creation context term is not defined yet.

Addressed in: 5fa715d

ppatierno · 2022-07-26T10:35:08Z

text/trace/0192-messaging-semantic-conventions-spec.md

+* "Receiving" is the process of obtaining a message from the intermediary.
+* "Processing" is the process of acting on the information a message contains.
+* "Settling" is the process where intermediary and consumer agree on the state
+  of the transfer.


I was wondering why the "Settling" process isn't something taken into account for producer as well.
Depending on the messaging protocol, there is always an "acknowledge" phase related to the message sent. It happens for AMQP 1.0 (where it's called "settlement" as well) but also with MQTT (QoS level), the Kafka protocol (ack), and more. Usually the producer can decide not to receive an "ack" (at-most-once semantics) or to receive it (at-least-once semantics).

That's a good point. Maybe it explicitly doesn't mention producer because this is inside the Consumer section? Maybe these three bullet points should be put "outside" so it applies for consumers, producers or intermediaries? WDYT?

Yeah, agree. But other than being in the Consumer section I noticed that the "ack" phase from a Producer point of view wasn't taken into account at all, so I thought there was a specific reason for that ... but it doesn't seem to be the case.
We should add this concept and also update the corresponding ASCII diagram.

ppatierno · 2022-07-26T10:38:16Z

text/trace/0192-messaging-semantic-conventions-spec.md

+
+#### Consumer
+
+A "consumer" receives the message and acts upon it. It uses the context and


Why isn't a "batch" of messages specified here, as it is for the producer?

Good point. I will address it.

Addressed in: 5fa715d

ppatierno · 2022-07-26T10:43:27Z

text/trace/0192-messaging-semantic-conventions-spec.md

+4. The consumer processes the message.
+5. The consumer settles the message by notifying the intermediary that the
+   message was processed. In some cases (fire-and-forget), the settlement stage
+   does not exist.


In general I think that "fire and forget" is more related to the producer than to the consumer. It's actually related to "at most once" semantics, where the producer sends a message without waiting for any ack, so the message may arrive at the destination (once) or not at all.
The settlement on the consumer side is fine, because some messaging systems have to deal with "removing" a message from the queue when it's processed by the consumer. Of course, that would not be true for Kafka, which works in a completely different manner.

Would you say removing this (fire-and-forget) makes sense, then? I can only speculate, but I think the intention was to say that some consumers may choose not to settle, thus being more of a "receive and forget"? I guess @pyohannes would be the one to know the intention here.

I am not sure that "receive and forget" makes sense either.
In a traditional messaging system, if a consumer gets the message and then doesn't settle it (so it "forgets"), the message becomes available to other consumers again after a timeout. This leads to duplicate processing of the same message, caused not by a network problem (a lost settlement, for example) but by the consumer's intention to "forget". I would say that in general it doesn't make sense: the consumer should settle to allow the system to remove the message from the queue at some point.
In Kafka, things are different, because the messaging system doesn't have to remove the message from the "topic partition"; it stays there for the configured retention period. The difference is in the client, which tracks (and commits) the offsets of the messages it has already read so it can continue from there (while still being able to re-read older messages).

MQTT offers a QoS level 0, which is defined as follows:

The minimal QoS level is zero. This service level guarantees a best-effort delivery. There is no guarantee of delivery. The recipient does not acknowledge receipt of the message and the message is not stored and re-transmitted by the sender. QoS level 0 is often called “fire and forget” and provides the same guarantee as the underlying TCP protocol.

Something similar exists for AMQP.

There is the concept of at most once delivery, which is possible with several messaging systems and which some people wanted to have covered by these conventions.

Yeah, you are right, I forgot that in MQTT, for example, the consumer can subscribe with a QoS different from the one used by the publisher. Of course, AMQP offers the same with settlement.
Anyway, I have never heard anyone talk about "fire and forget" or "receive and forget", so I would avoid those terms.
The best option to me seems to be using:

AT MOST ONCE, which is a QoS 0, (or "receive and forget")

AT LEAST ONCE, which is a QoS 1

EXACTLY ONCE, which is a QoS 2
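The three proposed terms map one-to-one onto MQTT QoS levels, and only the weakest one has no settlement stage at all (the case the "fire-and-forget" wording in step 5 tries to cover). A minimal sketch; `DeliveryGuarantee` and `has_settlement` are hypothetical names, while the packet flows in the comments are the ones defined by MQTT.

```python
from enum import Enum


class DeliveryGuarantee(Enum):
    """The three terms proposed above, valued by the matching MQTT QoS level."""

    AT_MOST_ONCE = 0   # PUBLISH only: no acknowledgement, no redelivery
    AT_LEAST_ONCE = 1  # PUBLISH + PUBACK: acknowledged, may be duplicated
    EXACTLY_ONCE = 2   # PUBLISH, PUBREC, PUBREL, PUBCOMP handshake


def has_settlement(guarantee: DeliveryGuarantee) -> bool:
    """Whether a settlement (acknowledgement) stage exists at all."""
    return guarantee is not DeliveryGuarantee.AT_MOST_ONCE


print(DeliveryGuarantee(0).name, has_settlement(DeliveryGuarantee(0)))  # AT_MOST_ONCE False
```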

ppatierno · 2022-07-26T10:43:57Z

text/trace/0192-messaging-semantic-conventions-spec.md

+  v        +--------------+       |
+Publish -> | INTERMEDIARY | -> Receive
+           +--------------+
+```


As in my previous comment, I'm still wondering whether we need the "ack" phase on the producer side as well.

Yes, I think it's missing. A previous comment from @ppatierno also mentions that.

ppatierno · 2022-07-26T12:19:41Z

text/trace/0192-messaging-semantic-conventions-spec.md

+message is sent to or received from.
+
+See [Network Transport Attributes](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/span-general.md#network-transport-attributes)
+for further details.


Having the net. prefix here makes a lot of sense because these are network-layer related attributes. This is just to highlight why I suggested using messaging.protocol.name and messaging.protocol.version for the messaging-related parts.

Note that "net." is not the network-layer stuff anymore. See merged PR open-telemetry/opentelemetry-specification#2614

Yes. This change was discussed a bit and if you want, you can watch the recording about it here https://youtu.be/gOYmTTJTjLs?t=708

text/trace/0192-messaging-semantic-conventions-spec.md

ppatierno · 2022-07-26T13:07:16Z

text/trace/0192-messaging-semantic-conventions-spec.md

+
+For each producer scenario, a "Publish" span needs to be created. This span
+measures the duration of the call or operation that provides messages for
+sending or publishing to an intermediary. This call or operation (and the


"duration of the call or operation that provides messages for sending" ... what does it actually mean?
For example, with the Kafka producer, the send method just puts the message in a buffer, and the message is sent asynchronously (partly because the producer works with batches). Does the span measure only the time to call send and return? What if the producer requires acknowledgement and waits for the ack (the Kafka producer also accepts a callback for that)?

My interpretation is that it's intentionally "vague". In my view it can cover all of the scenarios you mentioned. The problem is that it can't enforce anything, and since it varies between systems and depends on what instrumentations can do, I'm not sure we should be highly specific here.

I made a suggestion to list the scenarios.

ppatierno · 2022-07-26T13:25:01Z

text/trace/0192-messaging-semantic-conventions-spec.md

+
+                                           +--------------------------+
+                                           | Ambient                  |
+  +------------+                           +-+------------+-----------+


Sorry for the stupid question ... as a newbie to OpenTelemetry, what's this "Ambient"? And how does its presence make clear that this is with "auto-settlement" while the previous one was with "manual settlement"?

"Ambient" means the span that's active at the time of Deliver m1. For ex when an HTTP request arrives at a server, it may start a span before reaching the user's handler code. In this case, the user code has a "ambient" context already.

About the auto-settlement I believe the ambient here has no connection, it's just to highlight a use case.
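The idea of an "ambient" context can be illustrated with Python's `contextvars`: a framework activates a span before calling user code, and the user code only observes it. This is a minimal sketch; `active_span` and `user_handler` are hypothetical names, and a real tracer stores a full span context rather than a name.

```python
import contextvars
from contextlib import contextmanager
from typing import Iterator, Optional

# Hypothetical stand-in for the "current span" slot of a tracing context.
_ambient_span = contextvars.ContextVar("ambient_span", default=None)


@contextmanager
def active_span(name: str) -> Iterator[str]:
    """Make `name` the ambient span for the duration of the block."""
    token = _ambient_span.set(name)
    try:
        yield name
    finally:
        _ambient_span.reset(token)


def user_handler(message: str) -> Optional[str]:
    # User code does not start a span; it only observes the ambient one.
    return _ambient_span.get()


with active_span("Deliver m1"):
    print(user_handler("m1"))  # Deliver m1
print(user_handler("m1"))  # None
```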

"Ambient" means the span that's active at the time of Deliver m1. For ex when an HTTP request arrives at a server, it may start a span before reaching the user's handler code. In this case, the user code has a "ambient" context already.

Ok thanks! It makes sense now

About the auto-settlement I believe the ambient here has no connection, it's just to highlight a use case.

Maybe instead of having it in the title, we should have a simple description explaining the use case. This could apply to all the other examples as well.

text/trace/0192-messaging-semantic-conventions-spec.md

Oberon00

Seeing just how much (valuable!) discussion this OTEP generates, I would strongly suggest to change the goal here:

Do not aim to overhaul & stabilize the messaging semantic conventions in one go. Overhaul it first, and make stabilizing (which is currently the OTEP title) a separate, next goal that comes after.

Oberon00 · 2022-07-27T16:22:43Z

text/trace/0192-messaging-semantic-conventions-spec.md

+This document aims to describe the necessary changes for bringing the [existing semantic conventions for messaging](https://github.com/open-telemetry/opentelemetry-specification/blob/a1a8676a43dce6a4e447f65518aef8e98784306c/specification/trace/semantic_conventions/messaging.md)
+from the current [experimental](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#experimental)
+to a [stable](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#stable)
+state.


Suggested change

This document aims to describe the necessary changes for bringing the [existing semantic conventions for messaging](https://github.com/open-telemetry/opentelemetry-specification/blob/a1a8676a43dce6a4e447f65518aef8e98784306c/specification/trace/semantic_conventions/messaging.md)
from the current [experimental](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#experimental)
to a [stable](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#stable)
state.

This document aims to overhaul the [existing semantic conventions for messaging](https://github.com/open-telemetry/opentelemetry-specification/blob/a1a8676a43dce6a4e447f65518aef8e98784306c/specification/trace/semantic_conventions/messaging.md).

Oberon00 · 2022-07-27T16:24:34Z

text/trace/0192-messaging-semantic-conventions-spec.md

@@ -0,0 +1,659 @@
+# Stabilizing messaging semantic conventions for tracing


Suggested change

# Stabilizing messaging semantic conventions for tracing

# Overhauling messaging semantic conventions for tracing

Stabilizing has a more concrete meaning in the context of the spec. Overhauling is just a change to something else that is not stable and can change again.
If the purpose is getting this done for a first version, I would be in favor of stabilizing.

Oberon00 · 2022-07-27T17:11:17Z

text/trace/0192-messaging-semantic-conventions-spec.md

+[`messaging.destination.kind`](#messagingdestinationkind) | string | No
+[`messaging.destination.temporary`](#messagingdestinationtemporary) | string | No
+[`messaging.destination.anonymous`](#messagingdestinationanonymous) | string | No
+[`messaging.source.name`](#messagingsourcename) | string | For consumer spans


I think the OTEP text should document why the "destination" attributes were split into "destination" and "source" (and maybe reconsider that decision).

Oberon00 · 2022-07-27T17:20:50Z

text/trace/0192-messaging-semantic-conventions-spec.md

+[`net.app.protocol.name`](#netappprotocolname) | string | No
+[`net.app.protocol.version`](#netappprotocolversion) | string | No
+[`net.peer.ip`](#netpeerip) | string | No
+[`net.peer.name`](#netpeername) | string | No


I want to suggest a new "concept" here, which is that of the "receipt handle". In AWS SQS, when you call the ReceiveMessage API, you get, along with the message ID (which is also missing as an attribute here), a unique "receipt handle" for each message. The receipt handle is what you need to provide to DeleteMessage or DeleteMessageBatch to settle the message (instead of, e.g., the message ID).

The receipt handle is a fairly long (but not terribly long) base64 string. I wonder if...

...Other messaging systems have a similar mechanism (if not, all the other questions are irrelevant at this point)

...This concept should be described in the "individual message settlement" section

...It would make sense defining an attribute for it, e.g. to correlate with logs. It is possible that this only makes sense in a future step together with intermediary instrumentation. But it could be interesting to track whether a message is re-processed without being re-delivered/received. Maybe too much of an edge case 🙂
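To show why a receipt handle identifies a receive rather than a message, here is a toy in-memory queue. It is only a model of the settlement flow described above, not the AWS SDK: visibility timeouts are omitted, `ToyQueue` is a hypothetical class, and the handle format merely resembles a base64 string.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyQueue:
    """In-memory model of settlement by receipt handle (not the AWS SDK).

    Visibility timeouts are left out: every receive returns all unsettled
    messages, each with a fresh receipt handle.
    """

    bodies: Dict[str, str] = field(default_factory=dict)     # message id -> body
    in_flight: Dict[str, str] = field(default_factory=dict)  # handle -> message id

    def send(self, message_id: str, body: str) -> None:
        self.bodies[message_id] = body

    def receive(self) -> List[Tuple[str, str, str]]:
        received = []
        for message_id, body in self.bodies.items():
            handle = secrets.token_urlsafe(24)  # new handle per receive
            self.in_flight[handle] = message_id
            received.append((message_id, handle, body))
        return received

    def delete(self, receipt_handle: str) -> None:
        """Settle by handle, the way DeleteMessage takes a receipt handle."""
        message_id = self.in_flight.pop(receipt_handle)
        self.bodies.pop(message_id, None)
```

Receiving the same message twice yields the same message ID with two different handles, so recording the handle as an attribute would let logs and spans distinguish individual deliveries of one message.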

Tbh I have never used AWS SQS and don't know any other messaging system having this concept, can you mention another one please?

I don't know one, that's why I'm asking.

I read that wrong ;-) Now it makes me think that something like that should exist in Service Bus as well, and more generally in AMQP 1.0.
Let's take a look at the REST interface for Service Bus via HTTP. Other than receive-and-delete, there is a two-step way: peek-lock and delete. During the peek-lock the consumer gets the message together with a sequence number. This is used in the subsequent Delete Message call, together with the message ID, to delete the message.
It also makes sense from an AMQP 1.0 point of view: looking at the specification, the disposition frame (which is the ack from the consumer) refers to a delivery-id. This maps to the Azure Service Bus AMQP 1.0 implementation here, where you can again find the sequence-number used for deleting a message. I guess that the sequence number maps to the delivery-id at the AMQP 1.0 level.
That said, I would agree that a concept like a receipt handle, sequence number, or delivery ID could make sense to add for settlement by the consumer.

Oberon00 · 2022-07-27T17:23:19Z

text/trace/0192-messaging-semantic-conventions-spec.md

+particular set of producers and consumer. Often such destinations are unnamed
+or have an auto-generated name.


Why "often"? Isn't that the definition of it being anonymous?

I would agree, but in the end it's never unnamed; it always has an auto-generated name (it needs one to be addressable). This is the reason why, for example, the AMQP 1.0 protocol defines it as "dynamic", not "anonymous".

It already says "unnamed or have an auto-generated name". So why is that only "often"? If it is named and the name is not auto-generated, can it still be anonymous?

Tbh I don't know what it's referring to as an "unnamed" destination. Right now "anonymous" sounds wrong, unless someone else can give an example of why it's right.

ppatierno · 2022-07-28T20:05:08Z

text/trace/0192-messaging-semantic-conventions-spec.md

+For each producer scenario, a "Publish" span needs to be created. This span
+measures the duration of the call or operation that provides messages for
+sending or publishing to an intermediary. This call or operation (and the
+related "Publish" span) can either refer to a single message or a batch of
+multiple messages.


Suggested change

For each producer scenario, a "Publish" span needs to be created. This span
measures the duration of the call or operation that provides messages for
sending or publishing to an intermediary. This call or operation (and the
related "Publish" span) can either refer to a single message or a batch of
multiple messages.

For each producer scenario, a "Publish" span needs to be created. This span measures the time for publishing the message which, depending on the messaging system and the producer API, could include:

* just storing the message in an internal producer buffer, from which it is sent to the intermediary asynchronously

* actual synchronous sending to the intermediary

* one of the above but also getting an acknowledgement/settlement back from the intermediary

This call or operation (and the related "Publish" span) can either refer to a single message or a batch of multiple messages.

ppatierno · 2022-07-28T20:05:39Z

text/trace/0192-messaging-semantic-conventions-spec.md

+
+For each producer scenario, a "Publish" span needs to be created. This span
+measures the duration of the call or operation that provides messages for
+sending or publishing to an intermediary. This call or operation (and the


I made a suggestion to list the scenarios.

melvinkcx · 2022-08-24T05:02:27Z

text/trace/0192-messaging-semantic-conventions-spec.md

+messages in order to fulfill the [requirements for context propagation](#context-propagation).
+While preserving the freedom for instrumentor to choose how to propagate
+context, in the future these conventions should list recommended ways of how to
+propagate context using popular messaging protocols.


Hi, just leaving a comment as I came across an issue that could be resolved if some sort of standard were in place.

If you refer to the issue above, it is apparent that there is already a divergence of approaches to propagating context among instrumentors in different languages. I wonder if more attention should be given to establishing a convention before everyone starts doing things differently and it gets out of control.

Hi @melvinkcx
A while ago this OTEP https://github.com/open-telemetry/oteps/blob/main/text/trace/0205-messaging-semantic-conventions-context-propagation.md was merged to make the context propagation a bit more explicit.

It doesn't go into exact detail on how this should be done (where in the message to put the context, under which "name", etc.) because there are no finalized standards for it. The OTEP only makes it clear that the context should be transported together with the message and should be immutable. Since each protocol/messaging system does something different, it's hard to come up with a specification that works for everyone. Our intention for now is to define the minimum for how propagation will work, and leave it to instrumentations to implement it using their appropriate idioms. For example, this draft document for AMQP recommends adding the context to application-properties.

Of course, once stable specifications are available, we will update the recommendations in the OTel conventions, like the AMQP one.

Thanks for the insights!

Since each protocol/messaging system does something different, it's hard to come up with a specification that works for everyone

How about specifying just a guideline / best-practice like: "Call the propagator on the message headers or equivalent"
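As an illustration of "headers or equivalent", the sketch below writes and reads a W3C `traceparent` value in a message's property map (which could be AMQP application-properties, MQTT v5 user properties, or Kafka headers). It is a hedged sketch only: a real instrumentation would call the configured OpenTelemetry propagator rather than build the string by hand, `inject`/`extract` here are hypothetical helpers, and checks such as rejecting all-zero IDs are omitted.

```python
import re
from typing import Dict, Optional, Tuple

# W3C Trace Context "traceparent": version-traceid-parentid-flags (version 00).
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(properties: Dict[str, str], trace_id: str, span_id: str, sampled: bool) -> None:
    """Write the creation context into the message's header-like properties."""
    properties["traceparent"] = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def extract(properties: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """Read it back on the consumer side; None if absent or malformed."""
    match = _TRACEPARENT.match(properties.get("traceparent", ""))
    return match.groups() if match else None


props: Dict[str, str] = {}
inject(props, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", sampled=True)
print(props["traceparent"])  # 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```

Keeping the value in a property section that intermediaries do not rewrite is what makes the creation context immutable in transit.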

I opened a PR to bring the contents of the OTEP into the spec: open-telemetry/opentelemetry-specification#2750. Maybe that already gives the direction we need, without having to be precise about where to add the context. It also mentions that we will update the guidelines when more standards become available.

One interesting line in the W3C document for AMQP linked above:
"...AMQP message section "application-properties" is immutable collection of properties"
They recommend using application-properties because it is immutable data that brokers cannot change. This aligns with the immutable Creation Context from above.
The MQTT doc doesn't mention immutability, but it defines user-properties and the method to use.

@brunobat I guess you are referring to MQTT v5. In MQTT 3.1.1 there is no notion of application/user properties; everything is in the payload.
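For MQTT 3.1.1, where everything is in the payload, the only option is to embed the context in the payload itself, which means producer and consumer must agree on an envelope format. The Python sketch below assumes a hypothetical JSON envelope with `traceparent` and base64-encoded `body` fields; this envelope is not defined by MQTT, W3C, or OpenTelemetry.

```python
import base64
import json
from typing import Optional, Tuple


def wrap(payload: bytes, traceparent: str) -> bytes:
    """Hypothetical envelope: a JSON object carrying the context next to the body."""
    envelope = {
        "traceparent": traceparent,
        # Base64 keeps arbitrary binary payloads JSON-safe.
        "body": base64.b64encode(payload).decode("ascii"),
    }
    return json.dumps(envelope).encode("utf-8")


def unwrap(raw: bytes) -> Tuple[bytes, Optional[str]]:
    """Return the original payload and the traceparent, if the envelope has one."""
    envelope = json.loads(raw.decode("utf-8"))
    return base64.b64decode(envelope["body"]), envelope.get("traceparent")
```

The cost of this approach is that every consumer must understand the envelope, including uninstrumented ones, which is why header-like mechanisms are preferred where the protocol offers them.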

brunobat

Please check #207, because propagation might soon include:

Baggage. For app data
Context-scoped attributes. It is proposed that Context-scoped attributes MUST NOT be propagated.
Instrumentation scope attributes. Not sure.

It seems that we need to clarify what tracing data will be transported in the message, and how.

brunobat · 2022-09-19T09:32:29Z

text/trace/0192-messaging-semantic-conventions-spec.md

@@ -0,0 +1,659 @@
+# Stabilizing messaging semantic conventions for tracing


Stabilizing has a more concrete meaning in the context of the spec. Overhauling is just a change to something else that is not stable and can change again.
If the purpose is getting this done for a first version, I would be in favor of stabilizing.

brunobat · 2022-09-19T15:23:14Z

text/trace/0192-messaging-semantic-conventions-spec.md

+identifiable.
+
+In the strict sense, a _message_ is a payload that is sent to a specific
+destination, whereas an _event_ is a signal emitted by a component upon


This is not the case for a topic, unless the destination is defined as a broker.
Wouldn't it be better to say something like:
_message_ is a payload sent by a Producer to one or more Consumers, either directly or by using Intermediaries
I think it aligns better with what's written below.

I think with a topic, the destination is the topic. That's how the current ("old") messaging semantic conventions have it defined: The destination of a message is a queue or a topic.

sounds good to me @Oberon00

brunobat · 2022-09-19T15:30:54Z

text/trace/0192-messaging-semantic-conventions-spec.md

+A _creation context_ allows correlating the producer with the consumers of a
+message, regardless of intermediary instrumentation. The creation context is
+created by the producer and must be propagated to the consumers. It must not be
+altered by intermediaries.  This context helps to model dependencies between


It must not be altered by intermediaries.
The only way to ensure this is to sign the propagated attributes, as is done with a JSON Web Token (JWT). I'm not saying we need to sign anything now, but we might need to in the future to prevent tampering.
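As an illustration of the tamper-detection idea (not something either OTEP proposes), here is a Python sketch that uses an HMAC instead of a full JWT. It assumes producer and consumer share a secret key, and the `traceparent-sig` property name is hypothetical. A signature only lets the consumer detect an altered context; it cannot stop an intermediary from changing it.

```python
import hashlib
import hmac
from typing import Dict, Optional

SIGNATURE_KEY = "traceparent-sig"  # hypothetical property name


def sign_context(props: Dict[str, str], secret: bytes) -> None:
    """Attach an HMAC-SHA256 over the traceparent value."""
    mac = hmac.new(secret, props["traceparent"].encode("utf-8"), hashlib.sha256)
    props[SIGNATURE_KEY] = mac.hexdigest()


def verified_context(props: Dict[str, str], secret: bytes) -> Optional[str]:
    """Return the traceparent only if its signature still matches."""
    value = props.get("traceparent")
    signature = props.get(SIGNATURE_KEY)
    if value is None or signature is None:
        return None
    expected = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about the comparison.
    if hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        return value
    return None
```

Key distribution between producers and consumers is the hard part this sketch ignores, which is likely why the comment defers signing to the future.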

We have this part sorted out in the other PR open-telemetry/opentelemetry-specification#2750 (comment)

brunobat · 2022-09-20T12:09:25Z

text/trace/0192-messaging-semantic-conventions-spec.md

+> A producer SHOULD attach a creation context to each message. The creation context
+> SHOULD be attached in a way so that it is not possible to be changed by intermediaries.


From my understanding, we are going to push for a message span that goes from producer send to consumer receive. Right?

dpauls · 2022-09-24T13:49:01Z

text/trace/0192-messaging-semantic-conventions-spec.md

+                | Create m2 | . . . . . . . .
+                +-----------+---------+     .   +------------+
+                            | Publish |     . . | Receive m2 |
+                            +---------+         +------------+


We don't seem to be consistent, in pull-based consumer scenarios, about what the parent context of a receive span should be. I believe the intent is that the parent should be the application's ambient span?

Although intermediary instrumentation is out of scope, it's something I'm thinking about and there is a decision to make as to how the transport context of the message fits into these traces. If you care about the path the message takes, it's nice to have the receive span as a child of the message's context. But since it is the application's "pull" that caused the span, it's probably more correct for the receive span to be a child of the application's ambient span.

If we agree this is the right approach, it probably makes the most sense for the receive span to link to the message's transport context (although there is no need to mention that here, as it would be out of scope). If there is no ambient context, it seems nicer for the receive span to be a child of the message's context rather than to start a brand-new trace with no parent at all. On the one hand, this means we have an inconsistent structure. On the other hand, does splitting the traces up help with anything?
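The parent/link choice described above can be sketched in Python as a small decision function. `SpanContext`, `ReceiveSpanPlan`, and `plan_receive_span` are hypothetical names, and the fallback to the message's context when there is no ambient span reflects this comment's proposal, which was still an open question in the thread.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str


@dataclass(frozen=True)
class ReceiveSpanPlan:
    parent: Optional[SpanContext]  # None means the span starts a new trace
    links: Tuple[SpanContext, ...] = ()


def plan_receive_span(ambient: Optional[SpanContext],
                      message_ctx: Optional[SpanContext]) -> ReceiveSpanPlan:
    """Pick parent and links for a pull-based receive span (sketch of the proposal)."""
    if ambient is not None:
        # The application's pull caused the span: parent it to the ambient
        # span and link to the message's context, if there is one.
        links = (message_ctx,) if message_ctx is not None else ()
        return ReceiveSpanPlan(parent=ambient, links=links)
    if message_ctx is not None:
        # No ambient span: become a child of the message's context instead
        # of starting a brand-new trace.
        return ReceiveSpanPlan(parent=message_ctx)
    # Nothing to attach to: the receive span becomes a root span.
    return ReceiveSpanPlan(parent=None)
```

The inconsistency the comment mentions is visible here: the message's context is a link in the first branch but the parent in the second.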

brunobat · 2022-09-26T08:46:13Z

text/trace/0192-messaging-semantic-conventions-spec.md

+1. The _creation context layer_ allows correlating the producer with the
+   consumers of a message, regardless of intermediary instrumentation. The
+   creation context is created by the producer and must be propagated to the
+   consumers. It must not be altered by intermediaries.


"It should not be altered by intermediaries."
("should" rather than "must", because we are going to discuss intermediaries later.)

brunobat · 2022-09-26T08:57:34Z

text/trace/0192-messaging-semantic-conventions-spec.md

+One possibility to seamlessly integrate producer/consumer and intermediary
+instrumentation in a flexible and extensible way would be the introduction of a
+second transport context layer in addition to the creation context layer.


I like the idea of an "immutable" creation context for the message because it seems simpler and less ambiguous to understand, define, and implement.
The transport context seems too broad. It can be used for anything transport-related, and in each message it might contain arbitrary pieces of data that were added or modified along the way. That is super useful, but it makes every message a non-obvious subject requiring analysis. Even the graphical representation might be convoluted.
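For comparison, a minimal Python sketch of the two-layer idea: the producer sets the creation context once, and each instrumented intermediary only replaces the transport context. The `Message` type and `intermediary_forward` function are hypothetical, and a frozen dataclass is used here simply to make the "do not alter the creation context" rule explicit in code.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Message:
    body: bytes
    # Set once by the producer; intermediaries must not change it.
    creation_context: str
    # Updated by each instrumented hop (producer, then intermediaries).
    transport_context: Optional[str] = None


def intermediary_forward(msg: Message, hop_traceparent: str) -> Message:
    """An instrumented intermediary records its own span as the new transport context."""
    # dataclasses.replace copies every field except the overridden one,
    # so the creation context is carried over untouched.
    return replace(msg, transport_context=hop_traceparent)
```

Only `transport_context` changes per hop, so a consumer can still correlate with the producer through `creation_context` no matter how many intermediaries the message passed through.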

brunobat · 2022-09-26T09:01:23Z

text/trace/0192-messaging-semantic-conventions-spec.md

+   instrumentation.
+2. An additional _transport context layer_ allows correlating the producer and
+   the consumer with an intermediary. It also allows to correlate multiple
+   intermediaries among each other. The transport context can be changed by


The correlation should be message-based. "Correlating intermediaries" sounds like broker-framework-related work. Do we want to provide tools to make that easier? Which ones?

brunobat · 2022-09-26T09:17:45Z

text/trace/0192-messaging-semantic-conventions-spec.md

+messages in order to fulfill the [requirements for context propagation](#context-propagation).
+While preserving the freedom for instrumentor to choose how to propagate
+context, in the future these conventions should list recommended ways of how to
+propagate context using popular messaging protocols.


One interesting line in the W3C document for AMQP linked above:
"...AMQP message section "application-properties" is immutable collection of properties"
They recommend using application-properties because it is immutable data that brokers cannot change. This aligns with the immutable Creation Context from above.
The MQTT doc doesn't mention immutability, but it defines user-properties and the method to use.

Oberon00 · 2022-09-27T08:02:12Z

@brunobat
#192 (review)

Please check #207, because propagation might soon include

#207 should not be relevant to this PR at all. It does not change anything regarding propagation.

pyohannes · 2022-10-19T21:48:28Z

The messaging workgroup was capturing findings and results of discussions in this draft PR. As this draft PR grew very large and it became hard to keep track of the discussions in the comments, the workgroup decided to close this PR and work on different artifacts:

Proposed changes for context propagation are already merged into the specification (Make messaging context propagation requirements explicit opentelemetry-specification#2750).
Proposed changes for trace and span structure are captured in Span structure for messaging scenarios #220.
Proposed changes to attributes are covered in Refactor messaging attributes and per-message attributes in batching scenarios opentelemetry-specification#2763.

This OTEP aims at defining consistent conventions about what spans to create for messaging scenarios, and at defining how those spans relate to each other. Instrumentors should be able to rely on a consistent set of conventions, as opposed to deducing conventions from a set of examples. This was split from OTEP #192, which became too comprehensive.

joaopgrassi reviewed Dec 6, 2021

kenfinnigan reviewed Dec 13, 2021

blumamir reviewed Dec 13, 2021

dpauls suggested changes Dec 13, 2021

pyohannes commented Dec 16, 2021

lmolkova reviewed Dec 20, 2021

pyohannes force-pushed the conventions-messaging branch from d392e89 to b890a85 February 3, 2022 21:57

lmolkova mentioned this pull request Feb 7, 2022

[FEATURE REQ] ServiceBus receiver is not traced Azure/azure-sdk-for-java#26269

Closed

blumamir reviewed Feb 9, 2022

pyohannes mentioned this pull request Feb 15, 2022

Allow adding links after span creation open-telemetry/opentelemetry-specification#2278

Closed

joaopgrassi mentioned this pull request Feb 16, 2022

Introduce new semantic conventions for CloudEvents open-telemetry/opentelemetry-specification#1978

Merged

pyohannes commented Feb 24, 2022

kenfinnigan reviewed Mar 16, 2022

pyohannes and others added 6 commits April 7, 2022 16:29

03d8291 Messaging semantic conventions for tracing, first draft

13c975d Change name to PR request id

8f4349f Update text/trace/0192-messaging-semantic-conventions-spec.md (Co-authored-by: Joao Grassi <joao@joaograssi.com>)

98cf8f3 PR comments

2d8f25c PR comments

f48379f Add first draft for consumer instrumentation span structure

72e1215 PR comments

lmolkova mentioned this pull request Jul 18, 2022

Diagnostics is not setting the Activity ParentId for OpenTelemetry tracing Azure/azure-sdk-for-net#29907

Closed

lmolkova reviewed Jul 21, 2022

dpauls reviewed Jul 22, 2022

ppatierno reviewed Jul 26, 2022

Oberon00 requested changes Jul 27, 2022

Oberon00 self-requested a review July 27, 2022 16:21

Oberon00 reviewed Jul 27, 2022

ppatierno reviewed Jul 28, 2022

joaopgrassi and others added 4 commits August 2, 2022 11:28

e4ccef0 Merge branch 'main' into conventions-messaging

5fa715d PR suggestions and grammar fixes

8b4461a Make messaging.system attribute level required

964d65e Set requirement level for destination.temporary and anonymous

Oberon00 mentioned this pull request Aug 22, 2022

boto3sqs: Do not override propagator-determined key open-telemetry/opentelemetry-python-contrib#1202

Closed

lmolkova mentioned this pull request Aug 22, 2022

Amqp core metrics: step 1 Azure/azure-sdk-for-java#30583

Merged

melvinkcx reviewed Aug 24, 2022

lmolkova mentioned this pull request Aug 30, 2022

Refactor messaging attributes and per-message attributes in batching scenarios open-telemetry/opentelemetry-specification#2763

Closed

brunobat reviewed Sep 20, 2022

dpauls reviewed Sep 24, 2022

brunobat reviewed Sep 26, 2022

pyohannes mentioned this pull request Oct 4, 2022

Span structure for messaging scenarios #220

Merged

pyohannes closed this Oct 19, 2022

lmolkova mentioned this pull request Nov 16, 2022

Refactor messaging attributes and specify per-message attributes open-telemetry/opentelemetry-specification#2957

Merged



		#### Consumer

		A "consumer" receives the message and acts upon it. It uses the context and


	# Stabilizing messaging semantic conventions for tracing
	# Overhauling messaging semantic conventions for tracing

		particular set of producers and consumer. Often such destinations are unnamed
		or have an auto-generated name.
