Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify acceptable attribute value types #1777

Closed
wants to merge 2 commits into from

Conversation

djaglowski
Copy link
Member

The log data model indicates that attribute values may be of any data type. To the best of my knowledge, this reflects the broad consensus of stakeholders in the logs community.

This pull request proposes a change to a stable document, which itself warrants discussion. I would like to suggest that the current specification was designed for a subset of signal types, and therefore implicitly specifies constraints for only those signals. Therefore, this change would be a clarification and extension of the specification, rather than a breaking change.

The [log data model](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md#field-attributes) indicates that attribute values may be of any data type. To the best of my knowledge, this reflects the broad consensus of stakeholders in the logs community.

This pull request proposes a change to a stable document, which itself warrants discussion. I would like to suggest that the current specification was designed for a subset of signal types, and therefore implicitly specifies constraints for only those signals. Therefore, this change would be a clarification and extension of the specification, rather than a breaking change.
@djaglowski djaglowski marked this pull request as ready for review June 25, 2021 18:21
@djaglowski djaglowski requested review from a team as code owners June 25, 2021 18:21
@@ -16,7 +16,7 @@ Table of Contents
Attributes are a list of zero or more key-value pairs. An `Attribute` MUST have the following properties:

- The attribute key, which MUST be a non-`null` and non-empty string.
- The attribute value, which is either:
- The attribute value, which for logs is any type, but for traces and metrics is either:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't ANY be allowed for span attributes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mean to suggest any change to the way span attributes are defined, nor am I trying to defend the current definition.

My only objective is to reconcile the way log attributes are defined, with a strong bias towards the definition in the log data model. Other than that, my suggested change is only an attempt to respect the stability of the spec by clarifying that the current definition was never specifically intended to apply to logs.

That said, my suggestion is entirely based on some assumptions that may or may not be valid:

  • That the current definition was written with consideration for traces and metrics, but not with serious consideration for logs.
  • That the spec being marked stable means there is very little room, if any, to make changes to it.
  • That my suggested change, along with the rationale for it, might be considered a backwards compatible modification.

I hope I've addressed your question, and would welcome any thoughts you have on whether my approach is valid, or whether there is another way to reconcile the definition of log attributes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nested/map span attributes were more or less declined in #596. The corresponding issue #376 is still open though.

Personally, I think that not allowing arbitrarily complex tree data structures as span attributes is an advantage. Allowing basically anything as span attribute may be complex or inefficient to implement in some languages and so far no strong use case has been brought up.

@carlosalberto carlosalberto added area:api Cross language API specification issue spec:logs Related to the specification/logs directory labels Jun 28, 2021
@tedsuo
Copy link
Contributor

tedsuo commented Jun 29, 2021

What needed data types are missing? Is this just a convenience layer which converts to the types which are supported in our protocol?

@reyang
Copy link
Member

reyang commented Jun 29, 2021

I think the challenging part is how the downstream can convert these types.

For example, if a language has strong type GUID or Timestamp, we might want a consistent behavior across language clients. Do we want GUID to be converted to something like cd26ccf675d64521884f1693c62ed303, cd26ccf6-75d6-4521-884f-1693c62ed303, {cd26ccf6-75d6-4521-884f-1693c62ed303} or {CD26CCF6-75D6-4521-884F-1693C62ED303}?

@MrAlias
Copy link
Contributor

MrAlias commented Jun 29, 2021

I think the challenging part is how the downstream can convert these types.

For example, if a language has strong type GUID or Timestamp, we might want a consistent behavior across language clients. Do we want GUID to be converted to something like cd26ccf675d64521884f1693c62ed303, cd26ccf6-75d6-4521-884f-1693c62ed303, {cd26ccf6-75d6-4521-884f-1693c62ed303} or {CD26CCF6-75D6-4521-884F-1693C62ED303}?

Agreed. There was a desire to allow all types for all other signals by many, but this conversion issue was the ultimate limiting factor. I'm not seeing how this isn't also an issue for logs.

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Jun 29, 2021

There was a desire to allow all types for all other signals by many, but this conversion issue was the ultimate limiting factor. I'm not seeing how this isn't also an issue for logs.

This conversion issue was seen as a problem because we decided to support out-of-the-box in OpenTelemetry SDK a few well-respected open-source trace protocols which allow limited set of well-defined types for span attributes (e.g. Jaeger, Zipkin, OpenCensus). The decision to support these protocols is part of the specification so we needed to honour the limitations that these protocols bring.

I believe the situation is different for logs. The specification does not specify that SDK needs to support any particular log protocols and I believe we should not be constrained by limitations of an arbitrarily chosen log protocol (e.g. Syslog allows nothing but strings for fields). I believe our data model for logs should allow the type any as it is currently already defined by log data model document in this spec. This is the intent and desire.

We do not need to restrict our understanding of logs to the lowest common denominator. Instead we should aim for log data model as it is currently defined in the spec, as it is defined in OTLP Protobufs spec and implemented in OpenTelemetry Collector. OpenTelemetry log data model should be based on OTLP's worldview. We can and should do that because of the following:

Logs in OpenTelemetry-enabled applications are expected to be generated via third-party logging library APIs. These logging libraries will use OpenTelemetry logging SDK to emit the logs. Our logging SDK implementations can output the logs generated via a particular logging library API via OTLP without any problems since OTLP data model is a superset of any known logging library API data model.

In other words we do not need to restrict our data model to what logging APIs allows. On the contrary, we need to make sure our log data model is a superset of logging API data models so that anything that logging libraries can produce is representable in OpenTelemetry logging SDK. Our log data model also needs to be a superset of log records representable in log network protocols so that OpenTelemetry Collector can receiver and send in these protocols. So, a flexible, future-proof superset is really what we need, not a limited subset.

The existing common.md conflicts with this intent and desire. What is written in common.md was not intended to apply to logs and was never written with logs in mind, these definitions were never discussed in the context of logs. I believe we should not allow now to retroactively apply these limitations to logs.

Instead I believe we need to clarify that existing definitions and limitations in common.md apply to traces (as they were intended) and do not apply to logs.

This is merely a clarification of trace spec, not a breaking change, so it can be allowed.

@tigrannajaryan
Copy link
Member

We can separately discuss if we want to change anything for span or resource attributes and if we want to allow broader types for these as well, but I think it needs to be a separate discussion since it may or may not be a breaking change for trace specification.

@reyang
Copy link
Member

reyang commented Jun 30, 2021

In other words we do not need to restrict our data model to what logging APIs allows. On the contrary, we need to make sure our log data model is a superset of logging API data models so that anything that logging libraries can produce is representable in OpenTelemetry logging SDK. Our log data model also needs to be a superset of log records representable in log network protocols so that OpenTelemetry Collector can receiver and send in these protocols. So, a flexible, future-proof superset is really what we need, not a limited subset.

I agree 💯 on superset vs subset.

The wording "any type" sounds murky though. For example, we don't know if we allow people to pass in a lambda function, delegate, callback function, Timer, closure, continuation, type, monad... even GUID in my previous comment will create lots of debates/confusion. It might be good to clarify that.

@reyang
Copy link
Member

reyang commented Jun 30, 2021

Another thing we probably need to consider - the interaction among different signals.

For example, span.add_event only allows a limited set of value types.

There are cases where folks want to turn existing logging (e.g. log4j logs, ILogger logs) to span.add_event calls. We can imagine there will be folks who want to convert logs to metrics (e.g. if the log already contains pre-agg data), or derive metrics from logs. If logs allow a wider set of types, we will need to define some conversion/downcast rules.

@jsuereth
Copy link
Contributor

So I understand what's being proposed:

  • When would I use a nested-value "Attribute" in a Log vs. having the payload be structured?
  • If I want to correlate telemetry (between metrics/traces/logs) do I need to explicitly ignore these Attributes?

@tigrannajaryan
Copy link
Member

The wording "any type" sounds murky though.

I believe any here refers to the specific any type that is defined precisely in the log data model. Adding a link should make it clear.

@tigrannajaryan
Copy link
Member

  • When would I use a nested-value "Attribute" in a Log vs. having the payload be structured?

If I understand the question correctly there is an open issue for that #1613

@carlosalberto
Copy link
Contributor

Approving based on @tigrannajaryan's detailed explanation.

@errordeveloper
Copy link

errordeveloper commented Jul 7, 2021

As someone new to OpenTelemetry, my initial thought was that it's indeed useful to have arbitrary (JSON) objects in attributes and resources.

However, having considered that

a) values of attributes in traces only support primitive types and lists of primitive types
b) attributes are not a map, but a keyed-list
c) log body can contain any kind of structure

I find that it's a good idea to make sure all specs are fully aligned and fields with the same names have exactly the same schema. From a user's perspective, having same schema for attributes in traces as logs would allow selecting data by common attributes. That is given that an arbitrary object can be store in log body.

@Oberon00
Copy link
Member

Oberon00 commented Jul 7, 2021

It would also be good if the "any" definition could be moved to common, to avoid the cyclic reference between common and logs.
Also consider using a different term, e.g. "log attribute" instead of "attribute", since it seems to be a different meaning.

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jul 15, 2021
@github-actions
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jul 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue spec:logs Related to the specification/logs directory Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants