-
Notifications
You must be signed in to change notification settings - Fork 869
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify acceptable attribute value types #1777
Conversation
The [log data model](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md#field-attributes) indicates that attribute values may be of any data type. To the best of my knowledge, this reflects the broad consensus of stakeholders in the logs community. This pull request proposes a change to a stable document, which itself warrants discussion. I would like to suggest that the current specification was designed for a subset of signal types, and therefore implicitly specifies constraints for only those signals. Therefore, this change would be a clarification and extension of the specification, rather than a breaking change.
specification/common/common.md
Outdated
@@ -16,7 +16,7 @@ Table of Contents | |||
Attributes are a list of zero or more key-value pairs. An `Attribute` MUST have the following properties: | |||
|
|||
- The attribute key, which MUST be a non-`null` and non-empty string. | |||
- The attribute value, which is either: | |||
- The attribute value, which for logs is any type, but for traces and metrics is either: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't ANY be allowed for span attributes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mean to suggest any change to the way span attributes are defined, nor am I trying to defend the current definition.
My only objective is to reconcile the way log attributes are defined, with a strong bias towards the definition in the log data model. Other than that, my suggested change is only an attempt to respect the stability of the spec by clarifying that the current definition was never specifically intended to apply to logs.
That said, my suggestion is entirely based on some assumptions that may or may not be valid:
- That the current definition was written with consideration for traces and metrics, but not with serious consideration for logs.
- That the spec being marked stable means there is very little room, if any, to make changes to it.
- That my suggested change, along with the rationale for it, might be considered a backwards compatible modification.
I hope I've addressed your question, and would welcome any thoughts you have on whether my approach is valid, or whether there is another way to reconcile the definition of log attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nested/map span attributes were more or less declined in #596. The corresponding issue #376 is still open though.
Personally, I think that not allowing arbitrarily complex tree data structures as span attributes is an advantage. Allowing basically anything as span attribute may be complex or inefficient to implement in some languages and so far no strong use case has been brought up.
What needed data types are missing? Is this just a convenience layer which converts to the types which are supported in our protocol? |
I think the challenging part is how the downstream can convert these types. For example, if a language has strong type |
Agreed. There was a desire to allow all types for all other signals by many, but this conversion issue was the ultimate limiting factor. I'm not seeing how this isn't also an issue for logs. |
This conversion issue was seen as a problem because we decided to support out-of-the-box in OpenTelemetry SDK a few well-respected open-source trace protocols which allow limited set of well-defined types for span attributes (e.g. Jaeger, Zipkin, OpenCensus). The decision to support these protocols is part of the specification so we needed to honour the limitations that these protocols bring. I believe the situation is different for logs. The specification does not specify that SDK needs to support any particular log protocols and I believe we should not be constrained by limitations of an arbitrarily chosen log protocol (e.g. Syslog allows nothing but strings for fields). I believe our data model for logs should allow the type We do not need to restrict our understanding of logs to the lowest common denominator. Instead we should aim for log data model as it is currently defined in the spec, as it is defined in OTLP Protobufs spec and implemented in OpenTelemetry Collector. OpenTelemetry log data model should be based on OTLP's worldview. We can and should do that because of the following: Logs in OpenTelemetry-enabled applications are expected to be generated via third-party logging library APIs. These logging libraries will use OpenTelemetry logging SDK to emit the logs. Our logging SDK implementations can output the logs generated via a particular logging library API via OTLP without any problems since OTLP data model is a superset of any known logging library API data model. In other words we do not need to restrict our data model to what logging APIs allows. On the contrary, we need to make sure our log data model is a superset of logging API data models so that anything that logging libraries can produce is representable in OpenTelemetry logging SDK. Our log data model also needs to be a superset of log records representable in log network protocols so that OpenTelemetry Collector can receiver and send in these protocols. So, a flexible, future-proof superset is really what we need, not a limited subset. The existing Instead I believe we need to clarify that existing definitions and limitations in This is merely a clarification of trace spec, not a breaking change, so it can be allowed. |
We can separately discuss if we want to change anything for span or resource attributes and if we want to allow broader types for these as well, but I think it needs to be a separate discussion since it may or may not be a breaking change for trace specification. |
I agree 💯 on superset vs subset. The wording "any type" sounds murky though. For example, we don't know if we allow people to pass in a lambda function, delegate, callback function, Timer, closure, continuation, type, monad... even GUID in my previous comment will create lots of debates/confusion. It might be good to clarify that. |
Another thing we probably need to consider - the interaction among different signals. For example, There are cases where folks want to turn existing logging (e.g. log4j logs, ILogger logs) to |
So I understand what's being proposed:
|
I believe |
If I understand the question correctly there is an open issue for that #1613 |
Approving based on @tigrannajaryan's detailed explanation. |
As someone new to OpenTelemetry, my initial thought was that it's indeed useful to have arbitrary (JSON) objects in attributes and resources. However, having considered that a) values of attributes in traces only support primitive types and lists of primitive types I find that it's a good idea to make sure all specs are fully aligned and fields with the same names have exactly the same schema. From a user's perspective, having same schema for attributes in traces as logs would allow selecting data by common attributes. That is given that an arbitrary object can be store in log body. |
It would also be good if the "any" definition could be moved to common, to avoid the cyclic reference between common and logs. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
The log data model indicates that attribute values may be of any data type. To the best of my knowledge, this reflects the broad consensus of stakeholders in the logs community.
This pull request proposes a change to a stable document, which itself warrants discussion. I would like to suggest that the current specification was designed for a subset of signal types, and therefore implicitly specifies constraints for only those signals. Therefore, this change would be a clarification and extension of the specification, rather than a breaking change.