Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify that schema transformations SHOULD overwrite input data #3505

Conversation

tigrannajaryan
Copy link
Member

Resolves: #3497

Alternates Considered

Instead of requiring overwriting the input data we could say that the transformation SHOULD be aborted if a conflict is detected or that transformation SHOULD be ignored if a conflict is detected.

The chosen approach (overwriting) seems to be best balance between maintaining reasonable outcome (keep the more important data, discarding the less important data) while also having minimal complexity of implementation (aborting and reporting errors is more complicated and more costly).

Why "SHOULD" and not "MUST"?

There are significant tradeoffs involved in what happens in the schema transform and we believe there may be valid use cases where after careful consideration a different approach may be taken (e.g. reject and surface the problem to the end user). We do not want to prohibit these other approaches. The goal of this change is to merely have a reasonable default interpretation of the schema transformation rules.

specification/schemas/file_format_v1.1.0.md Outdated Show resolved Hide resolved
Comment on lines +344 to +346
Some schema changes may describe a transformation that results in the conflicts with
the input data. In situations like this **the transformations described by Schema File
SHOULD take precedence** over the conflicting data present in the input.
Copy link
Contributor

@pyohannes pyohannes May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chosen approach (overwriting) seems to be best balance between maintaining reasonable outcome (keep the more important data, discarding the less important data)

What are the reasons for classifying some data to be more important than other?

In the example given below, an instrumentation library might emit host according to semantic conventions, and a user might add a custom attribute host.name. When transforming this data to conform to schema 1.1.0 according to the conflict handling rule proposed here, the data explicitly specified by the customer in host.name will be lost. In my opinion it cannot be generally said that the one is more important than the other, but would depend on the particular use case.

Probably this is a rare edge case, however, I consider it problematic to drop data during schema conversions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the reasons for classifying some data to be more important than other?

An attribute defined in official semantic conventions and subject to schema file is deemed more important than a custom attribute. Arguable, I know.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that custom attributes should not use Otel namespaces. So this is someone doing a bad job at following Otel recommendations, polluting data with custom attributes with bad names that are later used by Otel for a likely different purpose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say a custom attribute set by the application explicitly is much more important than any automation.

From "the least surprise" principle, it'd be confusing to override it and hard to investigate who and why overridden custom data.

It's also unclear why the instrumentation of V-1 supports attributes from version V, so this telemetry is inconsistent and broken already. Fixing it does not seem beneficial.

It seems to be best to warn (once) and not touch this data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment at #3497 for why "not touch" is hard to implement.

Resolves: open-telemetry#3497

Alternates Considered
=====================

Instead of requiring overwriting the input data we could say that the
transformation SHOULD be aborted if a conflict is detected or that transformation
SHOULD be ignored if a conflict is detected.

The chosen approach (overwriting) seems to be best balance between maintaining
reasonable outcome (keep the more important data, discarding the less important data)
while also having minimal complexity of implementation (aborting and reporting errors
is more complicated and more costly).

Why "SHOULD" and not "MUST"?
============================

There are significant tradeoffs involved in what happens in the schema transform
and we believe there may be valid use cases where after careful consideration a
different approach may be taken (e.g. reject and surface the problem to the end user).
We do not want to prohibit these other approaches. The goal of this change is to merely
have a reasonable default interpretation of the schema transformation rules.
@tigrannajaryan tigrannajaryan force-pushed the feature/tigran/clarifyschemaconficts branch from f1a1d40 to aab0407 Compare May 15, 2023 19:08
@tigrannajaryan
Copy link
Member Author

It is probably worth discussing the approaches in the issue #3497 (comment) before discussing this PR (which implements one of the suggested approaches).

@tigrannajaryan tigrannajaryan marked this pull request as draft May 15, 2023 19:28
@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label May 23, 2023
@github-actions
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this May 31, 2023
@tigrannajaryan
Copy link
Member Author

Anyone interested in this topic see #3497 for discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Schema] Converting values that cause conflicts
4 participants