Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sdk/log: Drop duplicated KeyValues #5086

Closed
pellared opened this issue Mar 19, 2024 · 11 comments · Fixed by #5230
Closed

sdk/log: Drop duplicated KeyValues #5086

pellared opened this issue Mar 19, 2024 · 11 comments · Fixed by #5230
Assignees
Labels
area:logs Part of OpenTelemetry logs pkg:SDK Related to an SDK package
Milestone

Comments

@pellared
Copy link
Member

pellared commented Mar 19, 2024

Why

From
https://opentelemetry.io/docs/specs/otel/logs/data-model/#type-mapstring-any:

The keys in the map are unique (duplicate keys are not allowed).

Also: open-telemetry/opentelemetry-specification#3931 (comment)

Adding an option to avoid deduplication (and allow duplicated keys) is tracked as a separate issue. #5133

What

Drop duplicated KeyValues in simple and batch processor OR record.

More:

@pellared pellared added pkg:SDK Related to an SDK package area:logs Part of OpenTelemetry logs labels Mar 19, 2024
@pellared
Copy link
Member Author

pellared commented Apr 2, 2024

There is a similar issue that indicates that users may want to have an option to avoid deduplication: #5130.

I created #5133

@pellared
Copy link
Member Author

pellared commented Apr 5, 2024

The problem of dropping duplicates in the logger is that they can be passed not only by the Bridge API but also by the processor (which can add/set attributes and set a body). Therefore, the duplicates should be deduplicated as far int the pipeline as possible.

Therefore, my proposal is that the OTLP exporters handle deduplication. We can have an option in the OTLP exporters to disable the deduplication to improve the OTLP exporter's performance.

This is looks to be acceptable:

Especially, by @tigrannajaryan:

I think it is fine to do the deduplication anywhere you want as long as externally observable data complies with this document.

I will try to make a second try and update the specification here to make it more clear. I plan to add something more or less like:

The SDKs MAY allow duplicates (represent this type as a list of key-values) as long as the OTLP exporters by default handle deduplication.

@open-telemetry/go-approvers Please leave 👍 if you agree, if so I will close the issue and update the description of:

@pellared
Copy link
Member Author

pellared commented Apr 5, 2024

The alternative, could be creating a processor which would handle deduplication.

I am leaning to this approach as I find that it would be more-performant by default. It follows the behavior of all popular Go logging libraries. At last, even if we would do it by default - the user would be able to by-pass and duplicates at some point (e.g. by implementing a custom processor).

Please leave 👍 if you agree, if so if so I will close this issue and create a new one to add DedupProcessor.

I propose open-telemetry/opentelemetry-specification#3987 to allow flexibility.

@pellared
Copy link
Member Author

pellared commented Apr 5, 2024

Another alternative, could be adding deduplication to the simple and batch processor.

It is the best place to add if we would like to "enable deduplication be default". I assume most people would use the batch processor in production. We could still implement it via a DedupDecorator which would be applied by default. We can add options to the processors to not have them decorated.

Side note: Simple processor does not currently accept any options. I am not sure if we should add options or just focus on batch processor which is intended for production purposes.

Please leave 👍 if you agree, if so if so I will close this issue and create a new one to add DedupProcessor with a description that it should be used by default when creating a simple and batch processor.

@dashpole
Copy link
Contributor

dashpole commented Apr 5, 2024

I think it is fine to do the deduplication anywhere you want as long as externally observable data complies with this document.

This makes it sound like the data passed to an exporter must already be deduplicated.

I think it is fair to provide non-default options like this, with a clear warning that the caller is responsible for ensuring there are no duplicates, assuming there is a need for such high performance.

Is tigran saying this can't be the default?

@pellared
Copy link
Member Author

pellared commented Apr 5, 2024

Is tigran saying this can't be the default

I am not sure if it is a suggestion or a requirement. Personally, I would start with no deduplication by default. We could introduce this behavior as default to the batch processor later (#5086 (comment)) if needed.

I have not heard about many problems because of the lack of deduplication in .NET. They also prefer it as opt-in. Reference: open-telemetry/opentelemetry-dotnet#4324

@MrAlias
Copy link
Contributor

MrAlias commented Apr 5, 2024

The alternative, could be creating a processor which would handle deduplication.

I am leaning to this approach as I find that it would be more-performant by default. It follows the behavior of all popular Go logging libraries. At last, even if we would do it by default - the user would be able to by-pass and duplicates at some point (e.g. by implementing a custom processor).

Please leave 👍 if you agree, if so if so I will close this issue and create a new one to add DedupProcessor.

I propose open-telemetry/opentelemetry-specification#3987 to allow flexibility.

I think we could add this afterwards. Given the OTLP seems to be the only place that de-duplication needs to be done we could start there and if a general processor is needed we could add that later.

@pellared
Copy link
Member Author

pellared commented Apr 8, 2024

if a general processor is needed we could add that later.

I think we should add options (even though there is none) to SimpleProcessor so that we can and a WithoutDedup option later if needed. If we don't do it now, then later it would be a breaking change. WDYT?

PS. My guts tell me that we would have to go with #5086 (comment).

@pellared pellared changed the title sdk/log: logger drops duplicated keyvalues sdk/log: Drops duplicated keyvalues Apr 9, 2024
@pellared pellared changed the title sdk/log: Drops duplicated keyvalues sdk/log: Drop duplicated keyvalues Apr 9, 2024
@pellared
Copy link
Member Author

pellared commented Apr 9, 2024

Based on open-telemetry/opentelemetry-specification#3931 (comment) it looks like it it looks my guts (previous comment) were right.

@pellared pellared changed the title sdk/log: Drop duplicated keyvalues sdk/log: Drop duplicated keyvalues in simple and batch processor Apr 9, 2024
@pellared pellared removed their assignment Apr 9, 2024
@pellared pellared changed the title sdk/log: Drop duplicated keyvalues in simple and batch processor sdk/log: Drop duplicated KeyValues in simple and batch processor Apr 9, 2024
@MrAlias
Copy link
Contributor

MrAlias commented Apr 10, 2024

Partial implementation of attribute de-duplication: #5190

@pellared pellared changed the title sdk/log: Drop duplicated KeyValues in simple and batch processor sdk/log: Drop duplicated KeyValues Apr 16, 2024
@pellared pellared changed the title sdk/log: Drop duplicated KeyValues sdk/log: Drop duplicated KeyValues in simple and batch processor Apr 16, 2024
@pellared pellared changed the title sdk/log: Drop duplicated KeyValues in simple and batch processor sdk/log: Drop duplicated KeyValues Apr 16, 2024
tigrannajaryan pushed a commit to open-telemetry/opentelemetry-specification that referenced this issue Apr 26, 2024
…map as opt-in (#3987)

Fixes #3931

Per agreement: #3931 (comment)

> The SDKs should handle the key-value deduplication by default. It is acceptable to add an option to disable deduplication.

Previous PR: #3938


> I think it is fine to do the deduplication anywhere you want as long as externally observable data complies with this document.

The main purpose of this PR is to have an agreement for following questions (and update the specification to to make it more clear):
1. Is the deduplication required for all log exporters or only OTLP log exporters? Answer: It is required for all exporters.
2. Can the key-value deduplication for log records be opt-in? Answer: Yes, it is OK as long as it is documented that it can cause problems in case maps duplicated keys are exported.

Related to:
- open-telemetry/opentelemetry-go#5086
- open-telemetry/opentelemetry-dotnet#4324
@MrAlias
Copy link
Contributor

MrAlias commented Apr 29, 2024

Resolved by #5230

@MrAlias MrAlias closed this as completed Apr 29, 2024
@MrAlias MrAlias self-assigned this Apr 29, 2024
@MrAlias MrAlias added this to the v1.27.0 milestone May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logs Part of OpenTelemetry logs pkg:SDK Related to an SDK package
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants