Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update OTLP exporter transient errors table #1642

Merged
merged 6 commits into from Apr 29, 2021
Merged

update OTLP exporter transient errors table #1642

merged 6 commits into from Apr 29, 2021

Conversation

paivagustavo
Copy link
Member

@paivagustavo paivagustavo commented Apr 23, 2021

Changes

The transient errors table in the protocol/exporter.md specify Permission Denied and Unauthenticated errors as retryable, but they're defined as permanent error in the otep-35 and in the protocol/otlp.md file.

Related issues

This was found while implementing retries in the go sdk.

Copy link
Member

@arminru arminru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, the spec is usually more up to date than the respective OTEP. OTEPs are used for the initial discussion but as details go into the spec, changes applied there are usually not reflected back in the OTEP. @tigrannajaryan PTAL if this deviation is intended or not.

It sounds reasonable to remove them, however, as I don't think we would be able to recover from missing or misconfigured credentials unless authentication was turned off or permissions for the configured credentials were changed during runtime of the exporter. I'm not sure if this is likely enough so that consistently retrying is worth it.

Comment on lines 77 to 83
| 7 | Permission Denied |
| 8 | Resource Exhausted |
| 10 | Aborted |
| 10 | Out of Range |
| 14 | Unavailable |
| 15 | Data Loss |
| 16 | Unauthenticated |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to have a discussion if this change is considered breaking as this is declared stable already.
In theory, one could rely on exports being retried in these cases although I think it's unlikely.

@arminru arminru requested a review from a team April 26, 2021 09:34
@tigrannajaryan
Copy link
Member

specification/protocol/exporter.md is incorrect. specification/protocol/otlp.md is correct. I think a mistake happened when #699 added specification/protocol/exporter.md file to the spec.

The retryable codes are defined by OTEP 0065: open-telemetry/oteps#65. Permission Denied and Unauthenticated were pointed out by @yurishkuro that they should be non-retryable and I agreed and that's how OTEP 0065 was approved.

I would consider this a typo in specification/protocol/exporter.md that needs to be fixed. specification/protocol/otlp.md is the canonical information for OTLP and it is corerct.

We should probably remove the list of codes from specification/protocol/exporter.md and just refer to specification/protocol/otlp.md as the source of truth.

IMO, this is not a breaking change in any way, just a typo correction in the spec.

@arminru
Copy link
Member

arminru commented Apr 26, 2021

I would consider this a typo in specification/protocol/exporter.md that needs to be fixed. specification/protocol/otlp.md is the canonical information for OTLP and it is corerct.

I agree.

We should probably remove the list of codes from specification/protocol/exporter.md and just refer to specification/protocol/otlp.md as the source of truth.

Yes, that would be cleaner. @paivagustavo could you adapt the PR accordingly? Thanks!

@tigrannajaryan
Copy link
Member

Let's make sure we also have this change listed in CHANGELOG so that anyone who implemented it incorrectly can fix. I believe it should be considered an implementation bug, no a breaking behavior change.

@paivagustavo
Copy link
Member Author

Thanks for the review @arminru and @tigrannajaryan. I've addressed the suggestions made.

I wasn't sure about which section of the CHANGELOG should be updated, so I created an OpenTelemetry Protocol section, hope that's okay.

Comment on lines -68 to -71
| HTTP Status Code | Description |
| ---------------- | ----------- |
| 408 | Request Timeout |
| 5xx | Server Errors |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The referenced section in otlp.md#failures actually does not specify this.
Could you a statement there pointing out that 408 and 5xx are considered retryable or otherwise leave this table here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I've added a statement that point to that info. But I think It would be better if we have a full table just like the gRPC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be ideal yes. The current state is enough for this bugfix PR, however.

@arminru arminru requested review from a team April 27, 2021 16:07
@tigrannajaryan tigrannajaryan merged commit 2a712f7 into open-telemetry:main Apr 29, 2021
@paivagustavo paivagustavo deleted the otlp_exporter_transient_errors branch April 29, 2021 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants