Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sdk/metric: Record measurements when context is done #4671

Merged
merged 15 commits into from
Dec 15, 2023

Conversation

pellared
Copy link
Member

@pellared pellared commented Oct 27, 2023

Fixes #4750

I think that the ctx passed to the "metric recording" API should be used only for "context propagation" meaning e.g. correlation with tracing. I do not think it should cancel the recordings.

Similarly we can still record spans when the context is canceled.

Some issues related with the current behavior:

@pellared pellared marked this pull request as ready for review October 27, 2023 17:21
@pellared pellared changed the title [Proposal] Allow recording using a canceled context [Proposal] Allow recording when context is done Oct 27, 2023
@pellared pellared changed the title [Proposal] Allow recording when context is done [Proposal] sdk/metric: Allow recording when context is done Oct 27, 2023
@pellared
Copy link
Member Author

This is also ready for review.

Copy link
Contributor

@MrAlias MrAlias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user passes a context to Add or Record it would seem like a reasonable expectation that its deadline and cancellation would be honored. And their call to these methods would not hold much beyond the cancellation. This seems to be the intended behavior of a context and something ubiquitous throughout the Go community's use of them.

Why should our default behavior be different here?

It seems like a caller has the responsibility to manage the context and pass one in the state they desire. If they want to pass the values in a context that is already cancelled why isn't it their responsibility then to copy the values to a new context?

@MrAlias
Copy link
Contributor

MrAlias commented Oct 27, 2023

If a user passes a context to Add or Record it would seem like a reasonable expectation that its deadline and cancellation would be honored. And their call to these methods would not hold much beyond the cancellation. This seems to be the intended behavior of a context and something ubiquitous throughout the Go community's use of them.

Why should our default behavior be different here?

It seems like a caller has the responsibility to manage the context and pass one in the state they desire. If they want to pass the values in a context that is already cancelled why isn't it their responsibility then to copy the values to a new context?

It seems this was the intent of WithoutCancel. I think that should be prioritized instead of accommodating here.

@pellared
Copy link
Member Author

pellared commented Oct 27, 2023

It seems this was the intent of WithoutCancel. I think that should be prioritized instead of accommodating here.

Nice idea. I have missed this new method when reading release notes👍 The only problem is that it is not available in Go 1.20 that we still support. Using it in go-contrib once we drop support for Go 1.20.

I will leave this proposal opened until the next SIG.

EDIT: Personally, I think it is better to hold off until we get more feedback that would support this proposal.

@MrAlias
Copy link
Contributor

MrAlias commented Oct 27, 2023

It seems this was the intent of WithoutCancel. I think that should be prioritized instead of accommodating here.

Nice idea. I have missed this new method when reading release notes👍 The only problem is that it is not available in Go 1.20 that we still support. Using it in go-contrib once we drop support for Go 1.20.

I will leave this proposal opened until the next SIG.

The implementation looks easy enough to copy if that solution is wanted before Go 1.20: https://cs.opensource.google/go/go/+/refs/tags/go1.21.3:src/context/context.go;l=566-595

@pellared
Copy link
Member Author

pellared commented Nov 6, 2023

SIG meeting:

  1. It will be safer to not change the behavior.
  2. The user may expect the context cancelation to be handled.
  3. The user may always use a context which ignores the cancelation.

@pellared
Copy link
Member Author

For reference:

Canceling the context should not affect record processing.
(Among other things, log messages may be necessary to debug a
cancellation-related problem.)

Source: https://pkg.go.dev/log/slog#Handler

@pellared
Copy link
Member Author

pellared commented Dec 6, 2023

The user may always use a context which ignores the cancelation.

While this is true. I do not think users will do it unless they face an issue. We could improve it with Go doc comments, but I do not expect it will help a lot.

More issues related with the current behavior:

Personally, I am leaning towards recreating this PR. Changing the metrics SDK behavior would be a breaking change, but I think it would help 90% of the use cases and I doubt it will cause any harm to anyone. Even if so I think it may be an acceptable breaking behavioral change. We can always revert the change in a patch release if it occurs that we were wrong. I bet there is 1% chance that we would have to revert it and 99% that the users would be satisfied with the change. Also it is worth to mention that the user can always do if ctx.Err() != nil before making the synchronous measurements to achieve current behavior.

What is more, I want to have the same behavior for Logs API+SDK for similar reasons. Reference: 21bea0f

@pellared pellared restored the record-on-ctx-canceled branch December 6, 2023 09:34
@pellared pellared reopened this Dec 6, 2023
@pellared pellared changed the title [Proposal] sdk/metric: Allow recording when context is done sdk/metric: Allow recording when context is done Dec 6, 2023
@pellared
Copy link
Member Author

pellared commented Dec 8, 2023

The PR was discussed during SIG meeting on 2023-12-07.

  • We double-checked that cancelation of the context passed to span.Start is ignored as well.
  • Other calls to the SDK (such as provider shutdown) should still honor the passed context. The change is only about making telemetry recordings.
  • It seems that we an agreement that the change is acceptable.
  • Still, we agreed that the decision must be documented in CONTRIBUTING.md as ignoring a context cancelation is not something usual.

@pellared pellared changed the title sdk/metric: Allow recording when context is done sdk/metric: Record measurements when context is done Dec 8, 2023
@pellared pellared changed the title sdk/metric: Record measurements when context is done sdk/metric: Record measurements when context is canceled Dec 8, 2023
@pellared pellared changed the title sdk/metric: Record measurements when context is canceled sdk/metric: Record measurements when context is done Dec 8, 2023
@its-felix
Copy link

I think it's fine to ignore the cancellation properties of a context here as long as those of the context given to the exporter are used properly

@seh
Copy link
Contributor

seh commented Dec 8, 2023

If recording the metric requires any blocking operations, those should respect the supplied Context being canceled already. Otherwise, the Context being done doesn't indicate that all future work must cease as soon as possible. Rather, it means not to proceed with work that must block indefinitely. If you expect that recording a metric value will complete "quickly", it's fine to proceed even if the governing Context is already done.

@jmacd
Copy link
Contributor

jmacd commented Dec 8, 2023

Here is my position: open-telemetry/opentelemetry-collector#9048 (comment)

@carrbs
Copy link
Contributor

carrbs commented Dec 8, 2023

If we merge this, should we also remove the withoutCancel implementation and usage in otelgrpc?

https://github.com/open-telemetry/opentelemetry-go-contrib/blob/6009b3188da4f25eb8d2e65ffeefb3bf4b7c3415/instrumentation/google.golang.org/grpc/otelgrpc/stats_handler.go#L148

@pellared
Copy link
Member Author

pellared commented Dec 8, 2023

If we merge this, should we also remove the withoutCancel implementation and usage in otelgrpc?

@carrbs Correct. We can prepare draft PR and make it ready for review after this one is merged.

CHANGELOG.md Outdated Show resolved Hide resolved
@carrbs
Copy link
Contributor

carrbs commented Dec 11, 2023

If we merge this, should we also remove the withoutCancel implementation and usage in otelgrpc?

@carrbs Correct. We can prepare draft PR and make it ready for review after this one is merged.

I can work on that. Should there be a new issue opened for this in the contrib repo beforehand?

@pellared
Copy link
Member Author

If we merge this, should we also remove the withoutCancel implementation and usage in otelgrpc?

@carrbs Correct. We can prepare draft PR and make it ready for review after this one is merged.

I can work on that. Should there be a new issue opened for this in the contrib repo beforehand?

You can just create a PR and reference this PR in its description.

Copy link

codecov bot commented Dec 15, 2023

Codecov Report

Merging #4671 (8326a00) into main (057f897) will increase coverage by 0.0%.
The diff coverage is n/a.

Additional details and impacted files

Impacted file tree graph

@@          Coverage Diff          @@
##            main   #4671   +/-   ##
=====================================
  Coverage   82.1%   82.2%           
=====================================
  Files        225     225           
  Lines      18325   18319    -6     
=====================================
- Hits       15061   15059    -2     
+ Misses      2979    2977    -2     
+ Partials     285     283    -2     
Files Coverage Δ
sdk/metric/instrument.go 100.0% <ø> (+6.4%) ⬆️

... and 1 file with indirect coverage changes

CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
@pellared
Copy link
Member Author

@MrAlias Thanks for the suggestions. I made some (I think that 2) little corrections in your suggestions. Can you double-check?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document policy for context cancelation handling
8 participants