Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change ordering of event drop mechanisms #1390

Merged
merged 3 commits into from Apr 12, 2022
Merged

Conversation

adinauer
Copy link
Member

As requested by @mitsuhiko this PR shall serve as basis for discussing the ordering of event drop mechanisms and its implications.

We are planning for sample_rate to update the session counts despite dropping an event (see getsentry/develop#551 and getsentry/develop#537). Without changing the order of filtering mechanisms this would mean any event dropped by sample_rate would update the session even if it would be dropped by ignore_errors which should not update the session counts when dropping an event. By changing the order we would first drop ignored_errors and only then check sample_rate, so session counts would not be affected in the case mentioned before. The same reasoning could probably be applied to event_processor and before_send but we don't know why a developer decided to drop an event there. Was it because they don't care about the event (then session should not be updated) or to save quota (session should be updated)? Also these may be more expensive in terms of performance (developers can provide their own implementations for both of those on some SDKs). So moving them before sample_rate would execute before_send and event_processor for every event instead of only doing it for the sampled events.

As requested by @mitsuhiko this PR shall serve as basis for discussing the ordering of event drop mechanisms and its implications.

We are planning for `sample_rate` to update the session counts despite dropping an event (see getsentry/develop#551 and getsentry/develop#537). Without changing the order of filtering mechanisms this would mean any event dropped by `sample_rate` would update the session even if it would be dropped by `ignore_errors` which should not update the session counts when dropping an event. By changing the order we would first drop `ignored_errors` and only then check `sample_rate`, so session counts would not be affected in the case mentioned before. The same reasoning could probably be applied to `event_processor` and `before_send` but we don't know why a developer decided to drop an event there. Was it because they don't care about the event (then session should not be updated) or to save quota (session should be updated)? Also these may be more expensive in terms of performance (developers can provide their own implementations for both of those on some SDKs). So moving them before `sample_rate` would execute `before_send` and `event_processor` for every event instead of only doing it for the sampled events.
@antonpirker
Copy link
Member

antonpirker commented Apr 12, 2022

I did some refactoring to make the code better readable.
Unfortunately this made the PR more complicated to read..

Here a tl:dr of what the code change does:

Dropping events decision before:

  1. first drop events by sample_rate
  2. then drop events that are in ignore_errors config option.

Now changed to dropping events decision like this:

  1. first drop events that are in ignore_errors config option
  2. then drop events by sample_rate

After events where dropped (or not dropped) the sessions are counted here:
https://github.com/getsentry/sentry-python/blob/master/sentry_sdk/client.py#L344
(a couple of lines above this the _should_capture() is called (that does the dropping decisions mentioned above)

The idea (not yet reflected in code) is now that we do not do the dropping first for all events and then update the sessions. But do the session update right before an event is dropped by sample_rate.

So that all events that are not dropped are counted in sessions and also events that where dropped because of the sample rate are counted in the sessions.

@adinauer
Copy link
Member Author

@antonpirker the way I understood what we're planning to do is different from what you describe here:

The idea (not yet reflected in code) is now that we do not do the dropping first for all events and then update the sessions. But do the session update right before an event is dropped by sample_rate.

Events dropped by before_send or event_processor should still not have an effect on session counts. So it's not just a matter of putting the session update a bit earlier in the code.

@antonpirker
Copy link
Member

antonpirker commented Apr 12, 2022

Ok, so in the _prepare_event() the before_send and event_processor events are dropped:
https://github.com/getsentry/sentry-python/blob/master/sentry_sdk/client.py#L336

So if events are dropped in _prepare_event(), they should not be counted as sessions.

So the solution would be to move the "sample rate dropping code" out of _should_capture() and put it after the call to _prepare_event() and before the call to _update_session_from_event.

This way we do all the "dropping events" stuff first (based on ignored_errors option, before_send, event_processor, recursion prevention.) and the those events still left will be counted towards the sessions and will be then dropped by sample_rate (and if dropped by sample_rate, they will still be counted towards the session.

Is this the behavior this should have?

@adinauer
Copy link
Member Author

The initial consensus only specified moving ignored_errors before the sample_rate check. If we also move before_send and event_processors we should discuss implications of that and if we are OK with them.

Things to consider:

  • For before_send and event_processors we can't be certain why a developer dropped the event, so we don't really know whether to count a dropped event or not. My guess is this doesn't really matter since it might be wrong either way (whether we sample before or after).
  • Are we OK with before_send and event_processors being executed for every event where we previously only applied it to sampled events (i.e. events not dropped due to sampling). In other SDKs both can be provided by the developers so we can't predict what exactly they're doing. They might have adjusted their sample rate for a good tradeoff regarding performance and now it would execute more often.

@antonpirker
Copy link
Member

Ok. Now I got it. Thanks for clarification.

So I think we can safely deploy this change here because it does not break anything.

I think we should discuss before_send and event_processors in a separate issue/pr

@antonpirker antonpirker merged commit 91436cd into master Apr 12, 2022
@antonpirker antonpirker deleted the adinauer-patch-1 branch April 12, 2022 11:30
adinauer added a commit that referenced this pull request Apr 14, 2022
We have already merged #1390 where we moved `ignore_errors` before sampling.

Background for the order change:
We want to update the session for dropped events in case the event is dropped by sampling. Events dropped by other mechanisms should not update the session. See getsentry/develop#551

Now we would like to discuss if we can simply move sampling after `before_send` and `event_processor` and update the session right before sampling.
What are implications of changing this?
* How does this affect session count and session crash rate?
* Will this have a negative effect on performance as `before_send` and `event_processor` will now be executed for every event instead of only being executed for sampled events? Developers may have fine tuned their sample rate for a good performance tradeoff and now we change. Also developers can supply their own implementations for both `before_send` and `event_processor` on some SDKs so we have no way of predicting performance I'm afraid.
* We are uncertain why a developer chose to drop an event in `before_send` and `event_processor`:
** Was it because they want to ignore the event - then it shouldn't update the session
** Or was it to save quota - then it should update the session

Please feel free to optimize the code this is just to start the discussion.
adinauer added a commit to getsentry/develop that referenced this pull request Apr 20, 2022
Python now serves as reference implementations. We've recently changed the order there, see getsentry/sentry-python#1394 and getsentry/sentry-python#1390
adinauer added a commit to getsentry/develop that referenced this pull request Apr 26, 2022
* Document session update for dropped events

See #537 and getsentry/sentry-java#1916

Do we also want to document order of the filtering mechanisms? If so do we go with the python implementation order as a template?

* Add filter order

Python now serves as reference implementations. We've recently changed the order there, see getsentry/sentry-python#1394 and getsentry/sentry-python#1390

* Session update should be sent...

... despite the event being dropped in application mode for certain cases.
brustolin pushed a commit to getsentry/develop that referenced this pull request Apr 26, 2022
* Document session update for dropped events

See #537 and getsentry/sentry-java#1916

Do we also want to document order of the filtering mechanisms? If so do we go with the python implementation order as a template?

* Add filter order

Python now serves as reference implementations. We've recently changed the order there, see getsentry/sentry-python#1394 and getsentry/sentry-python#1390

* Session update should be sent...

... despite the event being dropped in application mode for certain cases.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants