Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Setting the default level for crash events in Crashpad to fatal #852

Merged
merged 7 commits into from
Jun 23, 2023

Conversation

bitsandfoxes
Copy link
Contributor

@bitsandfoxes bitsandfoxes commented Jun 20, 2023

So the story goes like this:
An Unreal crash report lands in relay but the crash_reason got ignored and asserts and ensures ended up with the severity FATAL.
Relay now reads the reason from the Unreal context and sets the level appropriately.

Unfortunately, Sentry simply overwrite the level set in Relay with FATAL anyway. With this PR, previously set levels get respected.

This PR aims to initialize crash events with FATAL but still allow users to overwrite the level.

@bitsandfoxes bitsandfoxes changed the title fix: Setting the default level for crash events to fatal fix: Setting the default level for crash events in Crashpad to fatal Jun 20, 2023
Copy link
Member

@Swatinem Swatinem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note to other reviewers:

The call down below to crashpad_backend_flush_scope has previously applied the default "warning" level based on the scope, which then ended up in the event/scope file that is written to disk and being sent by crashpad.

With this solution, the customer also has a way to manually overwrite this value in their crash hook if they want to.

As this code is in the crash hook, what I believe is still missing is making sure this works on mac as well, which does not call these hooks, but rather just picks up the event file that is regularly written on scope changes.

@codecov
Copy link

codecov bot commented Jun 20, 2023

Codecov Report

Merging #852 (3c85ee7) into master (302c9f9) will decrease coverage by 0.03%.
The diff coverage is 50.00%.

❗ Current head 3c85ee7 differs from pull request most recent head 3f21f14. Consider uploading reports for the commit 3f21f14 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #852      +/-   ##
==========================================
- Coverage   82.53%   82.50%   -0.03%     
==========================================
  Files          53       53              
  Lines        7362     7368       +6     
  Branches     1186     1186              
==========================================
+ Hits         6076     6079       +3     
- Misses       1176     1179       +3     
  Partials      110      110              

@supervacuus
Copy link
Collaborator

With this solution, the customer also has a way to manually overwrite this value in their crash hook if they want to.

This hits at the core of whether this should be allowed. But yes, the idea was to enable hook users to overwrite the default behavior based on whatever they retrieve from the crash event since the scope application also checks whether level is set.

As this code is in the crash hook, what I believe is still missing is making sure this works on Mac as well, which does not call these hooks, but rather just picks up the event file that is regularly written on scope changes.

I am also unhappy with the asymmetry, but as far as I understood, the push for this change comes from the unreal-minidump handler in relay (which currently only affects Windows and Linux). The "native" minidump endpoint hasn't been touched, so for macOS, nothing changes because macOS minidumps (created with crashpad) will still be set via scope mutations or, at the latest, in relay. It is fair to defer this to when we have a first-chance handler for macOS. Or am I missing something essential?

supervacuus
supervacuus previously approved these changes Jun 21, 2023
Copy link
Collaborator

@supervacuus supervacuus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, if @Swatinem does agree with my points in the last comment (and yeah, changelog :P ).

@supervacuus
Copy link
Collaborator

supervacuus commented Jun 21, 2023

The call down below to crashpad_backend_flush_scope has previously applied the default "warning" level based on the scope, which then ended up in the event/scope file that is written to disk and being sent by crashpad.

I am reluctant to this change because I think I still don't understand what happened in the backend that requires that change. I just tested master on macOS, and a crash now has level warning instead of fatal. So the (already live) backend change not only affects minidumps processed by the unreal handler but rather all minidumps. That was unclear to me (is this change tracked somewhere @bitsandfoxes?).

Was your suggestion @Swatinem to initialize the level in the scope to fatal? What would that break? At least it would cover all platforms.

@supervacuus supervacuus dismissed their stale review June 21, 2023 13:15

Effects of backend changes unclear at point of approval.

@supervacuus
Copy link
Collaborator

Okay, I found it: getsentry/sentry#50717.

Sorry, @bitsandfoxes, I misunderstood the severity of that change in our sync yesterday. There is no quick way to adapt to this cleanly in the Native SDK. Changing it only in the crashpad backend is also wrong because all minidumps will be affected, so we must ensure this also works for breakpad (which is easier because the handler works on all platforms and we can add the level to the crash-event before the hooks are invoked similar to the change in crashpad).

We could change the scope defaults. But this is a breaking change, and I wonder if it is okay from an SDK pov.

For crashpad, there is no way on macOS to set the level at the right point in time without implementing a FirstChanceHandler, making the crashpad_handler aware of the MessagePack scope or touching the defaults.

I am open to alternative proposals, though.

@Swatinem
Copy link
Member

The fundamental problem here is:

  • Previously, the backend would just hardcode "fatal" for all minidumps.
  • We want a way to overwrite that, so the backend respects an existing level without overwriting it
  • Well, it turns out that the apply_scope_to_event, and by extension the messagepack files written to disk for the crashpad integration get the default level from the scope, which is "warning".
  • We did not anticipate this, and as a result, with the backend changes, all minidump crashes are not treated as "warning" which is wrong. The should default to "fatal", but customers who want to should have a way to override that, via on_crash or whatever.

@supervacuus
Copy link
Collaborator

We did not anticipate this, and as a result, with the backend changes, all minidump crashes are not treated as "warning" which is wrong. The should default to "fatal", but customers who want to should have a way to override that, via on_crash or whatever.

Yeah, I understood it the same way. The problem - at least from my current understanding - is that changing the default in the scope affects every envelope we send and not only crashes. So if a user doesn't explicitly set the level in either the event or change it in the scope, everything we send will be FATAL. That is at least a breaking change (although I cannot say how severe that would be, especially compared to other SDKs).

I think the most sensible approach would be to

  • explicitly set the event level in every SDK crash handler available to FATAL (already happening in inproc, easily added for all breakpad platforms, possible for Windows and Linux in crashpad
  • leave the SDK defaults alone
  • and add a FIXME(supervacuus) clause to processing.py that handles crashes coming from macOS using crashpad as a backend to how it was done previously (i.e., ignore the level coming from the SDK and only check for the crashed property). Something like this:
# FIXME(supervacuus): remove this when we implement a FirstChanceHandler in macOS crashpad.
if os == "macOS" and backend == "crashpad" and response.get("crashed") is not None and response["crashed"]:
    # we crashed on macOS using the crashpad backend, so let us ignore the level from the SDK
    data["level"] = "fatal"
else:
    # do what is currently implemented
    if response.get("crashed") is not None and data.get("level") is None:
        data["level"] = "fatal" if response["crashed"] else "info"

This way we don't create a dance around shifting defaults, and the problem is solved for all in a somewhat future-proof way. The level coming from the SDK is only ignored for crashpad-on-macOS if it actually crashed... a situation where SDK users cannot override anyway.

Horrible? Please let me know!

Comment on lines +111 to +112
sentry_value_set_by_key(
event, "level", sentry__value_new_level(SENTRY_LEVEL_FATAL));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

@supervacuus
Copy link
Collaborator

The failing breakpad integration test is due to this:

// the event we just prepared is empty,
// so no error is recorded for it
sentry__record_errors_on_current_session(1);

which can be removed.

@supervacuus
Copy link
Collaborator

We should extend the integration tests with checks for all crashes to be FATAL. No need to do this in this PR; I can see we are in a hurry.

Comment on lines +111 to +112
// FIXME: This should be handled in the FirstChanceHandler but that does
// not exist for macOS just yet.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! ❤️

@supervacuus supervacuus merged commit adb2689 into master Jun 23, 2023
18 of 19 checks passed
@supervacuus supervacuus deleted the fix/crashpad-event-level branch June 23, 2023 10:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants