New Memory Termination Handling (OOM) #3870

Closed

Conversation

@naftaly (Contributor) commented Apr 22, 2024

📜 Description

This PR introduces memory pressure and memory level systems in order to effectively track memory terminations and correctly report them.

Example Report

[Screenshot 2024-04-22 at 3:34:56 PM] [Screenshot 2024-04-22 at 3:35:08 PM]

💡 Motivation and Context

A lot has changed since this article about handling memory terminations. In order to bring Sentry OOM handling up to date, some additional heuristics are needed.

At this point, this is more of a proposal than a full PR, since a few things would need to be updated to land it. For example, SentryAppMemoryTracker should be moved to the dependency container.

💚 How did you test it?

Some new tests were added for the new memory classes. More in-depth testing is required, as well as some changes to DI to allow for these tests.

📝 Checklist

You have to check all boxes before merging:

  • I reviewed the submitted code.
  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.

🔮 Next steps

@philipphofmann (Member)

Hey @naftaly, thanks a lot for this PR. I like the idea of subscribing to memory pressure notifications to decide if a watchdog termination was an OOM. To merge this PR, we would require a couple of changes. We also need to investigate how reliable the new OOM detection is, because we've had plenty of confusion in the past from reporting OOMs that weren't OOMs. As you stated in the PR description, this is more of a proposal. Would you prefer guidance on getting this PR merged, or are you OK with us taking this as inspiration and opening a new PR?

With this PR, we can also tackle #2514.

@naftaly (Contributor, Author) commented Apr 23, 2024

I'd love some guidance on getting this PR merged, but I can also help you folks take inspiration and go from there. I'll leave it up to you which way you prefer.

@philipphofmann (Member)

> I'd love some guidance on getting this PR merged, but I can also help you folks take inspiration and go from there. I'll leave it up to you which way you prefer.

@naftaly, we will discuss this in our internal sync meeting this week and get back to you.

@naftaly (Contributor, Author) commented Apr 23, 2024

@philipphofmann, sounds good. For a bit of info around how this works, check out the comments in SentryAppMemory.mm. I look forward to hearing back.

@philipphofmann (Member)

Hey, @naftaly.

We decided on a call that we want to implement the proposals of this PR in increments. We definitely would like to use the memory pressure logic for adding breadcrumbs, which is #2514. Would you be up for changing this PR or opening a new one for that, or would you rather leave it to us? Either way is not a problem at all.

We are unsure if we want to use the memory pressure info to change the OOM logic, because we first need to investigate whether it works reliably. Furthermore, we need to figure out how we will reflect this in the product. As this is a bit more complicated and requires SDK and Sentry knowledge, we think it's better for us to add the implementation for this ourselves, but we can't give you an ETA yet. I created a new issue for this: #3890

@naftaly (Contributor, Author) commented Apr 24, 2024

@philipphofmann, all sounds good. I'll let you folks take over both aspects, and if you need any help feel free to reach out.

When I worked at Meta, I implemented something similar to this in the termination pipeline, and it changed how we worked with OOMs. The detection was almost perfect. False positives were a thing of the past; occasionally one would be missed, but then we'd fall back to the fall-through behaviour where it becomes a general watchdog (WD) termination.

On the implementation of it all: memory pressure basically acts the same as the didReceiveMemoryWarning notification while the app is in the foreground. It's in the background that it becomes useful and has much better precision.
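For illustration, here is a minimal Swift sketch of that kind of subscription using a GCD memory-pressure dispatch source. This is not the PR's actual implementation (which lives in SentryAppMemory.mm); the `MemoryPressureObserver` class and its names are hypothetical.

```swift
import Dispatch

// Sketch of a GCD memory-pressure observer. Unlike UIKit's
// didReceiveMemoryWarning notification, a memory-pressure dispatch source
// also delivers events while the app is in the background.
final class MemoryPressureObserver {
    private let source: DispatchSourceMemoryPressure

    init(onChange: @escaping (DispatchSource.MemoryPressureEvent) -> Void) {
        source = DispatchSource.makeMemoryPressureSource(
            eventMask: [.normal, .warning, .critical],
            queue: .global(qos: .utility)
        )
        source.setEventHandler { [weak self] in
            // `data` carries the pressure level that triggered the event.
            guard let self = self else { return }
            onChange(self.source.data)
        }
        source.activate()
    }

    deinit {
        source.cancel()
    }
}

// Usage: persist the last observed level so the next launch can tell whether
// a watchdog termination happened while the system was under memory pressure.
let observer = MemoryPressureObserver { event in
    if event.contains(.critical) {
        print("memory pressure: critical")
    } else if event.contains(.warning) {
        print("memory pressure: warning")
    } else {
        print("memory pressure: normal")
    }
}
```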

Overall, we've discussed memory pressure, but the more important feature here is memory level. This is what is really novel in this PR (proposal): it gives you almost perfect knowledge of the OS terminating the app because it got close to its memory limit, which is the more likely scenario in day-to-day apps. You can also get a sudden huge allocation that kills the app, but that is much less common.
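To make the idea concrete, here is an illustrative Swift sketch (again, not the PR's code) of one way to approximate a memory level: compare the app's physical footprint from task_vm_info against the headroom reported by os_proc_available_memory() (iOS 13+). The memoryLevel helper and the exact formula are hypothetical.

```swift
import Darwin
import os

// Sketch only: approximate a "memory level" by comparing the app's physical
// footprint (what the jetsam mechanism measures) with the remaining headroom
// reported by os_proc_available_memory() (iOS 13+; returns 0 where no limit applies).
func currentFootprint() -> UInt64? {
    var info = task_vm_info_data_t()
    var count = mach_msg_type_number_t(
        MemoryLayout<task_vm_info_data_t>.size / MemoryLayout<natural_t>.size
    )
    let kr = withUnsafeMutablePointer(to: &info) { infoPtr in
        infoPtr.withMemoryRebound(to: integer_t.self, capacity: Int(count)) {
            task_info(mach_task_self_, task_flavor_t(TASK_VM_INFO), $0, &count)
        }
    }
    guard kr == KERN_SUCCESS else { return nil }
    return UInt64(info.phys_footprint)
}

// Hypothetical helper: fraction of the estimated per-process limit in use.
// Values approaching 1.0 mean the OS is likely to terminate the app soon.
func memoryLevel() -> Double? {
    guard let footprint = currentFootprint() else { return nil }
    let remaining = UInt64(max(0, os_proc_available_memory()))
    let estimatedLimit = footprint + remaining
    guard estimatedLimit > 0 else { return nil }
    return Double(footprint) / Double(estimatedLimit)
}
```

Thresholds on that ratio could then be persisted across launches and checked when classifying a watchdog termination.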

I look forward to seeing this in Sentry; it's something that isn't available in any of the other publicly available systems, and it can change the way reliability teams work with OOMs (and prioritize issues).

@philipphofmann (Member)

Thanks for the detailed update and all the input, @naftaly.

@philipphofmann (Member)

We are going to pull code from this proposal once we tackle #3890, but I'm going to close this for now, as we aren't actively working on it at the moment.
