Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash in _Unwind_RaiseException on Android #783

Closed
1 of 3 tasks
triplef opened this issue Dec 22, 2022 · 12 comments
Closed
1 of 3 tasks

Crash in _Unwind_RaiseException on Android #783

triplef opened this issue Dec 22, 2022 · 12 comments
Assignees
Labels

Comments

@triplef
Copy link

triplef commented Dec 22, 2022

Description

Calling _Unwind_RaiseException() on Android causes the following segfault for us using sentry-native v0.5.1 or later on Android via sentry-java. The issue does not happen using v0.5.0.

* thread #42, name = 'qtMainLoopThrea', stop reason = signal SIGSEGV: invalid address (fault address: 0x22)
    frame #0: 0x0000000000000022
    frame #1: 0x000079370dc96baa libsentry.so`___lldb_unnamed_symbol1159$$libsentry.so + 10
    frame #2: 0x000079370dc9692c libsentry.so`___lldb_unnamed_symbol1146$$libsentry.so + 12
    frame #3: 0x000079370dc93973 libsentry.so`___lldb_unnamed_symbol1090$$libsentry.so + 131
    frame #4: 0x000079370dc93800 libsentry.so`__gxx_personality_v0 + 224
    frame #5: 0x0000793707df3037 libobjc.so`_Unwind_RaiseException(exc=0x00007937b7e55af0) at unwind.inc:113
    frame #6: 0x0000793707dd0ec3 libobjc.so`objc_exception_throw(object=0x0000793797ef0ad8) at eh_personality.c:253:28

We can reproduce the issue using sentry-java v6.6.0 or later, which bundles sentry-native v0.5.1. Using sentry-java v6.5.0 (bundling sentry-native v0.5.0) does not reproduce the issue.

We suspect the updated libunwindstack-ndk in v0.5.1 causing this issue.

Unfortunately I haven’t found documentation on how to build sentry-java for Android. I’m happy to debug/triage this further if someone could point me in the right direction.

We use Sentry with our Qt-based app using Objective-C on Android. In this setup, _Unwind_RaiseException() is called by the libobjc2 runtime to raise Objective-C exceptions.

When does the problem happen

  • During build
  • During run-time
  • When capturing a hard crash

Environment

  • OS: Android 11
  • Compiler: Clang bundled with Android NDK 22.1.7171670

Steps To Reproduce

  1. Use hello-objectivec project from https://github.com/gnustep/android-examples
  2. Add sentry-android dependency
  3. Throw Objective-C exception

Log output

None.

@Swatinem
Copy link
Member

Swatinem commented Jan 3, 2023

Thanks for reporting this! This is a bit surprising, as we are not actively hooking into the _Unwind_* APIs.

Its possible that some change in libunwindstack indeed caused this, we are taking a look.

@kerenkhatiwada
Copy link
Member

Changing the status based on the above. Please feel free to change. :)

@supervacuus
Copy link
Collaborator

Hi @triplef. Sorry for the delay, but I invested a bit of time in analyzing the problem. The TLDR is this:

  • I wasn't able to reproduce your segfault with hello-objectivec, but I have an idea where it could come from
  • we can also go further with a repro of the bug, but I think the following point is the more sensible approach:
  • I am going to prevent client code from accidentally hooking into "our" potentially incompatible exception-handling primitives
  • if you want to give it a try, you can check whether updating your NDK to at least version 23.1.7779620 prevents the crash

For more detail, feel free to read on. There are two aspects to the issue:

  1. we statically link all our stdlib and cxxabi dependencies so that libsentry.so is independent of the environment on the target device as much as possible (whether this is a good strategy is yet another topic not tackled in this comment). But at the same time, we export all these symbols, which leads client code that looks up symbols at runtime to end up in our library code, potentially connecting incompatible code paths.
  2. Some choice (or rather accident) in our release-build or in the changes from upstream produced a code path that affects you without having a direct interaction with any code in libunwindstack itself.

While the second aspect is the one that is currently affecting you, it is the first that we should fix. Let me first elaborate on the second:

_Unwind_RaiseException is a function that looks up __cxx_personality_v0 to find the corresponding try/catch block to the raised exception. This look-up finds the implementation inside libsentry.so instead of the cxxabi the toolchain provides. This ABI is relatively stable, but there can be breaking changes. For example, in the NDKs I have locally, I can at least see that the alignment of some of the members in __cxa_exception (which corresponds to _Unwind_Exception) has changed. If your version and ours disagree on the data layout, that could lead to a segmentation fault in dependent code.

I can't tell whether this (or a) change in the ABI causes the issue because I could not reproduce the segmentation fault even using outdated NDKs (compared to the ones used in the sentry-android builds). I tried it with five different NDK versions between 21.1.6352462 and 25.1.8937393 against sentry-android 6.5.0, 6.6.0, and the latest release 6.12.1 and in all cases, the ObjectiveC exception thrown in hello-objectivec ends in an ABRT as a result of a

Uncaught exception NSInternalInconsistencyException, reason: Testing Objective-C exception from -[Test throwException]

which is what I would expect in such a situation. So there might be yet another unknown variable (target-arch?).

I also looked into the release build of sentry-android: they always track the default NDK per AGP version. For both 6.5.0 and 6.6.0, that was AGP 7.3 (for which the default NDK is the same as for 7.4: 23.1.7779620). So I can only ask you to update your NDK to at least that version, but don't get your hopes up since, as described above, it had little effect on my tests.

I used Android Studio 221.6008.13.2211.9477386 during testing, and it forced me to update hello-objectivec to use a more recent compile/target SDK, which in turn required me to switch to androidx for the AppCompatActivity. But that shouldn't affect the stack trace you are seeing.

No element in the code path from __cxx_personality_v0 up to your crash is related to libunwindstack. The _Unwind* "namespace" is part of the clang/llvm unwind-infra (because it is its EH unwinder). At the same time, the __cxx_personality_v0 counterpart below is part of the CXXABI (= llvm libcxxabi). libunwindstack is entirely separate from these modules and is part of the Android platform. The only connection is that libunwindstack is a C++ library that uses parts of the standard library that can potentially throw, which enables EH and references (and, in our case, embeds) its handling primitives.

This brings me back to aspect 1: it shouldn't be necessary for the Native SDK or our client code to consider incidental code paths. libunwindstack is an implementation detail, and none of our consumers have direct access to it or will ever receive exceptions from it. There is no need to expose this machinery and open the door for issues like these. If you raise an exception the EH unwinder should only defer to the primitives of the toolchain you provided for your build. There is an open issue with the same root-cause, and I think we should just fix it.

@supervacuus supervacuus self-assigned this Jan 26, 2023
@triplef
Copy link
Author

triplef commented Jan 26, 2023

Thank you very much for looking into this and your detailed summary of the underlying issues @supervacuus! 🙏

If I understand it correctly, the underlying issues of internal symbols being exported is planned to be fixed via #363.

Regarding the NDK version: we’re using 22.1.7171670 for our project and the GNUstep toolchain, because that’s the version used by Qt 5.15 which we also depend on. If you used the most recent pre-built GNUstep Android Toolchain release from GitHub that was built with 25.1.8937393.

Could reproducing the issue depend on which NDK version the libobjc2 runtime was built with, since that is what is calling into the personality function? If so that might explain why you weren’t able to reproduce. Unfortunately it’s unclear to me to what extend NDK releases are supposed to be compatible – I think I read that compatibility is not guaranteed, but at the same time everyone seems to mix binaries built with different NDK releases (because building everything from source would be too much of a pain).

In any case I’ll be watching #363 for the proper fix – if there’s anything I can do to help get this merged please let me know.

@supervacuus
Copy link
Collaborator

Yes, I downloaded the latest pre-built and did not look into which NDK was used for the build. That would certainly explain why I do not see the issue. If the root cause is indeed in EH primitives, then it is, of course, more critical which NDK was used for building the Objective-C runtime rather than the app :-) It also substantiates my suspicion (though open to further verification), so thanks for the heads up.

Regarding your question on NDK compatibility: I am not the right person to answer this, but I guess that breakage is very rare, depending on the abstraction level you are operating on. Again, I am not even sure the changes are the reason for your segfault; they were just the most apparent modifications at that level.

@triplef
Copy link
Author

triplef commented Jan 26, 2023

Thanks, that makes sense.

If you do want to try reproduce it once more, the previous toolchain build was done with NDK 22.1.7171670:
https://github.com/gnustep/tools-android/releases/tag/release-20210912

@triplef
Copy link
Author

triplef commented Feb 14, 2023

Now that #363 is released (thank you!) would it be possible to get the sentry-native dependency of sentry-java updated so we can give this a go? 🙏

@supervacuus
Copy link
Collaborator

Hi @triplef. The process is the following: pretty soonish, a GitHub bot will signal the Java SDK team that a new Native SDK release is out. Then they will integrate the update into their source tree and release it with their next release if our update doesn't raise any further issues on their side. When this happens, or if they will wait with the integration until a more significant release, I cannot say.

I would still leave this issue open because even with a new sentry-android + sentry-native@0.6.0 out, the change could potentially not solve your problem. I could not reproduce this issue even with release-20210912.

CC @markushi.

@markushi
Copy link
Member

Here's a link to the relevant PR on sentry-java: getsentry/sentry-java#2545

@markushi
Copy link
Member

markushi commented Mar 3, 2023

@triplef we just released 6.16.0-beta.1 of sentry-android which is linked against sentry-native@0.6.0, it would be great if you could give it a try!

@triplef
Copy link
Author

triplef commented Mar 7, 2023

I tried 6.16.0-beta.1 and it works correctly now!

Thank you, I really appreciate the work that went into supporting our probably somewhat rare case. 🙏

@triplef triplef closed this as completed Mar 7, 2023
@supervacuus
Copy link
Collaborator

@triplef, thanks for giving the beta a test run! That is valuable feedback. While your case might be rare, the root cause can reach a broader audience. Revealing private dependencies is always a maintenance nightmare, especially at that level of abstraction. So, thanks also for providing another reason to fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants