Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGSEGV on Android 7 #1683

Closed
PaulWoitaschek opened this issue Nov 29, 2019 · 16 comments
Closed

SIGSEGV on Android 7 #1683

PaulWoitaschek opened this issue Nov 29, 2019 · 16 comments
Labels

Comments

@PaulWoitaschek
Copy link
Contributor

Issue

After updating my application I faced a lot of crashes on android 7 devices. I couldn't reproduce it on my nexus or on the emulator so I bought one of the crashing devices.
After spending half of the day I finally managed to find the root cause and I can reproduce it in a hello world project.

Sample

class MainActivity : Activity() {

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        CoroutineScope(SupervisorJob() + Dispatchers.Main).launch {
            while (true) {
                combine(flowOf(Unit), flowOf(Unit)) { _ -> Unit }.collect()
            }
        }
    }
}

Crash

This leads to a sigsegv

    Flags: 0x28c8bf46
    Package: com.example.weirdcrash v1 (1.0)
    Foreground: Yes
    Build: TCL/5009D/U5A_PLUS_3G:7.0/NRD90M/5009D_ALWE_V2.9_20180529:user/release-keys
    
    *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
    Build fingerprint: 'TCL/5009D/U5A_PLUS_3G:7.0/NRD90M/5009D_ALWE_V2.9_20180529:user/release-keys'
    Revision: '0'
    ABI: 'arm'
    pid: 29639, tid: 29639, name: mple.weirdcrash  >>> com.example.weirdcrash <<<
    signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x15608
        r0 00015000  r1 12d35700  r2 00000002  r3 00000000
        r4 af58a6d8  r5 12d31ee0  r6 12d37280  r7 00000000
        r8 12d35700  r9 ac684400  sl 12cb0a90  fp 12d33bf8
        ip bef2f84c  sp bef2f8b0  lr 9901f5ef  pc 9901f608  cpsr 000f0030
    
    backtrace:
        #00 pc 00000608  /dev/ashmem/dalvik-jit-code-cache (deleted)
        #01 pc 000005ed  /dev/ashmem/dalvik-jit-code-cache (deleted)
     29639

I currently see about 1200 affected users..

Versions

'com.android.tools.build:gradle:3.6.0-beta04'
"org.jetbrains.kotlinx:kotlinx-coroutines-core:1.3.2"
"org.jetbrains.kotlinx:kotlinx-coroutines-android:1.3.2"

Affected Devices

image
image
image
image
image

Note

Take a look at the last crash, maybe that's an indicator for what's going wrong. That one is no native crash but an android 7 only issue.

@PaulWoitaschek
Copy link
Contributor Author

I boiled it further down by removing the dependency on coroutines-android and using the GlobalScope:

class MainActivity : Activity() {

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        GlobalScope.launch {
            while (true) {
                delay(100)
                combine(flowOf(Unit), flowOf(Unit)) { _ -> Unit }.collect()
            }
        }
    }
}
    *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
    Build fingerprint: 'TCL/5009D/U5A_PLUS_3G:7.0/NRD90M/5009D_ALWE_V2.9_20180529:user/release-keys'
    Revision: '0'
    ABI: 'arm'
    pid: 3096, tid: 3125, name: DefaultDispatch  >>> com.example.weirdcrash <<<
    signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x155f8
        r0 00015000  r1 12d09f40  r2 00000002  r3 00000000
        r4 af583768  r5 12d06fe0  r6 12d0f790  r7 00000000
        r8 12d09f40  r9 a1a1aa00  sl 12c93520  fp 12d3c280
        ip 9639333c  sp 963933a0  lr 99020e4f  pc 99020e68  cpsr 20070030
    
    backtrace:
        #00 pc 00001e68  /dev/ashmem/dalvik-jit-code-cache (deleted)
        #01 pc 00001e4d  /dev/ashmem/dalvik-jit-code-cache (deleted)
     3096

@Tolriq
Copy link

Tolriq commented Dec 3, 2019

Most probably an R8 issue generating code that those device improperly handle as a few other cases.

You should report to https://issuetracker.google.com/issues?q=componentid:326788%20status:open they are quite reactive and if the user base is high enough they might add a workaround.

@PaulWoitaschek
Copy link
Contributor Author

@qwwdfsad
Copy link
Member

qwwdfsad commented Dec 9, 2019

Thanks for the detailed write-up!
The last crash looks familiar to me and usually reproduces when we have a bug in the inliner :) ISE basically says that the same continuation was resumed twice.
It is, indeed, looks like an Android issue, otherwise, it would be reproducible on a regular JVM and/or other devices.

It could be because of the fact that for crossinlined coroutines (and flow has a lot of crossinline) pretty unusual bytecode (-> dex) shape is produced and some of the manufacturers could have missed it in their test suites. So as a (really limited) workaround, you can try to minimize the number of invoked
functions with crossinline lambdas or to isolate them. E.g. in your reproducer, it makes sense to extract combine(flowOf(Unit), flowOf(Unit)) { _ -> Unit }.collect() into a separate method and see if that helps

@PaulWoitaschek
Copy link
Contributor Author

My current workaround is to create my own combine functions which are based on rxjava:

/**
 * Because of a bug in android 7 we can't use the regular combine functions and fallback to RxJava
 * @see [https://github.com/Kotlin/kotlinx.coroutines/issues/1683]
 */
@PublishedApi
internal class Wrapper<T>(val value: T)

@PublishedApi
internal inline fun <T, R> combine(vararg sources: Flow<T>, crossinline transform: suspend (List<T>) -> R): Flow<R> {
  val flowables = sources.map { it.map { value -> Wrapper(value) }.asFlowable() }
  return Flowable
    .combineLatest(flowables) {
      @Suppress("UNCHECKED_CAST")
      it.toList() as List<Wrapper<T>>
    }
    .asFlow()
    .map { wrappedValues ->
      val unwrapped = wrappedValues.map { it.value }
      transform(unwrapped)
    }
}

inline fun <T1, T2, R> combine(
  flow: Flow<T1>,
  flow2: Flow<T2>,
  crossinline transform: suspend (T1, T2) -> R
): Flow<R> = combine(flow, flow2) { args ->
  @Suppress("UNCHECKED_CAST")
  transform(
    args[0] as T1,
    args[1] as T2
  )
}

...

With that in place all native crashes are gone.

@PierreFourreau
Copy link

PierreFourreau commented Feb 6, 2020

Hi, we have exactly the same problem on a lot of Devices (processor Mediatek, Android 7) with a big combine function in our viewmodel ( 6 Flows combined (one combine with 4 Flows and one Flow which is a combine of 2 Flows) ).

I can reproduce this bug on a Doogee Mix Lite.

What is your advice please ? Use other operator ? Add a debounce, buffer ? I don't want to use Rx 😅 thanks

@qwwdfsad
Copy link
Member

@PierreFourreau try to extract combine method to a separate, non-inline function

@qwwdfsad
Copy link
Member

Closing, please keep track on the https://issuetracker.google.com/issues/145569946 (thanks Paul for filing this!)

@PaulWoitaschek
Copy link
Contributor Author

I have written a non-rx version:

/**
 * Because of a bug in android 7 we can't use the regular combine functions and have to use our own.
 * @see [https://github.com/Kotlin/kotlinx.coroutines/issues/1683]
 */
@PublishedApi
internal object UnInitialized

@PublishedApi
internal inline fun <T, R> combine(vararg sources: Flow<T>, crossinline transform: suspend (List<T>) -> R): Flow<R> {
  return channelFlow {
    val values = Array<Any?>(sources.size) { UnInitialized }
    coroutineScope {
      sources.forEachIndexed { index, flow ->
        launch {
          flow.collect { value ->
            values[index] = value
            if (values.all { it !== UnInitialized }) {
              @Suppress("UNCHECKED_CAST")
              send(transform(values.toList() as List<T>))
            }
          }
        }
      }
    }
  }
}

@bnvinay92
Copy link

Is your crash reproducible on every launch, or just the first one? We're facing a similar SIGSEG in libart.so, but on the first launch after update. Wondering if it's also got to do with R8.

@PaulWoitaschek
Copy link
Contributor Author

On every launch.

@mhernand40
Copy link

@PaulWoitaschek in your minimized repro, I see that combine(flowOf(Unit), flowOf(Unit)) { … } is being used. I am curious if the repro still occurs if you were to use the Flow.combine extension, e.g., flowOf(Unit).combine(flowOf(Unit)) { … }, instead. Looking at the source code, it does not make use of crossinline. Wondering if this might be a possible workaround instead of having to come up with an Rx-based solution?

@mhernand40
Copy link

I guess with my suggestion, the assumption is that your use case only requires combining two Flows.

@mhernand40
Copy link

@qwwdfsad, let's say I have a class that returns a Flow that internally uses combine like so:

class ObserveFoo(val dep1: Dep1, val dep2: Dep2, …, val depN: DepN) {
    operator fun invoke(): Flow<Foo> {
        return combine( // Real combine function from kotlinx.coroutines.flow.combine 
            dep1.observe(),
            dep2.observe(),
            …
            depN.observe()
        )
    }
}

Would this still run the risk of producing native crashes on some Android devices based on what we currently know?

I am currently migrating some existing RxJava code over to Coroutines/Flow but I want to make sure I do not introduce native crashes as a result. Thank you!

@qwwdfsad
Copy link
Member

qwwdfsad commented Aug 3, 2020

@mhernand40 Unfortunately, we do not know. The stable reproducer is required in order to check such a hypothesis

qwwdfsad added a commit that referenced this issue Oct 19, 2020
    * Do not allocate array on each transform call in combine, do it only for combineTransform
    * Simplify zip operator even further
    * Get rid of crossinline in combine to fix weird Android crashes

Fixes #1743
Fixes #1683
qwwdfsad added a commit that referenced this issue Oct 20, 2020
* Rework Flow.zip operator: improve its performance by 40%, collect one of the upstreams in the same coroutine as emitter

* Rework Flow.combine
    * Get rid of two code paths
    * Get rid of accidental O(N^2) where N is the number of flows that caused #2296
    * Get rid of select that hits performance hard, improving performance by 50% in the pessimistic case
    * Get rid of crossinlines in API and implementation to fix Android issues
    * Make combine fairer and its results less surprising in sequential scenarios

* Improve stacktrace recovery and stackwalking for SafeCollector, flowOn and zip operators
* Update JMH

Fixes #1743
Fixes #1683
Fixes #2296
@httpdispatch
Copy link

The combine operator is still unstable in some Android 7 devices (Sony Xperia XA F3112 2/16, Kotlin 1.8-1.9.10, Coroutines 1.6.4-1.7.3). It just stops working without crashes randomly when combine is used for 16 different flows. However the custom implementation from here works without issues. I hope it will be fixed in the coroutines library someday.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants