Structured concurrency with two parent jobs #1001

antanas-arvasevicius · 2019-02-20T15:27:33Z

Hello,
Today we've found out that this code do not conform to official structured concurrency concept:

        runBlocking {
            val job = Job()

            val mainJob = launch {
                launch(context = coroutineContext + job) {
                    while (isActive) {
                        println("ACTIVE")
                        delay(1000)
                    }
                }.join()
            }.also {
                it.invokeOnCompletion {
                    println("CANCELLED")
                }
            }

            delay(1)
            mainJob.cancel()
            delay(6000)
            println("DONE")
        }

Prints:
CANCELLED
ACTIVE
ACTIVE
ACTIVE
ACTIVE
ACTIVE
ACTIVE
DONE

When canceling "mainJob" it won't cancel all children jobs inside.
Seems like calling launch with currentContext + job breaks parent-child relationships.
Logically we thought that custom job property will be used only as extra control for coroutine cancellation and not completely break parent-child relationships.
Is this behaviour by design or it's a bug?

Solutions I think could be one of:

extra "job" will be used only for coroutine cancellation trigger (extra trigger)
extra "job" would be attached implicitly to parent's job and becomes by default it's children (complicated)
at least add extra caution in documentation about this behaviour

elizarov · 2019-02-22T08:35:36Z

This behavior is by design. Job() is the context element that control parent-child relations of coroutine and when you install your own Job() into the context, you explicitly override this mechanism. Note, that can also create Job(parent), which gives you a choice on the parent of the job.

antanas-arvasevicius · 2019-02-22T09:10:10Z

Thanks for explanation. It was not clear from coroutine documentation how exactly parent-child tracking works.

So to make a "job" behave as an extra trigger and maintain structured concurrency the code should be written as:

val job = Job(coroutineContext[Job]!!)
launch(context = coroutineContext + job) {
}

Is it correct?
What do you think if coroutineContext plus operator will automatically merge a given job with a current job?
It would make less erroneous and less ambiguous, because now If I want to break parent-child I can use GlobalScope.launch {} or call this@ClassName.launch {} // if ClassName has CoroutineScope interface.

Another similar example which is currently not clear (example from android Activity, but could be e.g. some backend BackgroundService):

        class A : CoroutineScope {
            val job = Job()
            override val coroutineContext: CoroutineContext
                get() = Dispatchers.IO + job

            suspend fun run() = withContext(coroutineContext) {
                while (isActive) {
                    println("RUNNING")
                    delay(400)
                }
            }
            fun onDestroy() { job.cancel() }
        }

        runBlocking {
            val a = A()
            val job = launch { a.run() }
            delay(100)
            job.cancel()
            delay(4000)
        }
}

I pretend when cancelling 'job' it will cancel computations, but it is running indefinitely (actually this is currently in our production system :-)
And it's not really what we've expected. And strangely when I'm removing "job" from Dispatchers.IO then it cancels out, wouldn't then class A() by default have it's own empty job pool.
But would be better that by default kotlin will maintain parent-child relations unless explicitly stated. It really hard to understand is child jobs will cancel out when parent scope cancels or not especially for deep calls of suspend functions with various different scopes.

elizarov · 2019-02-22T16:25:37Z

The idiomatic way to fix your code is to either move launch into run:

fun run() = launch {
                while (isActive) {
                    println("RUNNING")
                    delay(400)
                }
            }

or remove withContext(coroutineContext) { ... } from run and use val job = a.launch { a.run() } instead.

Anyway, writing less code makes it work correctly and it is so by design. The general idea that the shortest possible code shall be a correct one.

The docs on mixing Job into the context shall be indeed improved.

Dougrinch · 2019-02-23T14:53:07Z

I think this is dangerous. Now we have situation when due to a little code changes, the semantic changes dramatically. Moreover, in some cases the shortest code will be incorrect. For example, if I want to start child but non cancellable async coroutine, I will write the simplest/shortest code like this:

coroutineScope {
    launch(NonCancellable) { ... }
    //do cancellable something else
}

but it is not correct. Now I can do this only like

coroutineScope {
    launch { 
        withContext(NonCancellable) { ... }
    }
    //do cancellable something else
}

I suppose, even with updated documentation current behaviour will be difficult and dangerous because it differs from "default expected behaviour".

elizarov · 2019-02-25T11:29:51Z

@Dougrinch can you, please, clarify what is the use-case for "child but non cancellable async coroutine". What you are trying to achieve by writing this code? (I'm asking because I'd like to see if there any existing or proposed solution to your use-cases that is actually simpler to code)

Dougrinch · 2019-02-25T12:15:09Z

To be honest, there is not real use case. I found it while reading the coroutine docs and "playing with code to understand". I just was confused an implicitness of this. I agree with @antanas-arvasevicius that GlobalScope.launch or job.launch would be better. At least, more explicitness - less bugs.

antanas-arvasevicius · 2019-02-26T09:20:22Z

Hi,
More I'm thinking about this topic more it gets a little cryptic.
Let's suppose that 'CoroutineScope' is a kind of object which aggregates other CoroutineScopes into a group. Now we can create new coroutine scope using coroutineScope builder, or implement CoroutineScope interface on some object.
But there are different behaviours:

With parent-child relationship (auto attaching to callees scope):
a) when withContext builder is used with context parameter without Job attribute
b) when class implements CoroutineScope interface and its coroutineContext property contains context without Job attribute then all launch inside class will be attached to callees scope if called inside suspend functions (it's not cleared what parent will be on not suspend functions? Will job be created implicitly once? Or multiple jobs?)
Without parent-child relationship:
c) when withContext is used with context which has a Job attribute
d) When class implements CoroutineScope interface and its coroutineContext has Job attribute

(As of writing test example, I've found more strange things, seems internally there are parent-child relationships independent of Job objects, see below)

It leads to confusion in this examples:

        val workers = Dispatchers.Default + Job()
        suspend fun test() = withContext(workers) {
            while (isActive) {
                println("doing work")
                delay(1000)
            }
            println("work completed")
        }

        runBlocking {
            val job = launch {
                test()
            }.apply {
                invokeOnCompletion {
                    println("completed") // <- never prints "completed"
                }
            }
            delay(200)
            job.cancel()
            job.join()    // <- join is waiting also for test() to complete? But how? it's on different Job()
            println("done") // <- never reaching here
        }

Could you explain how it works? Looks like cancel() not maintains parent-child, but join() "knows" that test.

Other:

        val workers = Dispatchers.Default + Job()
        suspend fun test() = coroutineScope {
            launch(workers) {
                while (isActive) {
                    println("doing work")
                    delay(1000) 
                }
                println("work completed")
            }
        }

        runBlocking {
            val job = launch {
                test()
            }.apply {
                invokeOnCompletion {
                    println("completed")
                }
            }
            delay(200)
            job.cancel()
            job.join()
            println("done")
            delay(5000)
            //..
            //still producing "doing work"
        }

Now it joins, but never cancels.

And all works here:

        suspend fun test() = coroutineScope {
            launch { //<- not using context with "Job"
                while (isActive) {
                    println("doing work")
                    delay(1000)
                }
                println("work completed")
            }
        }

        runBlocking {
            val job = launch {
                test()
            }.apply {
                invokeOnCompletion {
                    println("completed")
                }
            }
            delay(200)
            launch { delay(4000); job.cancel() }
            job.join()
            println("done")
            // completes after 4 seconds. all ok.
        }
    }

Shouldn't join work in tact with cancel?

elizarov · 2019-02-26T19:47:35Z

In you first example you cancel the outher job = launch { ... }, but inside of this job you run non-cancellable code (it is detached from this job) becuase you did withContext(wokers) { ... } and your workers context contains its own Job() that is not a child of your other job. So when you do job.join() it just waits for the job to complete, but it is not going to happend, since the code running inside is not cooperating.

On the other hand note, that if you do workers.cancel, then your withContext(workers) { ... } will get cancelled properly.

antanas-arvasevicius · 2019-02-28T11:50:39Z

Hi,
okay, but how to explain that code

suspend fun test() = coroutineScope {  launch(workers) {  while (isActive) { } }

and code

suspend fun test() = withContext(workers) {  while (isActive) { } }

works differently when calling launch { test() }.join() one blocks until completes other completes instantly.
And if I'd call launch without context it works the same (from join perspective`) as withContext(workers):

suspend fun test() = coroutineScope {  launch/*(workers)*/ {  while (isActive) { } }

Some conslusions:

So from join perspective, withContext is always bound to its parents whatever context would be.
and launch depends on given context, if it has Job() then it is bound to Job not the parent.
From cancel perspective, withContext and launch both works the same and depends on context's Job attribute.
seems like there are hidden relationships between scopes apart Job - maybe coroutines call stack itself which do not depend from Job?

elizarov · 2019-03-01T15:34:01Z

There is nothing hidden.

Each coroutine's context has a Job. See here on how to access it.
A Job that coroutine creates is always a child of the Job in its parent context.
When you pass coroutineContext parameters you add to the coroutines' parent context.
When the coroutineContext you pass to the coroutine has another Job, then this job because a parent of this coroutine, instead a Job it would have inherited.

Does it explain the results?

antanas-arvasevicius · 2019-03-04T12:29:48Z

This part is clear, what is not clear is how then withContext() works? As I understand it's not a coroutine builder (as launch async) but as a regular suspend function which suspends current execution until given block completes.
That would explain why this code of block never prints "completed".

val workers = Dispatchers.Default + Job()
suspend fun test() = withContext(workers) {  while (isActive) { } }

launch {  test() }.invokeOnCompletion { println("completed") }

so difference from launch is that launch creates a child job and attaches it to a given parent Job without any suspension of existing execution, and withContext creates a child job and suspends execution until block completes (or gets cancelled).

But what concerns me for this that using withContext with Job attribute made calls to "test()" not cancellable by a context on which it was called, but it blocks and waits for completion of "test()". It may be useful in some scenarios but not a daily default.

And how to implement cancellation? Because withContext do not provide result's Job so we cannot cancel it explicitly: pseudo code would be:

val workers = Dispatchers.Default + Job()
suspend fun test() { 
    coroutineContext[Job]!!.invokeOnCompletion(true) {
        j.children.find { /* how to find children ? */ }?.cancel()
    }
    withContext(workers) {  while (isActive) { } }
}

// or with launch

val workers = Dispatchers.Default + Job()
suspend fun test() { 
    val childJob = launch(workers) {  while (isActive) { } }
    coroutineContext[Job]!!.invokeOnCompletion(true) { childJob.cancel() }
    childJob.join()
}

By this example it possible to explicitly cancel actions on call sites of test() and also cancel globally with workers.cancelChildren()

Question is why withContext have this "Job" override behaviour which is so hard to manage, and if doing so we get by default "non-cancellable" job implicitly (hidden, not clear, unintuitive). (Could be applied to launch too but at least it returns a job which can be manipulated it's not that bad.)

I see problem in official examples of CoroutineScope android activity implementation, because it's maybe ok for android, but applying the same for services is not OK because all calls is not cancellable by default, e.g. we need to write services as this to be able callers to cancel it's actions.

class FileUploaderService : CoroutineScope { 
val job = Job()
override val coroutineContext get() = Dispatchers.Default + job

suspend fun uploadFile() {
   val childJob = launch {
       // do uploading...
   }
  coroutineContext[Job]!!.invokeOnCompletion(true) { childJob.cancel() }
   childJob.join()
}
 fun destroy() {
      coroutineContext.cancelChildren()
  }
 }

And it could be nice that this code could be replaced with just one liner withContext(coroutineContext) { } instead, but unfortunately it breaks cancellation completely.

More I'm elaborating on this more strangely it looks and yes it could be documented much, but it will be hard to avoid WTF moments and what I'm afraid that many developers will be hit by this and then search for manuals later to find out this behaviour. And sorry for your taken time.

elizarov · 2019-03-06T11:31:40Z

Question is why withContext have this "Job" override behaviour which is so hard to manage, and if doing so we get by default "non-cancellable" job implicitly (hidden, not clear, unintuitive). (Could be applied to launch too but at least it returns a job which can be manipulated it's not that bad.)

there are two use-cases:

withContext(NonCancellable) { ... } -- execute a piece of code even when outer coroutine was already cancelled
val job = Job(); withContext(job) { ... } -- the same, but with an added ability to do job.cancel() if needed to cancel whatever withContext is doing.

The same use-case apply to launch. Even though launch returns job, it is not always convenient, since it becomes knows only when launch had already started a coroutine:

val job = launch { ... /* oops.. cannot use job here */ ... }

So, sometimes injecting an explicit Job() helps.

Concerning this example:

suspend fun uploadFile() {
   val childJob = launch {
       // do uploading...
   }
  coroutineContext[Job]!!.invokeOnCompletion(true) { childJob.cancel() }
   childJob.join()
}

What is the purpose of writing this complicated code? You can just write:

suspend fun uploadFile() {
   // do uploading...
}

antanas-arvasevicius · 2019-03-06T13:00:08Z

No, I cannot write that way because it won't run on a CoroutineScope of FileUploaderService object, which has different dispatcher and explicit job, for explicit cancellation of all uploads if "destroy" will be called. But will be run on whatever calls that uploadFile().
I think withContext is more used to change a dispatcher on which it executes a given block and that non-cancellable behaviour is only on edge cases, so I don't get why it's always non cancellable.
If I'm writing suspend fun uploadFile() = withContext(coroutineContext) { /*block*/ } I'm expecting that a given block will be run on object's scope and also be cancellable if parent scope will be cancelled. And if it only works when there are no explicit Job attribute, then what is the purpose of CoroutineScope interface?
Because without Job CoroutineScope'd object is not a coroutine scope at all, it is just Dispatcher's "switch" operation and all "launches" inside a class is unknown, everything is run on no-parent Job, no any parent-child relations, and everything is non-cancellable.

Example:

        class A : CoroutineScope {
            override val coroutineContext: CoroutineContext get() = Dispatchers.Default
            suspend fun test() {
                launch { while (isActive) { delay(1000) } }
                launch { while (isActive) { delay(2000) } }
            }
        }
        runBlocking {
            val a = A()
            val job = launch { a.test() }
            delay(1000)
            println(a.coroutineContext[Job]!!.children.toList()) //<- KotlinNullPointerException
            job.join()
        }

So, actually, using CoroutineScope interface without Job attribute is barely useful.
And using withContext WITH Job attribute is also hard to use due instant non-cancellable, breakage of parent-child relationships.
Looks like these two concepts seems not complementary and incompatible to each other.

antanas-arvasevicius · 2019-03-15T13:05:42Z

Hi,
we've added some extension functions in our project which fixes these issues:
Now everything is within structured concurrency semantics and everything can be cancelled normally.
Shouldn't this implementation be default in Kotlin?

suspend fun <T> withCancellableContext(
    context: CoroutineContext,
    block: suspend CoroutineScope.() -> T
): T {
    val job = coroutineContext[Job]
    return withContext(context) {
        if (context[Job] != NonCancellable) {
            job?.invokeOnCompletion(onCancelling = true) { coroutineContext[Job]?.cancel() }
        }
        block()
    }
}

fun CoroutineScope.cancellableLaunch(
    context: CoroutineContext = EmptyCoroutineContext,
    start: CoroutineStart = CoroutineStart.DEFAULT,
    block: suspend CoroutineScope.() -> Unit
): Job = launch(context, start, block).also { job ->
    if (context[Job] != NonCancellable) {
        coroutineContext[Job]?.invokeOnCompletion(onCancelling = true) { job.cancel() }
    }
}

jemshit · 2019-03-15T13:53:28Z

Not sure if related to this thread, i have custom CoroutineScope:

A:CoroutineScope {
    protected lateinit var job: Job
    override val coroutineContext: CoroutineContext
        get() = job + Dispatchers.Main
}

And i use:

A.launch{
    val model = withContext(Dispatchers.Default) {
        deserializeModel()
    }
    // print("Deserialized, coroutineActive:${this.isActive}")
}

Even i cancel the job of Scope (job.cancel()) after deserializeModel() starts to run, i can see print line getting executed with isActive being false. Is this expected behavior? deserializeModel() has gson.fromJson. I was expecting withContext and rest of lines getting cancelled and not executed

Edit: is this related to this section in documentation (making computation code cancellable)?

elizarov · 2019-03-30T12:32:10Z

@antanas-arvasevicius You don't need to pass explicit job. There is no need to write all this code and hence no need to have in the library. You can just use the job you already have in the coroutine context. If you need to cancel some job, like a file upload, just write:

suspend fun uploadFile() = coroutineScope { // creates a scoped job
    // now if you need to cancel anything inside here just call:
   coroutineContext.cancel()   
    // if you need to pass uploadFile's job somewhere else for cancellation:
    val job = coroutineContext[Job]!! // this is the Job of uploadFile
    // ...
    // do whatever you need here
}

Dougrinch · 2019-03-31T09:25:05Z

@elizarov Are you say that if I want to write FileUploaderService that cancel all uploads on destroy call (see @antanas-arvasevicius example above), I need maintain MutableSet<Job> manually?

elizarov · 2019-03-31T12:32:23Z

I see. This FileUploaderService is an interesting use-case, indeed. We have a similar problem in Ktor. Basically, what is wanted here is a (missing now) ability to have a Job with two parents:

There is a Job of FileUploaderService itself. When it is cancelled, we want all uploads to be cancelled.
There is a Job in which uploadFile is called. When it is cancelled, we want this upload to be cancelled.

The proposed withCancellableContext by @antanas-arvasevicius emulates this functionality.

How can we add it to kotlinx-coroutines library? I don't have a good answer:

We cannot just change the way withContext works (it is going to be a breaking change)/
We cannot just add xxxCancellable variant for each builder xxx -- there is going to be too much duplication and it is not composable.

What realistic options to we have? I see one way at the moment (but I don't like it very much):

We can have some new variant of Job/SupervisorJob constructor function that works in a such a way, that going withContext(new_job_type) works like withCancellableContext. That is going to look quite error-prone in the long-term, so we could augment it with a long-term plan to deprecate usages of withContext(old_job_type) and maybe, in the future, to deprecate old Job() and SupervisorJob() constructors altogether, by replacing them with some other fresh names (this also gives us a chance to fix asymetry in Job/SupervisorJob naming).

antanas-arvasevicius · 2019-03-31T12:43:36Z

Hi, How about not going in two parents route, but just create new job object which will be added as a child in caller's scope if scopes will differ. XxxJob's responsibility would be to delegate cancellation only. Could be named EntangledJob :)) How about this? There is seems missing concept - creation graph of coroutines separate from execution graph of coroutines (as currently job children-parent does). Don't know if it is good way to decouple that and made explicit concept. Maybe debug stack tracking will work better with creation graph instead of current implementation.

…

On Sun, 31 Mar 2019 at 14:32, Roman Elizarov ***@***.***> wrote: I see. This FileUploaderService is an interesting use-case, indeed. We have a similar problem in Ktor. Basically, what is wanted here is a (missing now) ability to have a Job with two parents: 1. There is a Job of FileUploaderService itself. When it is cancelled, we want all uploads to be cancelled. 2. There is a Job in which uploadFile is called. When it is cancelled, we want *this* upload to be cancelled. The proposed withCancellableContext by @antanas-arvasevicius <https://github.com/antanas-arvasevicius> emulates this functionality. How can we add it to kotlinx-coroutines library? I don't have a good answer: 1. We cannot just change the way withContext works (it is going to be a breaking change)/ 2. We cannot just add xxxCancellable variant for each builder xxx -- there is going to be too much duplication and it is not composable. What realistic options to we have? I see one way at the moment (but I don't like it very much): We can have some new variant of Job/SupervisorJob constructor function that works in a such a way, that going withContext(new_job_type) works like withCancellableContext. That is going to look quite error-prone in the long-term, so we could augment it with a long-term plan to deprecate usages of withContext(old_job_type) and maybe, in the future, to deprecate old Job() and SupervisorJob() constructors altogether, by replacing them with some other fresh names (this also gives us a chance to fix asymetry in Job/SupervisorJob naming). — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1001 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AEAcnCwV9xr3_EJA2azX2bhVqtsrUcHPks5vcKrcgaJpZM4bFeeP> .

elizarov · 2019-03-31T13:15:53Z

How about not going in two parents route, but just create new job object
which will be added as a child in caller's scope if scopes will differ.

I don't know if that is what you were thinking, but another solution is going to be to define a completely new kind of entity. Not a Job at all. Let's call it Group (not a real name, just to simplify discussion). So, instead of creating a Job inside FileUploaderService it could create Group and the use withContext(group) { ... }. That would completely abolish the need to have "two parent Jobs". However, the problem is that Group's API and implementation is going to very much like the Job.

antanas-arvasevicius · 2019-03-31T14:32:55Z

No, I think it's almost ok as is because now to create an object with its own scope we just need to define coroutine context with Job() to control lifecycle, if we'll add new concept apart job it will definitely add more confusion than there is now. I think current implementation defines coroutine relationships between runtime execution (answers on where coroutine is executed on), but not how coroutines was created (we do not have relationships). Yes in any case implementation woud be the same as current Job (children-parent). But job tracks execution relationship and new would track creation relationship. Yes it seems the same as Job does now, I am also not clear if it is should be different concepts (creation & execution) or not. So there are two options to add sentinel job EntagledJob and inject that job in coroutineContext of caller just before merging of passed context. Or implement ChildrenGroup and add it to Job for current implementation and add for creation parent child tracking. Personally I feel that explicitly separating execution and creation relationships would help for understanding and for implementation, as now it looks like they are implicitly interleaved together (but still not clear now is current Job not covers everything and that case is exception). I think this needs some prototyping and investigate if it reveals something or not.

…

On Sun, 31 Mar 2019 at 15:15, Roman Elizarov ***@***.***> wrote: How about not going in two parents route, but just create new job object which will be added as a child in caller's scope if scopes will differ. I don't know if that is what you were thinking, but another solution is going to be to define a completely new kind of entity. Not a Job at all. Let's call it Group (not a real name, just to simplify discussion). So, instead of creating a Job inside FileUploaderService it could create Group and the use withContext(group) { ... }. That would completely abolish the need to have "two parent Jobs". However, the problem is that Group's API and implementation is going to very much like the Job. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1001 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AEAcnKCGtZwmH2G1UrfzPLGdJ87LBcZzks5vcLUOgaJpZM4bFeeP> .

svasilinets · 2020-02-18T12:35:10Z

I've encountered this recently too. Could you please advise what is the best way to currently achieve the same behaviour without using InternalCoroutinesApi? @antanas-arvasevicius 's solution unfortunately uses invokeOnCompletion(onCancelling = true) {...} which is internal.

The way that I currently found is:

suspend fun uploadFile() {
  val callerJob = coroutineContext[Job]!!
  return uploaderServiceScope.async {
      callerJob.invokeOnCompletion { if (it is CancellationException)  this.cancel() }
      doWork()
  }.await()   
}

This small trick with async {}.await() allows to caller's Job to be completed while async block is running, once caller's job is done completion signal is received and async block can be cancelled. (note the same thing won't work with withContext)

Is there a nicer / simpler way to express it?

Dominaezzz · 2020-11-26T22:06:08Z

What about this?

combineScope(secondCoroutineScope) {
    // coroutineContext[Job]::class == GroupJob::class
}

fluidsonic · 2021-01-01T17:03:30Z

I've always assumed that this already works out of the box because it's such a common use case. I'm surprised that it doesn't. I have the use case described by @antanas-arvasevicius and @elizarov (#1001 (comment)) quite often.

There is a wider scope (a service, a worker, an API/HTTP client, etc.) and a narrower scope (a unit of work, an HTTP request, etc.). For some reason I've assumed the following to work out of the box:

class Worker(coroutineContext: CoroutineContext) {

    private val coroutineContext = coroutineContext + SupervisorJob() + CoroutineName("Worker") // "wide scope"

    suspend fun process(work: Work) { // "narrow scope"
        withContext(coroutineContext) {
            // process work
            // canceled if either the caller's Job is canceled or the Worker's Job is canceled
        }
    }

    fun stop() {
        coroutineContext.cancel()
    }
}

What's the best approximate to do that with what we already have?

fluidsonic · 2021-01-11T13:22:09Z

I can't believe I ended up in this thread again with another unexpected behavior 😅

The following code does… absolutely nothing.

suspend fun main() {
    val flow = flowOf(1)

    coroutineScope {
        launch(NonCancellable) {
            flow.collect { number ->
                println("Processing number $number…")
                delay(5_000)
                println("Done processing number $number.")
            }
        }
    }
}

Description of NonCancellable:

A non-cancelable [sic] job that is always active. It is designed for withContext function to prevent cancellation of code blocks that need to be executed without cancellation.

The side-effect that the mentioned "non-cancelable job" breaks the parent-child relationships comes quite surprising, at least to me. So the main function returns and the program terminates even before the new coroutine was launched.
Why do I need a connection between launch and the enclosing coroutineScope when it isn't honored anyway? (i.e. why do I need a scope anyway)

What I would have expected, from both "reading the code" and from the documentation:

There's a new CoroutineScope and we wait for its completion.
There's a new coroutine launched in the CoroutineScope, so the scope shouldn't end until that coroutine is complete.
The new coroutine is launched non-cancellable, so even if I cancel the CoroutineScope the coroutineScope {…} block won't return until all non-cancellable coroutines inside it have completed.

Unfortunately that's not the case.

Background

I have a back-end with plenty of services, workers and coroutines.
When the back-end initiates a shutdown then all pending work (often Flow downstreams) should be completed without interruption. Everything else (incoming HTTP requests, scheduled jobs, etc.) must no longer happen.

With the current behavior pending work would continue due to NonCancellable context (what if the context changes to a cancelled one?). But it would be cancelled anyway because the entire program terminates due to the outermost scope completing too early.

Making the entire back-end cooperatively wait for downstream work would likely add a lot of unnecessary complexity for something that could work out-of-the-box with coroutines.

Regarding multiple parent jobs:
What would happen if there are two parent jobs (e.g. one for the worker and one of the piece of work) if one of them is cancelled and the other one is NonCancellable?

elizarov · 2021-01-11T14:27:25Z

The best explanation is in this blog post of mine: https://elizarov.medium.com/coroutine-context-and-scope-c8b255d59055

We still have no "easy to use" solution to this particular problem for this particular use-case of two parent jobs.

antanas-arvasevicius · 2021-01-11T15:05:49Z

Yeah, @fluidsonic I've been in your situation and I know what it feels. All theoretically nice "structured concurrency" just breaks apart if you pass new Job instance (and NonCancellable) in coroutine context.
For me it's a flaw in coroutines design, but seems it was solved just by extensively documenting it and pretending that the problem was solved.

I'm proposing changes is:
if withContext() launch() is called on some CoroutineScope with explicit "job" context, then it must:
a) work the same as now: coroutine will be launched on a explicitly given Job as a child
b) must create a "ExternallyDeferredJob" instance which will be registered as a child-job of scoped CoroutineScope's job and it will somehow subscribes into newly created coroutines Job and completes then coroutine completes
c) propagate cancellation in both directions.

Diagram for suggestion:

[CoroutineScope.Job]                                    [MyJob] 
            ^   (attach "synthetic" job)                     ^  (attach to a explicitly given job MyJob) (as now)
            |                                                                |  
  [ExternallyDeferredJob]                             [New Job for new coroutine]  <-----> (new coroutine)
                          ^                                                                   ^
                          ||                                                                  ||
                          ==============================
                                      1. synthetic link between states and cancellation propagation (sync between).

fluidsonic · 2021-01-11T15:05:55Z

The best explanation is in this blog post of mine: https://elizarov.medium.com/coroutine-context-and-scope-c8b255d59055

@elizarov do you mean the explanation regarding NonCancellable breaking the parent-child Job relationship?

The only info that's not in the documentation is this:

By convention, we do not usually pass a Job in a context parameter to launch, since that breaks parent-child relation, unless we explicitly want to break it using a NonCancellable job, for example.

The documentation doesn't make that clear at all. Only if you are highly familiar with how Jobs work then "A non-cancelable job" will raise a warning to you.

It should renamed to NonCancellableJob if the behavior is intended to stay that way to make it much more obvious - together with improved documentation.
In the future NonCancellable could be re-used for non-cancellable work that doesn't break the parent-child relationship.

antanas-arvasevicius · 2021-01-11T15:12:24Z

My suggestion is that API must prohibit any possibility of breaking the chain of parent-child relationships and those relationships can be created only by calling CoroutineScope.launch or CoroutineScope.withContext or if you need "new" chain you'll call GlobalScope.launch () { }

Btw. with android examples where were a classes with internal customScope val customScope = CoroutineScope(Dispatchers.Main + SupervisorJob())

Also will break structured concurrency if you'll do something like:

suspend fun uploadFile(file:String) {
    withContext(customScope) { // <- because custom scope has explicit "job"
       // do upload...
    }
}

fluidsonic · 2021-01-11T15:28:51Z

I agree. It's super easy to mess up that relationship without noticing, esp. when you're new to coroutines and even if you're intermediate. We always have to wonder what context a job comes from - if it has any, when we (accidentally) switch to another job tree, and what happens if multiple jobs collide.

CoroutineContext is not type-safe in that regard, nor is the current coroutines API. For example NonCancellable and constructors like Job() and SupervisorJob() that default to having no parent are way too easy to use. Having a parent-relationship should be the norm, so having no parent should be more explicit. Same for switching the job.

And if something needs a coroutineContext, is it okay to pass one with a Job or not? Things like flowOn(…) or launchIn(…).

I've read all info on coroutines that I could find, sometimes multiple times. I've used them a lot. And I still keep getting it horribly wrong. It's quite a steep learning curve.

elizarov · 2021-04-06T16:33:23Z

The rule of thumb that is we use in API:

If it is using CoroutineContext then you should NOT have Job there.
If it is using CoroutineScope then the scope should have a Job.

What we are seeing here, is that violations of the 1st rule break the structured concurrency without being apparent. I'd love to enforce it stronger and require an explicit opt-in for breaking structured concurrency, but I don't have any nice ideas on how we might implement this kind of enforcement in a backward-compatible way. The only easy fix is more docs: #2633

For more substantive fixes, I see the following potential directions:

Totally deprecate NonCancellable and, instead, implement and purpose non-cancellable scope function to replace withContext(NonCancellable) { ... } idiom.
As a part of Generalize coroutineScope/supervisorScope with user-specified strategy #1406 we can add more options to coroutine builders, including explicit opt-out of structured concurrency, and then deprecate launch(NonCancellabe) { ... } without that explicit option.

However, I don't see any easy way to detect and forbid code like this:

val myContext: CoroutineContext = Job() // notice the type: CoroutineContext 
launch(myContext) { ... } // implicitly opted out of structured concurrency

LouisCAD · 2021-04-06T17:10:28Z

The code of coroutine builders like launch can check for the presence of a Job inside the passed CoroutineContext, and there could be a system property to tweak the behavior on violation among:

No check
Warn
Crash/forbid

The default value could then move forward across major kotlinx.coroutines, to have developers take notice and fix possible violations.

elizarov · 2021-04-07T07:47:00Z

It is not a very good idea to emit runtime warnings from the library. We could do it, but that is a last-resort measure for problems of a grandiose scale.

LouisCAD · 2021-04-07T08:19:43Z

Why is it not a very good idea?

While I agree that runtime warnings are likely to never be seen, there's solutions to this problem. Here's one from the top of my head:
A tool included in the app (which can be excluded any time) that registers all runtime issues like that one, and forwards them to the developer right in the IDE or another place with as much context as possible (thread, date-time, current stack, coroutine info if possible…).

I'd love such a thing landing in the Kotlin ecosystem, even if it starts with a barebones approach.

elizarov added the docs KDoc and API reference label Feb 22, 2019

elizarov mentioned this issue Feb 23, 2019

withContext from inside a cancelled coroutine should not be executed #962

Closed

elizarov changed the title ~~Beware that structured concurrency is violated when adding custom Job in launch~~ Structured concurrency with two parent jobs Mar 31, 2019

elizarov added enhancement design and removed docs KDoc and API reference labels Mar 31, 2019

Dominaezzz mentioned this issue Apr 22, 2021

WIP - introduce sessionScope element-hq/element-android#3197

Merged

6 tasks

whyoleg mentioned this issue Apr 23, 2021

Mark GlobalScope as delicate API and rewrite coroutine basics doc without it #2637

Merged

qwwdfsad added the structured concurrency label May 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Structured concurrency with two parent jobs #1001

Structured concurrency with two parent jobs #1001

antanas-arvasevicius commented Feb 20, 2019 •

edited

elizarov commented Feb 22, 2019

antanas-arvasevicius commented Feb 22, 2019

elizarov commented Feb 22, 2019

Dougrinch commented Feb 23, 2019

elizarov commented Feb 25, 2019

Dougrinch commented Feb 25, 2019

antanas-arvasevicius commented Feb 26, 2019

elizarov commented Feb 26, 2019

antanas-arvasevicius commented Feb 28, 2019

elizarov commented Mar 1, 2019 •

edited

antanas-arvasevicius commented Mar 4, 2019

elizarov commented Mar 6, 2019 •

edited

antanas-arvasevicius commented Mar 6, 2019 •

edited

antanas-arvasevicius commented Mar 15, 2019

jemshit commented Mar 15, 2019 •

edited

elizarov commented Mar 30, 2019

Dougrinch commented Mar 31, 2019

elizarov commented Mar 31, 2019

antanas-arvasevicius commented Mar 31, 2019 via email

elizarov commented Mar 31, 2019

antanas-arvasevicius commented Mar 31, 2019 via email

svasilinets commented Feb 18, 2020

Dominaezzz commented Nov 26, 2020

fluidsonic commented Jan 1, 2021

fluidsonic commented Jan 11, 2021 •

edited

elizarov commented Jan 11, 2021

antanas-arvasevicius commented Jan 11, 2021

fluidsonic commented Jan 11, 2021 •

edited

antanas-arvasevicius commented Jan 11, 2021

fluidsonic commented Jan 11, 2021 •

edited

elizarov commented Apr 6, 2021

LouisCAD commented Apr 6, 2021

elizarov commented Apr 7, 2021 •

edited

LouisCAD commented Apr 7, 2021

Structured concurrency with two parent jobs #1001

Structured concurrency with two parent jobs #1001

Comments

antanas-arvasevicius commented Feb 20, 2019 • edited

elizarov commented Feb 22, 2019

antanas-arvasevicius commented Feb 22, 2019

elizarov commented Feb 22, 2019

Dougrinch commented Feb 23, 2019

elizarov commented Feb 25, 2019

Dougrinch commented Feb 25, 2019

antanas-arvasevicius commented Feb 26, 2019

elizarov commented Feb 26, 2019

antanas-arvasevicius commented Feb 28, 2019

elizarov commented Mar 1, 2019 • edited

antanas-arvasevicius commented Mar 4, 2019

elizarov commented Mar 6, 2019 • edited

antanas-arvasevicius commented Mar 6, 2019 • edited

antanas-arvasevicius commented Mar 15, 2019

jemshit commented Mar 15, 2019 • edited

elizarov commented Mar 30, 2019

Dougrinch commented Mar 31, 2019

elizarov commented Mar 31, 2019

antanas-arvasevicius commented Mar 31, 2019 via email

elizarov commented Mar 31, 2019

antanas-arvasevicius commented Mar 31, 2019 via email

svasilinets commented Feb 18, 2020

Dominaezzz commented Nov 26, 2020

fluidsonic commented Jan 1, 2021

fluidsonic commented Jan 11, 2021 • edited

Background

elizarov commented Jan 11, 2021

antanas-arvasevicius commented Jan 11, 2021

fluidsonic commented Jan 11, 2021 • edited

antanas-arvasevicius commented Jan 11, 2021

fluidsonic commented Jan 11, 2021 • edited

elizarov commented Apr 6, 2021

LouisCAD commented Apr 6, 2021

elizarov commented Apr 7, 2021 • edited

LouisCAD commented Apr 7, 2021

antanas-arvasevicius commented Feb 20, 2019 •

edited

elizarov commented Mar 1, 2019 •

edited

elizarov commented Mar 6, 2019 •

edited

antanas-arvasevicius commented Mar 6, 2019 •

edited

jemshit commented Mar 15, 2019 •

edited

fluidsonic commented Jan 11, 2021 •

edited

fluidsonic commented Jan 11, 2021 •

edited

fluidsonic commented Jan 11, 2021 •

edited

elizarov commented Apr 7, 2021 •

edited