Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structured concurrency with two parent jobs #1001

Open
antanas-arvasevicius opened this issue Feb 20, 2019 · 34 comments
Open

Structured concurrency with two parent jobs #1001

antanas-arvasevicius opened this issue Feb 20, 2019 · 34 comments

Comments

@antanas-arvasevicius
Copy link

antanas-arvasevicius commented Feb 20, 2019

Hello,
Today we've found out that this code do not conform to official structured concurrency concept:

        runBlocking {
            val job = Job()

            val mainJob = launch {
                launch(context = coroutineContext + job) {
                    while (isActive) {
                        println("ACTIVE")
                        delay(1000)
                    }
                }.join()
            }.also {
                it.invokeOnCompletion {
                    println("CANCELLED")
                }
            }

            delay(1)
            mainJob.cancel()
            delay(6000)
            println("DONE")
        }

Prints:
CANCELLED
ACTIVE
ACTIVE
ACTIVE
ACTIVE
ACTIVE
ACTIVE
DONE

When canceling "mainJob" it won't cancel all children jobs inside.
Seems like calling launch with currentContext + job breaks parent-child relationships.
Logically we thought that custom job property will be used only as extra control for coroutine cancellation and not completely break parent-child relationships.
Is this behaviour by design or it's a bug?

Solutions I think could be one of:

  1. extra "job" will be used only for coroutine cancellation trigger (extra trigger)
  2. extra "job" would be attached implicitly to parent's job and becomes by default it's children (complicated)
  3. at least add extra caution in documentation about this behaviour
@elizarov
Copy link
Contributor

This behavior is by design. Job() is the context element that control parent-child relations of coroutine and when you install your own Job() into the context, you explicitly override this mechanism. Note, that can also create Job(parent), which gives you a choice on the parent of the job.

@antanas-arvasevicius
Copy link
Author

Thanks for explanation. It was not clear from coroutine documentation how exactly parent-child tracking works.

So to make a "job" behave as an extra trigger and maintain structured concurrency the code should be written as:

val job = Job(coroutineContext[Job]!!)
launch(context = coroutineContext + job) {
}

Is it correct?
What do you think if coroutineContext plus operator will automatically merge a given job with a current job?
It would make less erroneous and less ambiguous, because now If I want to break parent-child I can use GlobalScope.launch {} or call this@ClassName.launch {} // if ClassName has CoroutineScope interface.

Another similar example which is currently not clear (example from android Activity, but could be e.g. some backend BackgroundService):

        class A : CoroutineScope {
            val job = Job()
            override val coroutineContext: CoroutineContext
                get() = Dispatchers.IO + job

            suspend fun run() = withContext(coroutineContext) {
                while (isActive) {
                    println("RUNNING")
                    delay(400)
                }
            }
            fun onDestroy() { job.cancel() }
        }

        runBlocking {
            val a = A()
            val job = launch { a.run() }
            delay(100)
            job.cancel()
            delay(4000)
        }
}

I pretend when cancelling 'job' it will cancel computations, but it is running indefinitely (actually this is currently in our production system :-)
And it's not really what we've expected. And strangely when I'm removing "job" from Dispatchers.IO then it cancels out, wouldn't then class A() by default have it's own empty job pool.
But would be better that by default kotlin will maintain parent-child relations unless explicitly stated. It really hard to understand is child jobs will cancel out when parent scope cancels or not especially for deep calls of suspend functions with various different scopes.

@elizarov
Copy link
Contributor

The idiomatic way to fix your code is to either move launch into run:

fun run() = launch {
                while (isActive) {
                    println("RUNNING")
                    delay(400)
                }
            }

or remove withContext(coroutineContext) { ... } from run and use val job = a.launch { a.run() } instead.

Anyway, writing less code makes it work correctly and it is so by design. The general idea that the shortest possible code shall be a correct one.

The docs on mixing Job into the context shall be indeed improved.

@Dougrinch
Copy link

I think this is dangerous. Now we have situation when due to a little code changes, the semantic changes dramatically. Moreover, in some cases the shortest code will be incorrect. For example, if I want to start child but non cancellable async coroutine, I will write the simplest/shortest code like this:

coroutineScope {
    launch(NonCancellable) { ... }
    //do cancellable something else
}

but it is not correct. Now I can do this only like

coroutineScope {
    launch { 
        withContext(NonCancellable) { ... }
    }
    //do cancellable something else
}

I suppose, even with updated documentation current behaviour will be difficult and dangerous because it differs from "default expected behaviour".

@elizarov
Copy link
Contributor

@Dougrinch can you, please, clarify what is the use-case for "child but non cancellable async coroutine". What you are trying to achieve by writing this code? (I'm asking because I'd like to see if there any existing or proposed solution to your use-cases that is actually simpler to code)

@Dougrinch
Copy link

To be honest, there is not real use case. I found it while reading the coroutine docs and "playing with code to understand". I just was confused an implicitness of this. I agree with @antanas-arvasevicius that GlobalScope.launch or job.launch would be better. At least, more explicitness - less bugs.

@antanas-arvasevicius
Copy link
Author

Hi,
More I'm thinking about this topic more it gets a little cryptic.
Let's suppose that 'CoroutineScope' is a kind of object which aggregates other CoroutineScopes into a group. Now we can create new coroutine scope using coroutineScope builder, or implement CoroutineScope interface on some object.
But there are different behaviours:

  1. With parent-child relationship (auto attaching to callees scope):
    a) when withContext builder is used with context parameter without Job attribute
    b) when class implements CoroutineScope interface and its coroutineContext property contains context without Job attribute then all launch inside class will be attached to callees scope if called inside suspend functions (it's not cleared what parent will be on not suspend functions? Will job be created implicitly once? Or multiple jobs?)

  2. Without parent-child relationship:
    c) when withContext is used with context which has a Job attribute
    d) When class implements CoroutineScope interface and its coroutineContext has Job attribute

(As of writing test example, I've found more strange things, seems internally there are parent-child relationships independent of Job objects, see below)

It leads to confusion in this examples:

        val workers = Dispatchers.Default + Job()
        suspend fun test() = withContext(workers) {
            while (isActive) {
                println("doing work")
                delay(1000)
            }
            println("work completed")
        }

        runBlocking {
            val job = launch {
                test()
            }.apply {
                invokeOnCompletion {
                    println("completed") // <- never prints "completed"
                }
            }
            delay(200)
            job.cancel()
            job.join()    // <- join is waiting also for test() to complete? But how? it's on different Job()
            println("done") // <- never reaching here
        }

Could you explain how it works? Looks like cancel() not maintains parent-child, but join() "knows" that test.

Other:

        val workers = Dispatchers.Default + Job()
        suspend fun test() = coroutineScope {
            launch(workers) {
                while (isActive) {
                    println("doing work")
                    delay(1000) 
                }
                println("work completed")
            }
        }

        runBlocking {
            val job = launch {
                test()
            }.apply {
                invokeOnCompletion {
                    println("completed")
                }
            }
            delay(200)
            job.cancel()
            job.join()
            println("done")
            delay(5000)
            //..
            //still producing "doing work"
        }

Now it joins, but never cancels.

And all works here:

        suspend fun test() = coroutineScope {
            launch { //<- not using context with "Job"
                while (isActive) {
                    println("doing work")
                    delay(1000)
                }
                println("work completed")
            }
        }

        runBlocking {
            val job = launch {
                test()
            }.apply {
                invokeOnCompletion {
                    println("completed")
                }
            }
            delay(200)
            launch { delay(4000); job.cancel() }
            job.join()
            println("done")
            // completes after 4 seconds. all ok.
        }
    }

Shouldn't join work in tact with cancel?

@elizarov
Copy link
Contributor

In you first example you cancel the outher job = launch { ... }, but inside of this job you run non-cancellable code (it is detached from this job) becuase you did withContext(wokers) { ... } and your workers context contains its own Job() that is not a child of your other job. So when you do job.join() it just waits for the job to complete, but it is not going to happend, since the code running inside is not cooperating.

On the other hand note, that if you do workers.cancel, then your withContext(workers) { ... } will get cancelled properly.

@antanas-arvasevicius
Copy link
Author

Hi,
okay, but how to explain that code

suspend fun test() = coroutineScope {  launch(workers) {  while (isActive) { } }

and code

suspend fun test() = withContext(workers) {  while (isActive) { } }

works differently when calling launch { test() }.join() one blocks until completes other completes instantly.
And if I'd call launch without context it works the same (from join perspective`) as withContext(workers):

suspend fun test() = coroutineScope {  launch/*(workers)*/ {  while (isActive) { } }

Some conslusions:

  • So from join perspective, withContext is always bound to its parents whatever context would be.
    and launch depends on given context, if it has Job() then it is bound to Job not the parent.
  • From cancel perspective, withContext and launch both works the same and depends on context's Job attribute.
  • seems like there are hidden relationships between scopes apart Job - maybe coroutines call stack itself which do not depend from Job?

@elizarov
Copy link
Contributor

elizarov commented Mar 1, 2019

There is nothing hidden.

  1. Each coroutine's context has a Job. See here on how to access it.
  2. A Job that coroutine creates is always a child of the Job in its parent context.
  3. When you pass coroutineContext parameters you add to the coroutines' parent context.
  4. When the coroutineContext you pass to the coroutine has another Job, then this job because a parent of this coroutine, instead a Job it would have inherited.

Does it explain the results?

@antanas-arvasevicius
Copy link
Author

This part is clear, what is not clear is how then withContext() works? As I understand it's not a coroutine builder (as launch async) but as a regular suspend function which suspends current execution until given block completes.
That would explain why this code of block never prints "completed".

val workers = Dispatchers.Default + Job()
suspend fun test() = withContext(workers) {  while (isActive) { } }

launch {  test() }.invokeOnCompletion { println("completed") }

so difference from launch is that launch creates a child job and attaches it to a given parent Job without any suspension of existing execution, and withContext creates a child job and suspends execution until block completes (or gets cancelled).

But what concerns me for this that using withContext with Job attribute made calls to "test()" not cancellable by a context on which it was called, but it blocks and waits for completion of "test()". It may be useful in some scenarios but not a daily default.

And how to implement cancellation? Because withContext do not provide result's Job so we cannot cancel it explicitly: pseudo code would be:

val workers = Dispatchers.Default + Job()
suspend fun test() { 
    coroutineContext[Job]!!.invokeOnCompletion(true) {
        j.children.find { /* how to find children ? */ }?.cancel()
    }
    withContext(workers) {  while (isActive) { } }
}

// or with launch

val workers = Dispatchers.Default + Job()
suspend fun test() { 
    val childJob = launch(workers) {  while (isActive) { } }
    coroutineContext[Job]!!.invokeOnCompletion(true) { childJob.cancel() }
    childJob.join()
}

By this example it possible to explicitly cancel actions on call sites of test() and also cancel globally with workers.cancelChildren()

Question is why withContext have this "Job" override behaviour which is so hard to manage, and if doing so we get by default "non-cancellable" job implicitly (hidden, not clear, unintuitive). (Could be applied to launch too but at least it returns a job which can be manipulated it's not that bad.)

I see problem in official examples of CoroutineScope android activity implementation, because it's maybe ok for android, but applying the same for services is not OK because all calls is not cancellable by default, e.g. we need to write services as this to be able callers to cancel it's actions.

class FileUploaderService : CoroutineScope { 
val job = Job()
override val coroutineContext get() = Dispatchers.Default + job

suspend fun uploadFile() {
   val childJob = launch {
       // do uploading...
   }
  coroutineContext[Job]!!.invokeOnCompletion(true) { childJob.cancel() }
   childJob.join()
}
 fun destroy() {
      coroutineContext.cancelChildren()
  }
 }

And it could be nice that this code could be replaced with just one liner withContext(coroutineContext) { } instead, but unfortunately it breaks cancellation completely.

More I'm elaborating on this more strangely it looks and yes it could be documented much, but it will be hard to avoid WTF moments and what I'm afraid that many developers will be hit by this and then search for manuals later to find out this behaviour. And sorry for your taken time.

@elizarov
Copy link
Contributor

elizarov commented Mar 6, 2019

Question is why withContext have this "Job" override behaviour which is so hard to manage, and if doing so we get by default "non-cancellable" job implicitly (hidden, not clear, unintuitive). (Could be applied to launch too but at least it returns a job which can be manipulated it's not that bad.)

there are two use-cases:

  • withContext(NonCancellable) { ... } -- execute a piece of code even when outer coroutine was already cancelled
  • val job = Job(); withContext(job) { ... } -- the same, but with an added ability to do job.cancel() if needed to cancel whatever withContext is doing.

The same use-case apply to launch. Even though launch returns job, it is not always convenient, since it becomes knows only when launch had already started a coroutine:

val job = launch { ... /* oops.. cannot use job here */ ... }

So, sometimes injecting an explicit Job() helps.

Concerning this example:

suspend fun uploadFile() {
   val childJob = launch {
       // do uploading...
   }
  coroutineContext[Job]!!.invokeOnCompletion(true) { childJob.cancel() }
   childJob.join()
}

What is the purpose of writing this complicated code? You can just write:

suspend fun uploadFile() {
   // do uploading...
}

@antanas-arvasevicius
Copy link
Author

antanas-arvasevicius commented Mar 6, 2019

No, I cannot write that way because it won't run on a CoroutineScope of FileUploaderService object, which has different dispatcher and explicit job, for explicit cancellation of all uploads if "destroy" will be called. But will be run on whatever calls that uploadFile().
I think withContext is more used to change a dispatcher on which it executes a given block and that non-cancellable behaviour is only on edge cases, so I don't get why it's always non cancellable.
If I'm writing suspend fun uploadFile() = withContext(coroutineContext) { /*block*/ } I'm expecting that a given block will be run on object's scope and also be cancellable if parent scope will be cancelled. And if it only works when there are no explicit Job attribute, then what is the purpose of CoroutineScope interface?
Because without Job CoroutineScope'd object is not a coroutine scope at all, it is just Dispatcher's "switch" operation and all "launches" inside a class is unknown, everything is run on no-parent Job, no any parent-child relations, and everything is non-cancellable.

Example:

        class A : CoroutineScope {
            override val coroutineContext: CoroutineContext get() = Dispatchers.Default
            suspend fun test() {
                launch { while (isActive) { delay(1000) } }
                launch { while (isActive) { delay(2000) } }
            }
        }
        runBlocking {
            val a = A()
            val job = launch { a.test() }
            delay(1000)
            println(a.coroutineContext[Job]!!.children.toList()) //<- KotlinNullPointerException
            job.join()
        }

So, actually, using CoroutineScope interface without Job attribute is barely useful.
And using withContext WITH Job attribute is also hard to use due instant non-cancellable, breakage of parent-child relationships.
Looks like these two concepts seems not complementary and incompatible to each other.

@antanas-arvasevicius
Copy link
Author

Hi,
we've added some extension functions in our project which fixes these issues:
Now everything is within structured concurrency semantics and everything can be cancelled normally.
Shouldn't this implementation be default in Kotlin?

suspend fun <T> withCancellableContext(
    context: CoroutineContext,
    block: suspend CoroutineScope.() -> T
): T {
    val job = coroutineContext[Job]
    return withContext(context) {
        if (context[Job] != NonCancellable) {
            job?.invokeOnCompletion(onCancelling = true) { coroutineContext[Job]?.cancel() }
        }
        block()
    }
}
fun CoroutineScope.cancellableLaunch(
    context: CoroutineContext = EmptyCoroutineContext,
    start: CoroutineStart = CoroutineStart.DEFAULT,
    block: suspend CoroutineScope.() -> Unit
): Job = launch(context, start, block).also { job ->
    if (context[Job] != NonCancellable) {
        coroutineContext[Job]?.invokeOnCompletion(onCancelling = true) { job.cancel() }
    }
}

@jemshit
Copy link

jemshit commented Mar 15, 2019

Not sure if related to this thread, i have custom CoroutineScope:

A:CoroutineScope {
    protected lateinit var job: Job
    override val coroutineContext: CoroutineContext
        get() = job + Dispatchers.Main
}

And i use:

A.launch{
    val model = withContext(Dispatchers.Default) {
        deserializeModel()
    }
    // print("Deserialized, coroutineActive:${this.isActive}")
}

Even i cancel the job of Scope (job.cancel()) after deserializeModel() starts to run, i can see print line getting executed with isActive being false. Is this expected behavior? deserializeModel() has gson.fromJson. I was expecting withContext and rest of lines getting cancelled and not executed

Edit: is this related to this section in documentation (making computation code cancellable)?

@elizarov
Copy link
Contributor

@antanas-arvasevicius You don't need to pass explicit job. There is no need to write all this code and hence no need to have in the library. You can just use the job you already have in the coroutine context. If you need to cancel some job, like a file upload, just write:

suspend fun uploadFile() = coroutineScope { // creates a scoped job
    // now if you need to cancel anything inside here just call:
   coroutineContext.cancel()   
    // if you need to pass uploadFile's job somewhere else for cancellation:
    val job = coroutineContext[Job]!! // this is the Job of uploadFile
    // ...
    // do whatever you need here
}

@Dougrinch
Copy link

@elizarov Are you say that if I want to write FileUploaderService that cancel all uploads on destroy call (see @antanas-arvasevicius example above), I need maintain MutableSet<Job> manually?

@elizarov
Copy link
Contributor

I see. This FileUploaderService is an interesting use-case, indeed. We have a similar problem in Ktor. Basically, what is wanted here is a (missing now) ability to have a Job with two parents:

  1. There is a Job of FileUploaderService itself. When it is cancelled, we want all uploads to be cancelled.
  2. There is a Job in which uploadFile is called. When it is cancelled, we want this upload to be cancelled.

The proposed withCancellableContext by @antanas-arvasevicius emulates this functionality.

How can we add it to kotlinx-coroutines library? I don't have a good answer:

  1. We cannot just change the way withContext works (it is going to be a breaking change)/
  2. We cannot just add xxxCancellable variant for each builder xxx -- there is going to be too much duplication and it is not composable.

What realistic options to we have? I see one way at the moment (but I don't like it very much):

We can have some new variant of Job/SupervisorJob constructor function that works in a such a way, that going withContext(new_job_type) works like withCancellableContext. That is going to look quite error-prone in the long-term, so we could augment it with a long-term plan to deprecate usages of withContext(old_job_type) and maybe, in the future, to deprecate old Job() and SupervisorJob() constructors altogether, by replacing them with some other fresh names (this also gives us a chance to fix asymetry in Job/SupervisorJob naming).

@elizarov elizarov changed the title Beware that structured concurrency is violated when adding custom Job in launch Structured concurrency with two parent jobs Mar 31, 2019
@elizarov elizarov added enhancement design and removed docs KDoc and API reference labels Mar 31, 2019
@antanas-arvasevicius
Copy link
Author

antanas-arvasevicius commented Mar 31, 2019 via email

@elizarov
Copy link
Contributor

How about not going in two parents route, but just create new job object
which will be added as a child in caller's scope if scopes will differ.

I don't know if that is what you were thinking, but another solution is going to be to define a completely new kind of entity. Not a Job at all. Let's call it Group (not a real name, just to simplify discussion). So, instead of creating a Job inside FileUploaderService it could create Group and the use withContext(group) { ... }. That would completely abolish the need to have "two parent Jobs". However, the problem is that Group's API and implementation is going to very much like the Job.

@antanas-arvasevicius
Copy link
Author

antanas-arvasevicius commented Mar 31, 2019 via email

@svasilinets
Copy link

I've encountered this recently too. Could you please advise what is the best way to currently achieve the same behaviour without using InternalCoroutinesApi? @antanas-arvasevicius 's solution unfortunately uses invokeOnCompletion(onCancelling = true) {...} which is internal.

The way that I currently found is:

suspend fun uploadFile() {
  val callerJob = coroutineContext[Job]!!
  return uploaderServiceScope.async {
      callerJob.invokeOnCompletion { if (it is CancellationException)  this.cancel() }
      doWork()
  }.await()   
}

This small trick with async {}.await() allows to caller's Job to be completed while async block is running, once caller's job is done completion signal is received and async block can be cancelled. (note the same thing won't work with withContext)

Is there a nicer / simpler way to express it?

@Dominaezzz
Copy link
Contributor

What about this?

combineScope(secondCoroutineScope) {
    // coroutineContext[Job]::class == GroupJob::class
}

@fluidsonic
Copy link

I've always assumed that this already works out of the box because it's such a common use case. I'm surprised that it doesn't. I have the use case described by @antanas-arvasevicius and @elizarov (#1001 (comment)) quite often.

There is a wider scope (a service, a worker, an API/HTTP client, etc.) and a narrower scope (a unit of work, an HTTP request, etc.). For some reason I've assumed the following to work out of the box:

class Worker(coroutineContext: CoroutineContext) {

    private val coroutineContext = coroutineContext + SupervisorJob() + CoroutineName("Worker") // "wide scope"

    suspend fun process(work: Work) { // "narrow scope"
        withContext(coroutineContext) {
            // process work
            // canceled if either the caller's Job is canceled or the Worker's Job is canceled
        }
    }

    fun stop() {
        coroutineContext.cancel()
    }
}

What's the best approximate to do that with what we already have?

@fluidsonic
Copy link

fluidsonic commented Jan 11, 2021

I can't believe I ended up in this thread again with another unexpected behavior 😅

The following code does… absolutely nothing.

suspend fun main() {
    val flow = flowOf(1)

    coroutineScope {
        launch(NonCancellable) {
            flow.collect { number ->
                println("Processing number $number")
                delay(5_000)
                println("Done processing number $number.")
            }
        }
    }
}

Description of NonCancellable:

A non-cancelable [sic] job that is always active. It is designed for withContext function to prevent cancellation of code blocks that need to be executed without cancellation.

The side-effect that the mentioned "non-cancelable job" breaks the parent-child relationships comes quite surprising, at least to me. So the main function returns and the program terminates even before the new coroutine was launched.
Why do I need a connection between launch and the enclosing coroutineScope when it isn't honored anyway? (i.e. why do I need a scope anyway)

What I would have expected, from both "reading the code" and from the documentation:

  1. There's a new CoroutineScope and we wait for its completion.
  2. There's a new coroutine launched in the CoroutineScope, so the scope shouldn't end until that coroutine is complete.
  3. The new coroutine is launched non-cancellable, so even if I cancel the CoroutineScope the coroutineScope {…} block won't return until all non-cancellable coroutines inside it have completed.

Unfortunately that's not the case.

Background

  • I have a back-end with plenty of services, workers and coroutines.
  • When the back-end initiates a shutdown then all pending work (often Flow downstreams) should be completed without interruption. Everything else (incoming HTTP requests, scheduled jobs, etc.) must no longer happen.

With the current behavior pending work would continue due to NonCancellable context (what if the context changes to a cancelled one?). But it would be cancelled anyway because the entire program terminates due to the outermost scope completing too early.

Making the entire back-end cooperatively wait for downstream work would likely add a lot of unnecessary complexity for something that could work out-of-the-box with coroutines.


Regarding multiple parent jobs:
What would happen if there are two parent jobs (e.g. one for the worker and one of the piece of work) if one of them is cancelled and the other one is NonCancellable?

@elizarov
Copy link
Contributor

The best explanation is in this blog post of mine: https://elizarov.medium.com/coroutine-context-and-scope-c8b255d59055

We still have no "easy to use" solution to this particular problem for this particular use-case of two parent jobs.

@antanas-arvasevicius
Copy link
Author

Yeah, @fluidsonic I've been in your situation and I know what it feels. All theoretically nice "structured concurrency" just breaks apart if you pass new Job instance (and NonCancellable) in coroutine context.
For me it's a flaw in coroutines design, but seems it was solved just by extensively documenting it and pretending that the problem was solved.

I'm proposing changes is:
if withContext() launch() is called on some CoroutineScope with explicit "job" context, then it must:
a) work the same as now: coroutine will be launched on a explicitly given Job as a child
b) must create a "ExternallyDeferredJob" instance which will be registered as a child-job of scoped CoroutineScope's job and it will somehow subscribes into newly created coroutines Job and completes then coroutine completes
c) propagate cancellation in both directions.

Diagram for suggestion:

[CoroutineScope.Job]                                    [MyJob] 
            ^   (attach "synthetic" job)                     ^  (attach to a explicitly given job MyJob) (as now)
            |                                                                |  
  [ExternallyDeferredJob]                             [New Job for new coroutine]  <-----> (new coroutine)
                          ^                                                                   ^
                          ||                                                                  ||
                          ==============================
                                      1. synthetic link between states and cancellation propagation (sync between).
   

@fluidsonic
Copy link

fluidsonic commented Jan 11, 2021

The best explanation is in this blog post of mine: https://elizarov.medium.com/coroutine-context-and-scope-c8b255d59055

@elizarov do you mean the explanation regarding NonCancellable breaking the parent-child Job relationship?

The only info that's not in the documentation is this:

By convention, we do not usually pass a Job in a context parameter to launch, since that breaks parent-child relation, unless we explicitly want to break it using a NonCancellable job, for example.

The documentation doesn't make that clear at all. Only if you are highly familiar with how Jobs work then "A non-cancelable job" will raise a warning to you.

It should renamed to NonCancellableJob if the behavior is intended to stay that way to make it much more obvious - together with improved documentation.
In the future NonCancellable could be re-used for non-cancellable work that doesn't break the parent-child relationship.

@antanas-arvasevicius
Copy link
Author

My suggestion is that API must prohibit any possibility of breaking the chain of parent-child relationships and those relationships can be created only by calling CoroutineScope.launch or CoroutineScope.withContext or if you need "new" chain you'll call GlobalScope.launch () { }

Btw. with android examples where were a classes with internal customScope val customScope = CoroutineScope(Dispatchers.Main + SupervisorJob())

Also will break structured concurrency if you'll do something like:

suspend fun uploadFile(file:String) {
    withContext(customScope) { // <- because custom scope has explicit "job"
       // do upload...
    }
}

@fluidsonic
Copy link

fluidsonic commented Jan 11, 2021

I agree. It's super easy to mess up that relationship without noticing, esp. when you're new to coroutines and even if you're intermediate. We always have to wonder what context a job comes from - if it has any, when we (accidentally) switch to another job tree, and what happens if multiple jobs collide.

CoroutineContext is not type-safe in that regard, nor is the current coroutines API. For example NonCancellable and constructors like Job() and SupervisorJob() that default to having no parent are way too easy to use. Having a parent-relationship should be the norm, so having no parent should be more explicit. Same for switching the job.

And if something needs a coroutineContext, is it okay to pass one with a Job or not? Things like flowOn(…) or launchIn(…).

I've read all info on coroutines that I could find, sometimes multiple times. I've used them a lot. And I still keep getting it horribly wrong. It's quite a steep learning curve.

@elizarov
Copy link
Contributor

elizarov commented Apr 6, 2021

The rule of thumb that is we use in API:

  1. If it is using CoroutineContext then you should NOT have Job there.
  2. If it is using CoroutineScope then the scope should have a Job.

What we are seeing here, is that violations of the 1st rule break the structured concurrency without being apparent. I'd love to enforce it stronger and require an explicit opt-in for breaking structured concurrency, but I don't have any nice ideas on how we might implement this kind of enforcement in a backward-compatible way. The only easy fix is more docs: #2633

For more substantive fixes, I see the following potential directions:

  1. Totally deprecate NonCancellable and, instead, implement and purpose non-cancellable scope function to replace withContext(NonCancellable) { ... } idiom.
  2. As a part of Generalize coroutineScope/supervisorScope with user-specified strategy #1406 we can add more options to coroutine builders, including explicit opt-out of structured concurrency, and then deprecate launch(NonCancellabe) { ... } without that explicit option.

However, I don't see any easy way to detect and forbid code like this:

val myContext: CoroutineContext = Job() // notice the type: CoroutineContext 
launch(myContext) { ... } // implicitly opted out of structured concurrency

@LouisCAD
Copy link
Contributor

LouisCAD commented Apr 6, 2021

The code of coroutine builders like launch can check for the presence of a Job inside the passed CoroutineContext, and there could be a system property to tweak the behavior on violation among:

  1. No check
  2. Warn
  3. Crash/forbid

The default value could then move forward across major kotlinx.coroutines, to have developers take notice and fix possible violations.

@elizarov
Copy link
Contributor

elizarov commented Apr 7, 2021

It is not a very good idea to emit runtime warnings from the library. We could do it, but that is a last-resort measure for problems of a grandiose scale.

@LouisCAD
Copy link
Contributor

LouisCAD commented Apr 7, 2021

Why is it not a very good idea?

While I agree that runtime warnings are likely to never be seen, there's solutions to this problem. Here's one from the top of my head:
A tool included in the app (which can be excluded any time) that registers all runtime issues like that one, and forwards them to the developer right in the IDE or another place with as much context as possible (thread, date-time, current stack, coroutine info if possible…).

I'd love such a thing landing in the Kotlin ecosystem, even if it starts with a barebones approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants