Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide facilities for non-threadsafe mutable coroutine context element #2839

Closed
elizarov opened this issue Jul 22, 2021 · 8 comments
Closed

Comments

@elizarov
Copy link
Contributor

elizarov commented Jul 22, 2021

Consider the use-case of keeping a trace of operations. In thread-based programming, the data structure that keeps the current trace is stored in a ThreadLocal variable and gets updated by the corresponding tracing calls in the thread. With coroutines, one can use ThreadContextElement to keep a reference to this structure. However, this element will get automatically inherited by children of the coroutine which might execute concurrently. This concurrency poses a problem when the data structure keeping a trace is not thread-safe and making it thread-safe might be impractical for performance reasons. The proposed solution is to add some mechanism, that would let the coroutine context element to copy itself during the creation of concurrent coroutines so that there can be a guarantee that each copy is accessed only sequentially.

@elizarov
Copy link
Contributor Author

elizarov commented Jul 22, 2021

The design that I would propose for this problem is to add a new method to the ThreadContextElement interface that has an opportunity to create a copy of the ThreadContextElement every time a concurrent coroutine is launched by a coroutine builder (launch, async, etc) with the following tentative signature and name:

fun copyForChildCoroutine(context: CoroutineContext): ThreadContextElement<S> = this // default impl does not copy

Note, that is is not necessarily called every time the new Job object is created. For example, withTimeout and withContext builder function all create a new coroutine context with a new Job, but they will not call this copying function as they are not creating a new concurrent coroutine and guarantee sequential execution of the code.

UPDATED: It needs to have CoroutineContext parameter, too

@yorickhenning
Copy link
Contributor

I think that'll work? Any overridable factory function called by the parent Job for each context element during initialization of a new child context.

Would you consider making copyForChildCoroutine() a function of CoroutineContext.Element instead of ThreadContextElement? It seems odd to give copy construction facilities to only one Element subtype. A ThreadContextElement subtype with its other functions given empty bodies will behave equivalently to a direct CoroutineContext.Element subtype having the same copy function. I think. So, I don't think restricting copying semantics to a subtype will prevent any kinds of supertype implementations from occurring.

Subtype<S>: Subtype<S> { fun foo(): Subtype<S> = this } can be migrated to Supertype { fun foo(): Supertype<S> = this } later, so I don't think it's a critical point. Such a change might cause an issue with JVM signature incompatibility between versions and so be a breaking change? I'm not sure, offhand.

I'll try this out.

@pyos
Copy link

pyos commented Jul 23, 2021

I added a new interface in the example implementation for two reasons:

  1. this doesn't really seem to do anything with threads, so the reason for ThreadContextElement specifically to have such a method is unclear;
  2. due to the underlying data structure used by coroutine contexts being a linked list-based map, reconstructing the entire thing for no reason might be slow (I haven't measured though), so having an explicit interface marker for elements that will change allows for a fast path in the case where the context stays the same.

@elizarov
Copy link
Contributor Author

elizarov commented Jul 23, 2021

As long as we don't have a use-case for this copying facility that is separate from ThreadContextElement, I don't see a reason to declare a separate interface for it. The way the problem is currently stated in the issue, it is intrinsically tied to "mutable ThreadLocal replacement", hence the solution needs to be tied to ThreadContextElement (which is a ThreadLocal bridge).

@yorickhenning
Copy link
Contributor

The proposed signature, with or without the `CoroutineContext parameter, seems to work.

UPDATED: It needs to have CoroutineContext parameter, too

this will always be a member of the passed-in CoroutineContext? The parameter is to allow the element to decide whether or not to construct a new instance based on elements other than this in the parent CoroutineContext?

@yorickhenning
Copy link
Contributor

We're revisiting this issue.

CopyableThreadContextElement as implemented doesn't copy the ThreadContextElement in all cases it would need to. Parallelism isn't required for incorrect behaviour. It's possible to arrange coroutines so memory access is sequential but writes aren't isolated.

Working counterexample:

// Contains a “root” CopyableTraceThreadContextElement for someThreadLocal
// that manages happens-before writing back from the `ThreadLocal` to the
// coroutine's Element.
val dependencyInjectedContext
 
fun f() {
  runBlocking(
    dependencyInjectedContext
  ) {
    api.setSomeThreadLocal(3)
    launch {
      check(api.getFromSomeThreadLocal() == 3)
      api.setSomeThreadLocal(3)
      withContext(
        dependencyInjectedContext
      ) {
        // This block has a problem. A second coroutine is using
        // `dependencyInjectedContext`. Since `dependencyInjectedContext`
        // is the `context` parameter to a coroutine builder, that
        // TraceThreadContextElement  _overrides_ the TraceThreadContextElement
        // that copy-constructed for this child coroutine when
        // `dependencyInjectedContext` was folded left onto the running context.
        //
        // This block should have `someThreadLocal.get() == 7` and its writes
        // should be isolated. Instead, it's sharing an element with the outermost
        // `runBlocking {}` coroutine, so its writes will be made visible to
        // the grandparent coroutine in an unsound way, and resumptions may restore
        // the grandparent's
        api.setSomeThreadLocal(3)
      }
    }.join()

    check(api.getFromSomeThreadLocal() == 3) // Passes only if the inner block doesn't write.
  }
}

CopyableThreadContextElement should probably be removed from the library absent any objections.

We're working on a replacement.

yorickhenning pushed a commit to yorickhenning/kotlinx.coroutines that referenced this issue Jan 28, 2022
…ine()`. (Kotlin#3025)


* This is a `ThreadContextElement` that is copy-constructed when a new coroutine
is created and inherits the context.


Co-authored-by: Tyson Henning <yorick@google.com>

Fixes Kotlin#2839
@qwwdfsad
Copy link
Member

@yorickhenning please see the last section of the original design document, I've added a new implementation proposal.

It seems like CopyableThreadContextElement is still the right primitive to do the job, it just requires some fine-grained tuning.

@yorickhenning
Copy link
Contributor

Following up, now that we've had the revised version in production for a while.

Monitoring found ultra-rare consistency errors a few months ago - somewhere in the neighbourhood of P=1/10^12 to P=1/10^15. I assumed a data race remained. After getting to reliable heuristic reproduction in a unit test, the data race turned out to be #2930. Upgrading past the fix for that issue quieted monitoring, and we've found no new races so far.

It works.

pablobaxter pushed a commit to pablobaxter/kotlinx.coroutines that referenced this issue Sep 14, 2022
…ine()`. (Kotlin#3025)


* This is a `ThreadContextElement` that is copy-constructed when a new coroutine
is created and inherits the context.


Co-authored-by: Tyson Henning <yorick@google.com>

Fixes Kotlin#2839
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants