Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support completing the test coroutine from outside the test thread. #1206

Closed

Conversation

objcode
Copy link
Contributor

@objcode objcode commented May 18, 2019

Implement an eventLoop for runBlockingTest.

As a result, runBlockingTest is now correct in the presence of repeated
time-delayed dispatch from other dispatchers.

Changes:
- runBlockingTest will now allow a 30 second timeout for other
dispatchers to complete coroutines
- Introduced WaitConfig, SingleDispatcherWaitConfig, and
MultiDispatcherWaitConfig to configure runBlockingTest timeout behavior
- Added DelayController.queueState as a ConflatedBroadcastChannel to
observe or poll the queue status of a TestCoroutineDispatcher
- Added queue status of Idle, HasCurrentTask and HasDelayedTask as
public APIs for quering the status of a TestCoroutineDispatcher
- Added dependency on runBlocking from runBlockingTest
- Improved documentation for threading concerns around resumeDispatche

@objcode
Copy link
Contributor Author

objcode commented May 18, 2019

Note to reviewer: Reading the code, I believe invokeOnCompletion will get a invoked result if the coroutine was already completed in every case (including completion by another thread). Can you confirm that this understanding is correct and the code here is sound?

Thanks,
Sean

@Zhelyazko
Copy link

Any ETA on when this will be released?

Copy link
Member

@qwwdfsad qwwdfsad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed change does not fix the problem, but masks it, adding even more non-determinism and making test flakier and much harder to debug/diagnose for inexperienced developers.

For example, the test from #1222 now passes:

@Test
fun foo() = runBlockingTest {
    println(1)
    withContext(Dispatchers.IO) {
	    println(2)
    }
    println(3)
}

But not because the fix actually fixes that, but because it hides non-determinism with the additional check. You can see that by adding a random delay right after println(2) and observing the failure ratio (it will be linearly increasing with the delay value). The similar effect can be achieved by adding Thread.sleep(rand()) right after advanceUntilIdle call.
While the error message mentions this fact, it still appears pretty non-deterministic. E.g. a developer ran her tests once, they passed, the developer pushed changes and then tests fail on CI 15 minutes later.
I am sure we can provide better user experience here :)

Additional observation, not related to this particular fix, but to establish more common ground (sorry, I've missed the original design discussion, so maybe it was already discussed before):
End-users tests behave inconsistently depending on the target dispatcher. For example:

@Test
fun foo() = runBlockingTest {
    launch {
        delay(1000) // Or any blocking/external call, e.g. Thread.sleep 
    }
}

this test is perfectly valid, though launch is not yet completed by the end of runBlockingTest block, while the same code, but with different dispatcher (e.g. launch(Dispatchers.IO)) is not.
Even worse, it is now timing dependent as we've seen in #1222. It is clear (for me) neither from the implementation point of view nor from API design shape why this is the desired behaviour for end users.

These facts (alongside with very suspicious invokeOnClose{}.dispose()) makes me think that we should change the design of runBlockingTest itself to make it deterministic and consistent.
The first thing that comes to my mind is an event loop with a timeout:

withTimeout(DEFAULT_TIMEOUT) {
    while (!isCompleted()) {
        advanceUntilIdle() // TODO should not burn 100% CPU when completion comes from another thread
    }
}

if default timeout is any reasonable value (e.g. 1-5 seconds), then most of the determinism will disappear, and for integration tests or advanced use-cases we can introduce one more parameter and/or JUnit rule.

WDYT?

kotlinx-coroutines-test/src/TestBuilders.kt Outdated Show resolved Hide resolved
@objcode
Copy link
Contributor Author

objcode commented Jun 4, 2019

Thank you for the detailed review! Agreed completely this is a surprising API and introduces an unfortunate non-determinism.

Putting together a new commit based on your suggestions.

This requires a call to invokeOnCompletion to get the completion state
from the other thread.

Changes:
   - runBlockingTest now supports completing the test coroutine from
     another thread
   - success and failure path tests

While fixing this, a subtle non-determinism was discovered that would
lead to flakey tests. If another thread completed the test coroutine
*during* the cleanup checks it was possible to modify the state of the
test during the cleanup checks in a way that could expose false-positive
or false-negative results randomly.

To resolve this, a non-completed coroutine immediately after
advanceUntilIdle is now considered a failure, even if it completes
before the more aggressive cleanup checks. There is still a very brief
window (between advanceTimeBy and getResultIfKnown) that non-determinism
may be introduced, but it will fail with a descriptive error message at
on random executions directing the developer to resolve the
non-determinstic behavior.

Note: testWithOddlyCompletingJob_fails may fail when the implementation
of runBlockingTest changes (it's very tightly coupled).
As a result, runBlockingTest is now correct in the presence of repeated
time-delayed dispatch from other dispatchers.

Changes:
- runBlockingTest will now allow a 30 second timeout for other
dispatchers to complete coroutines
- Introduced WaitConfig, SingleDispatcherWaitConfig, and
MultiDispatcherWaitConfig to configure runBlockingTest timeout behavior
- Added DelayController.queueState as a ConflatedBroadcastChannel to
observe or poll the queue status of a TestCoroutineDispatcher
- Added queue status of Idle, HasCurrentTask and HasDelayedTask as
public APIs for quering the status of a TestCoroutineDispatcher
- Added dependency on runBlocking from runBlockingTest
- Improved documentation for threading concerns around resumeDispatcher
@objcode objcode force-pushed the supportCompletionOutOfTestThread branch from 4da43bf to d3f40a9 Compare June 20, 2019 06:23
@objcode
Copy link
Contributor Author

objcode commented Jun 20, 2019

@qwwdfsad @elizarov I've uploaded a proposed change that adds an event loop to runBlockingTest. PTAL!

If you can take a look for API review I'll make any updates necessary. One big decision that I wasn't sure about - there is a way to implement this by making TestCoroutineDispatcher be an EventLoop. However, I do kind of like the current impl (using a state channel) since it also exposes the API requested in #1202.

For the parameter, I ended up giving it a class since it was really confusing to pass a timeout of 0 for the expected failure case (it felt like it would lead to false positives when using).

See

runBlockingTest(timeout = 0) { }

vs.

runBlockingTest(waitConfig = SingleDispatcherWaitConfig) {}

Let me know if you think this works or if you would prefer to see it implemented with an EventLoop.

@mariusboepple
Copy link

Is it possible that this is also very important for testing actor? I've implemented something according to this https://kotlinlang.org/docs/reference/coroutines/shared-mutable-state-and-concurrency.html#actors and now getting:

java.lang.IllegalStateException: This job has not completed yet
at kotlinx.coroutines.JobSupport.getCompletionExceptionOrNull(JobSupport.kt:1131)
at kotlinx.coroutines.test.TestBuildersKt.runBlockingTest(TestBuilders.kt:53)

Is the actor pattern covered with tests?

Copy link
Contributor

@elizarov elizarov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a first pass mostly at API surface. Please, take a look.

kotlinx-coroutines-test/src/TestCoroutineDispatcher.kt Outdated Show resolved Hide resolved
kotlinx-coroutines-test/src/DelayController.kt Outdated Show resolved Hide resolved
kotlinx-coroutines-test/src/TestBuilders.kt Outdated Show resolved Hide resolved
Copy link
Member

@qwwdfsad qwwdfsad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed only API surface, let's put implementation details aside (I can help you with them later if necessary) and focus on improving public API shape and desired behaviour.

I've also added test tag to simplify feedback processing.

@@ -112,9 +113,75 @@ public interface DelayController {
* Resumed dispatchers will automatically progress through all coroutines scheduled at the current time. To advance
* time and execute coroutines scheduled in the future use, one of [advanceTimeBy],
* or [advanceUntilIdle].
*
* When the dispatcher is resumed, all execution be immediate in the thread that triggered it. This means
* that the following code will not switch back from Dispatchers.IO after `withContext`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are here, could you please elaborate on why this behaviour is preferred over "classic" dispatching?
I am a bit concerned about how this behaviour is different from "release" builds

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the main goal of this eager behavior was to make this test function testable without having to call runCurrent():

fun launchesCoroutine() {
    scope.launch {
        // do stuff
    }
}

In a test that doesn't use multiple threads (TestCoroutineDispatcher is used for all coroutines) it provides eager entry into the launch body (during dispatch the launch body is immediately executed).

As this is both the normal structure for this code (at least on Android) it is nice to avoid extra calls to runCurrent() in this test

@Test fun callLaunchesCoroutine() = runBlockingTest {
    // assume `scope` is using this TestCoroutineDispatcher
    launchesCoroutine()
    // coroutine has already launched here
}

This thread switching behavior is an undesired side effect of that API choice.

So basically, I think there are two options here:

  1. Modify the behavior of dispatch to require a call to runCurrent() in this test. This does make common test cases require an extra call, but leads to correct threading behavior if the developer hasn't injected TestCoroutineDispatcher throughout their code under test.
  2. Keep the behavior (prefer eager entry of launch blocks) with the assumption most tests will not execute coroutines on multiple threads.

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation!

"Keep the behavior" part is, of course, more preferable as it really simplifies writing simple unit tests for end users. I've just realized that this is exactly why we thought to make Uncofined the default behaviour of TestMainDispatcher.

I think it's easier to change the docs and explain that test dispatcher acts like unconfined one (and maybe implementation should use it as well)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a pretty good way to explain it. I'll add that to the docs in this PR since it's become all about threading.

* runBlockingTest {
* pauseDispatcher()
* withContext(Dispatchers.IO) { doIo() }
* // runBlockingTest has returned to it's starting thread here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit counter-intuitive as well, pauseDispatcher does not pause the dispatcher, but changes its behaviour

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea - this one I'm stuck on. It seems required to support the pauseDispatcher { } block in runBlockingTest but it has this side effect in the presence of multi-threading.

kotlinx-coroutines-test/src/DelayController.kt Outdated Show resolved Hide resolved
kotlinx-coroutines-test/src/DelayController.kt Outdated Show resolved Hide resolved
kotlinx-coroutines-test/src/TestBuilders.kt Outdated Show resolved Hide resolved
@qwwdfsad
Copy link
Member

qwwdfsad commented Jul 4, 2019

Minor notes: all public API should have explicit public modifier and ExperimentalCoroutinesApi

@objcode
Copy link
Contributor Author

objcode commented Jul 8, 2019

Sorry delayed response (just got back to work after travel) - taking a look at comments above.

@objcode
Copy link
Contributor Author

objcode commented Jul 8, 2019

Ok after reading all the feedback here's a rough plan for this PR:

  1. I'll revise the current code to remove both of the new interfaces by switching to an EventLoop based implementation.
  2. A decision needs to be made about the behavior of TestCoroutineDispatcher.dispatch.

The first will help us focus in on the API surface without the distractions of the extra classes/channel API.

The second seems important and should land before this library is considered stable. I would like to advocate for keeping the current behavior since it makes single threaded testing easier (which is the recommended practice) but can see the complexity exposed by it in a multi-threading behavior change. I am curious if there is a way to unify both API goals? If so that would be the best approach.

@elizarov
Copy link
Contributor

Waiting for updates.

@qwwdfsad
Copy link
Member

qwwdfsad commented Aug 7, 2019

Hi @objcode, do you have any update on this?
We don't mind finishing the work by ourselves if you are short on time to prevent this issue from getting stuck

@objcode
Copy link
Contributor Author

objcode commented Aug 8, 2019

Definitely - sorry delay had a few interruptions. Let me put together the event loop impl. tonight and then pass it over to you for finalization.

for external parking via a single suspend function.
@objcode
Copy link
Contributor Author

objcode commented Aug 9, 2019

@mariusboepple The actor issue you're seeing above is likely because you're not terminating the actor before runBlockingTest completes. In the case of an infinite loop such as an actor, you'll need to cancel it and ensure it's complete before exiting runBlockingTest or it will detect the leaked coroutine.

@objcode
Copy link
Contributor Author

objcode commented Aug 9, 2019

@qwwdfsad @elizarov updated PR with requested changes. I was unable to find a working solution using EventLoop/unpark() since that's all implemented in EventLoopImplBase so I externalized parking via a small additional public interface https://github.com/Kotlin/kotlinx.coroutines/pull/1206/files#diff-e3c98820170c8b47a0aad1428e940c2cR255

PTAL!

@winfoozmnayef
Copy link

Any update on this pull request?

@qwwdfsad
Copy link
Member

qwwdfsad commented Oct 8, 2019

Moving it to coroutines-test branch, will open PR soon

@qwwdfsad qwwdfsad closed this Oct 8, 2019
@anudeep3998
Copy link

Any updates on the release ?

@mdenburger
Copy link

Where did the moved PR end up? This issue is closed, but kotlin-coroutines-test 1.3.8 still has this problem.

@ymaniz09
Copy link

1.4.0-M1 still has this issue

@totemcaf
Copy link

It seems that 1.4.0 still has this issue

@zivkesten
Copy link

kotlinx-coroutines-test:1.4.1 still has this issue

@Purfakt
Copy link

Purfakt commented Jan 29, 2021

kotlinx-coroutines-test:1.4.2 still has this issue

@ivanbartsov
Copy link

@qwwdfsad
The issue seems to still be valid with kotlinx-coroutines-test:1.4.3

Moving it to coroutines-test branch, will open PR soon

Can I ask you to link the succeeding PR (if any) here for future references? A couple of open issues (#1204 #1222) lead here and this PR kinda makes an impression it's been closed and abandoned for good.

@qwwdfsad
Copy link
Member

We've actually realized the amount of work needed to be done to get this module in order and decided to redesign it from scratch, trying to address all the issues with test label on it.

This process is quite long-running and painful because we, in addition, don't want to force to migrate all the users of this module to a new API and are trying to maintain backward compatibility.
We are working on the prototype, but unfortunately, we don't have a timeframe for that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet