Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore timeouts using virtual time #130

Closed

Conversation

jingibus
Copy link
Collaborator

Proposing this PR because timeouts are a critical part of the dev workflow for us. They happen all the time; runTest's 60s wall clock time timeout is okay, but:

  1. 60s is long. Too long to use for a situation that represents a bug that I might want to TDD on
  2. No feedback is provided on the await call site that caused the issue.

Internally at Cash App, we've just been using runBlocking with wall clock time timeouts for this scenario. This meets a lot of our needs, but also has a couple of major drawbacks:

  1. Even at 1s wall clock time, that's annoying.
  2. 1s is way too long for the happy path. (e.g. I might want to assert that I send some event to a presenter, and nothing happens in response)

This PR is an attempt to restore timeouts with good support for virtual time on runTest. Two big changes from the old support:

  1. All timeouts are wrapped in AssertionError with an appropriate error message.
  2. The timeout duration itself is provided via a CoroutineContext element. This helps keep parameters from proliferating, and makes it easier to declare this once.

Very open to feedback.

@jingibus jingibus force-pushed the wphillips/2022-07-26/restore-timeouts branch from 73c0235 to 01dbefa Compare July 27, 2022 15:40
@@ -88,18 +91,26 @@ public operator fun <T> Turbine<T>.plusAssign(value: T) { add(value) }
/**
* Construct a standalone [Turbine].
*/
public fun <T> Turbine(): Turbine<T> = TurbineImpl()
public fun <T> Turbine(timeoutMs: Long? = null): Turbine<T> = TurbineImpl(timeoutMs = timeoutMs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeout: Duration please! You should be able to keep safe types all the way down now, I think. There definitely was a delay overload but I'm not sure about withTimeout.

Comment on lines +172 to +174
} catch (e: Exception) {
println("Caught it! $e")
throw e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert?

Comment on lines +190 to +191
is Event.Item<T> -> { /* Success */
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert. Did an autoformatter do this? We can change it to like

-> {
  // Success
}

or something to prevent it from interfering.

import kotlinx.coroutines.currentCoroutineContext
import kotlinx.coroutines.withContext

public const val DEFAULT_TIMEOUT_MS: Long = 1000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't seem like it needs to be public API.

/**
* Sets the timeout for all [Turbine] instances within this context.
*/
public suspend fun <T> withTurbineTimeout(timeoutMs: Long, block: suspend CoroutineScope.() -> T): T {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be public API? When would you not be in control of the turbine instantiation such that you could just pass the desired value as a parameter directly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of scenario is what I was thinking of:

@Test fun myTest() = runTest {
  val subject = Subject(this /* CoroutineScope */, fakeCollaborator)

  subject.doSomething()

  assertThat(fakeCollaborator.outputTurbine.awaitItem()).isEqualTo("hi there")
}

Wrapping a block like this with withTurbineTimeout makes it possible to assign delays to any fake interactions, even if .test { ... } isn't being used.

@@ -92,10 +62,11 @@ public suspend fun <T> Flow<T>.test(
* ```
*/
public suspend fun <T> Flow<T>.test(
timeoutMs: Long? = null,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only public API should be Duration. We can delete the deprecated Long overloads completely now that 0.8 is shipped.

@@ -118,8 +89,8 @@ public suspend fun <T> Flow<T>.test(
* Unlike [test] which automatically cancels the flow at the end of the lambda, the returned
* [ReceiveTurbine] must either consume a terminal event (complete or error) or be explicitly canceled.
*/
public fun <T> Flow<T>.testIn(scope: CoroutineScope): ReceiveTurbine<T> {
val turbine = collectTurbineIn(scope)
public fun <T> Flow<T>.testIn(scope: CoroutineScope, timeoutMs: Long? = null): ReceiveTurbine<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Samesies

Comment on lines +83 to +84
// *With* a test scheduler, we must poll and nudge the clock
// until some kind of result is produced.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this. Before removing timeout, I changed it to always be wallclock time and never dispatcher time. If you want to assert a timeout on the dispatcher's fake time then you can write assertFailsWith<TimeoutException> { withTimeout(1.seconds) { awaitEvent() } }. Turbine's timeout was always meant to be a hang detector. I don't like that this makes the behavior subtle and reliant on factors which are hard to document and hard to discern when glancing at a test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two points need to be responded to: (1) whether timeouts have a role other than bailing out hung code, and (2) whether withTimeout(1.seconds) is sufficient as an implementation.

  1. Hangs are garden variety error conditions in CSP-style concurrent code; timeouts at receive sites provide actionable feedback on why these hangs occur.

Here's an example of a test that fails by hanging:

@Test fun logsSubmitClicked() = runTest {
  val presenter = Presenter(fakeAnalytics, fakeAppService)
  fakeAppService.responses += ApiResult(Success("What's new today"))
  presenter.models(events).test {
    assertThat(awaitItem()).isEqualTo(Loading)
    events.emit(SubmiClicked())
    assertThat(fakeAnalytics.events.awaitItem()).isEqualTo(SubmitClickAnalyticsEvent())
    assertThat(awaitItem()).isEqualTo(Content("What's new today"))
  }
}

Some failure modes for this test:

  • Loading isn't emitted first, but skips straight to Content
  • Incorrect Content
  • Analytics event is never emitted
  • Content is never emitted

Half of those will present as a hung test. Seeing exactly which one of these happened in CI is invaluable.

  1. I had hoped that this:
runTest {
  flow.test {
    assertFailsWith<TimeoutException> { withTimeout(1.seconds) { awaitEvent() } }
  }
}

...would work with our new runTest. I initially wrote it that way, and unfortunately it does not: withTimeout(1.seconds) requires a concurrent coroutine to tick the virtual clock in order to advance. This makes it useless as a testing API.

I don't like that this makes the behavior subtle and reliant on factors which are hard to document and hard to discern when glancing at a test.

The delay(timeoutMs / 10) is the real bugaboo that needs answering.

I am able to convince myself that automatic manipulation of virtual time is worth it, because

  1. It speeds iteration on failing tests that timeout
  2. it makes it possible to implement awaitNoEvents(). awaitNoEvents() is something colleagues have asked for that solves some real testing problems, and it can't realistically be done on wall clock time. (it'd have to do something hacky like repeat(10) { yield() }; expectNoEvents()) My answer thus far has been, "Tough, you can't logically assert that nothing happens in response to a stimulus," but it sure would be nice to actually solve the problem instead being a difficult theory nerd.

Implicitly advancing the clock is a pretty hard nut to swallow, though, and delay(timeout / 10) even more so. delay(timeout / 10) can probably be rewritten to be delay(10) and solve the problem without being too confusing, but if you think that implicitly advancing virtual time will cause too much confusion to be worth the advantages above, then this approach should be ditched in favor of wall clock time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we want to land on this? If you're not comfortable with the virtual time thing, I would like to get wall clock merged as soon as possible. Without it, I don't know how to tell people how to use 0.9. I keep on hacking around it myself.

Re: details of wall clock time, I am concerned about the need to use Dispatchers.Default. It doesn't feel safe. Even if we did something like this:

  val callingContext = currentCoroutineContext()
  withContext(Dispatchers.Default) {
    withTimeout(timeoutMs) {
      withContext(callingContext) {
        receive()
      }
    }
  }

...having multiple threads involved makes me antsy.

@jingibus
Copy link
Collaborator Author

jingibus commented Sep 3, 2022

Closing in favor of #140.

@jingibus jingibus closed this Sep 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants