
Add a Flow.iterate method, allowing an Iterator-style traversal of flows. #3278

Open · wants to merge 6 commits into base: master

Conversation

lowasser (Contributor)

Implements #3274.

@lowasser (Contributor Author)

(Not ready for submission; needs tests.)

@elizarov (Contributor) commented May 11, 2022

I don't see how that's different from the flow-to-channel conversion that's already supported. E.g., instead of:

flow.iterate { iter ->
   while (iter.hasNext()) {
       if (!predicate(iter.next())) return@iterate false // stops collecting the flow
   }
   true
}

You can write, to the same effect:

coroutineScope { // optional, you might already have a scope
   flow.produceIn(this).consume { // consume ensures that the channel is closed
      val iter = iterator()
      while (iter.hasNext()) {
         if (!predicate(iter.next())) return@consume // stops collecting the flow
      }
   }
}

You can create a simple iterate extension in your project to capture this pattern, if needed. You don't need all these FlowIterator definitions.
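Such a project-level extension might look roughly like the following sketch (iterateViaChannel is a hypothetical name, not an existing API; it simply wraps the produceIn/consume pattern above):

import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.channels.ChannelIterator
import kotlinx.coroutines.channels.consume
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.produceIn

// Hypothetical project-level extension wrapping the produceIn/consume pattern
// above. consume guarantees the channel (and the producing coroutine) is
// cancelled once the block completes, normally or exceptionally.
suspend fun <T, R> Flow<T>.iterateViaChannel(
    block: suspend (ChannelIterator<T>) -> R
): R = coroutineScope {
    this@iterateViaChannel.produceIn(this).consume {
        block(iterator())
    }
}

With this helper, the example above becomes flow.iterateViaChannel { iter -> ... }; note that it still has produceIn's concurrency semantics, which is what the rest of the thread is about.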

@lowasser (Contributor Author)

We want something linear: something that doesn't buffer ahead and doesn't collect the flow concurrently with the handling of its elements. Concurrent collection is especially bad in testing, where we want a very strict sequence of stimulus and response.

@lowasser (Contributor Author)

In particular, this should deterministically succeed:

flow {
  emit(1)
  throw Exception()
}.iterate { iter ->
   while (iter.hasNext()) {
       if (iter.next() % 2 != 0) return@iterate true
   }
   false
}

@elizarov (Contributor) commented May 12, 2022

There is no thread in the code I've posted. It launches a new coroutine just like your code. I don't see much difference. What I say is that there is already a solution via channels that does essentially the same thing that you propose.

@lowasser (Contributor Author)

Let me demonstrate the difference with some diagrams and some Kotlin Playground demos. It might be a difference you consider "essentially the same thing," but we've found it matters a lot in practice for us (Google).

Let A be the coroutine launched by produceIn, which collects the flow and sends its elements to a channel, and let B be the consumer of the flow.

Suppose the channel is a RENDEZVOUS channel, for the moment. (This is the best case scenario for the issue I'm going to discuss.) Let's suppose the flow emits X, Y, and let's step to the point in the program where A has emitted X and B has received X from the channel.

What produceIn -- or any solution based on a single Channel sending from A to B -- would do is to resume A now, working on emitting Y, while B is still processing X.

Here is a diagram of the control flow:

A(X)-B(X)-B(Y)
     A(Y)
      ^
      B(X) and A(Y) run concurrently/race
      that's the specific issue we care about
      it seems impossible to avoid in solutions that send from A to B via a channel

On the Kotlin playground, we can demonstrate this behavior: when I run https://pl.kotl.in/O04Er2PT5 (which does use buffer(0)), it prints

A(x)
A(y)
B(x)

It's clear that this could happen with nonzero buffer sizes, but it still happens even with 0/RENDEZVOUS. It is unavoidable with produceIn, as far as we can tell.
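A minimal reproduction along the lines of that demo (a sketch, not the exact Playground code) looks roughly like this:

import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.flow.buffer
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.produceIn
import kotlinx.coroutines.runBlocking

// Sketch of the race: A is the coroutine launched by produceIn, B is the
// consumer. Even with a RENDEZVOUS buffer, A is resumed and emits y before B
// gets to process x, so this prints A(x), A(y), B(x).
fun main() = runBlocking {
    val upstream = flow {
        println("A(x)"); emit("x")
        println("A(y)"); emit("y")
    }.buffer(0) // RENDEZVOUS

    coroutineScope {
        val channel = upstream.produceIn(this)
        for (element in channel) {
            println("B($element)")
            break // terminate collection at B(x)
        }
        channel.cancel() // stop A, which is suspended trying to send y
    }
}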

What we want instead

Instead, we want the control flow to look like:

A(X)-B(X)-A(Y)-B(Y)

with the strict alternating ordering. (This is the same control flow we'd have, for example, if B was a simple map or filter, but not if B was buffer.)

It launches a new coroutine just like your code

The PR does launch a separate coroutine to collect the flow, but it coordinates the two coroutines to guarantee that they never actually execute concurrently and instead alternate as described above. That's the "new" thing in the PR.

Specifically, B suspends at a call to hasNext and passes its continuation to A via an internal channel. A receives the continuation on the channel and resumes, collecting the flow until it gets the next element. A then passes the element back to B via the continuation (resuming B), and suspends until B calls hasNext again to send another continuation. So A is always suspended while B is running, and B is always suspended while A is running.
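The same alternation can be illustrated with a simplified sketch that replaces the continuation hand-off with a rendezvous request/response pair (hypothetical names; this is not the PR's actual implementation and omits details such as requests after completion):

import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.flow.*

// Simplified sketch of the strict A/B alternation: B asks for the next element,
// A advances the upstream flow just far enough to answer, then waits for the
// next request, so the upstream never runs ahead of the consumer.
sealed class Step<out T> {
    data class Value<T>(val value: T) : Step<T>()
    data class Done(val cause: Throwable?) : Step<Nothing>()
}

suspend fun <T, R> Flow<T>.iterateSketch(
    block: suspend (requestNext: suspend () -> Step<T>) -> R
): R = coroutineScope {
    val requests = Channel<Unit>()     // B -> A: "produce the next element"
    val responses = Channel<Step<T>>() // A -> B: the element, or completion/failure

    // A: the collecting coroutine. It does no upstream work until B asks.
    val collector = launch {
        try {
            requests.receive()
            this@iterateSketch.collect { value ->
                responses.send(Step.Value(value))
                requests.receive()
            }
            responses.send(Step.Done(null))
        } catch (e: CancellationException) {
            throw e
        } catch (e: Throwable) {
            responses.send(Step.Done(e))
        }
    }

    try {
        // B: drives collection one element at a time.
        block {
            requests.send(Unit)
            responses.receive()
        }
    } finally {
        collector.cancel()
    }
}

The point of the sketch is only the ordering guarantee: the collector coroutine does no upstream work between a response and the next request, which is what produces the A(X)-B(X)-A(Y)-B(Y) schedule.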

https://pl.kotl.in/GB8_0J6NN demonstrates the proposed implementation (with some bugfixes I'll merge) and that it accomplishes the desired behavior (at least for a simple example).

// both versions terminate collection at B(x)

produceIn:
A(x)
A(y)
B(x)

iterate:
A(x)
B(x)

Why this matters

This is a subtle distinction, and you may still consider these "essentially the same thing." But there are several differences we care about. None of these are theoretical; we have encountered these issues the hard way.

  • In test use cases, debugging becomes significantly harder when A and B run concurrently, or, even in a single-threaded environment, when their ordering is not defined. The desirable state when debugging is that, at a breakpoint in B(X), A is suspended at the point where it has just emitted X and has done no further work. produceIn cannot guarantee this; iterate can.
  • The case I outlined in my earlier comment does not succeed deterministically with produceIn. A(Y) is throw Exception, and B(X) is return true, and since they race with produceIn, the function succeeds or fails nondeterministically. This comes up frequently in tests: flows often represent streaming RPC results, which may well throw partway through collection. We often test that case, and want something iterator-like so we can say "this stimulus, then the flow emits this, then this stimulus..." Using produceIn makes such tests almost unavoidably flaky because of races of that form.
  • Many flow transformations we want to build and run in production want to consume flows in this style and offer determinism guarantees to our users about what is and isn't consumed when. This isn't just handy in tests; it also matters in production code that wants to be testable and predictable.

This is not a one-or-two-project need for us. Googlers have wanted this practically everywhere flows are tested, as well as in a variety of production use cases. Using produceIn and similar alternatives has resulted in observable problems such as flaky tests and harder debugging, because of the race described above.

Hopefully I've

  • described the desired behavior
  • demonstrated, with the Playground demo, that produceIn doesn't provide that behavior
  • described how the PR hopes to implement the desired behavior
  • shown that, at least in a simple Playground example, the PR does implement the desired behavior
  • described why the desired behavior is not "essentially the same" to us
  • described why we need the desired behavior at more than a "project-level helper" level

@elizarov (Contributor)

Thanks a lot for the explanation! I think I now understand exactly what you are trying to achieve.

Now, we need to better understand use-cases for such an iteration primitive. It seems that the motivating example you've provided can be readily implemented using collectWhile (see #1087). collectWhile has a much smaller API surface, a simpler implementation, and much better performance, so it would be a preferred solution for this particular use-case. Do you have any other use-cases in mind?
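For that particular example, any sequential terminal operator already gives the same determinism, because the predicate runs in the same coroutine that collects the flow, with no channel in between. A sketch with the existing firstOrNull (collectWhile itself is not public API at this point):

import kotlinx.coroutines.flow.firstOrNull
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    // Collection is aborted as soon as the predicate matches, so the
    // exception after emit(1) is never reached.
    val sawOdd = flow {
        emit(1)
        throw IllegalStateException("upstream failure")
    }.firstOrNull { it % 2 != 0 } != null

    println(sawOdd) // prints true, deterministically
}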

@qwwdfsad (Member)

Hi Louis, I'm back from vacation, so expect much more timely responses now.

Before discussing the API, let me express my general concerns about the pull model -- it's both significantly[0] slower than the push model in general and much more familiar to users. So, all things being equal, it's likely that a person not yet familiar with flows or the Rx paradigm would pick the pull-based API just because it's more familiar, and because it takes less cognitive load to express the desired operator/API/transformation/business entity.

[0] -- it's not an exaggeration. Here is the simple benchmark that stresses pull vs. push, and the pull-based API is 30 times slower. I realize that the prototype is not yet polished, but my educated guess is that even if we cut all non-essentials (namely, the channel), it will still be at least 5 times slower.

So, if we decide to introduce a pull-based general-purpose API, its benefits should significantly outweigh these cons.
It looks like the PR is mainly driven by two use cases: deterministic testing & debugging, and "deterministic" flow operators; it would be nice to discuss them separately.

The desirable state when debugging is that, at a breakpoint at B(X), A is suspended at the point where it just emitted X and has not done any more work.

It looks like this has to be solved not at the operator level, but at the infrastructure/testing one, e.g. with (optionally) injectable dispatchers in the production source code and sequential testing primitives: kotlinx-coroutines-test or Turbine. We just released a properly designed kotlinx-coroutines-test, and Flow support is lagging behind there, so it could be a good opportunity to improve it. Could you please elaborate on why the testing infrastructure is not sufficient for your use case?
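For example, a strictly sequential stimulus/response test of the flow from the earlier comment could look roughly like this with Turbine (this assumes the app.cash.turbine dependency; awaitItem/awaitError are the names in recent Turbine versions and may differ in older ones):

import app.cash.turbine.test
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.test.runTest
import kotlin.test.Test
import kotlin.test.assertEquals

class StreamingRpcTest {
    @Test
    fun emitsOneValueThenFails() = runTest {
        flow {
            emit(1)
            throw IllegalStateException("rpc failed")
        }.test {
            assertEquals(1, awaitItem()) // the value is observed first, in order
            awaitError()                 // then the failure, with no race in between
        }
    }
}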
Also, I second Roman with collectWhile. If there is anything not covered by the current set of operators, there is a high chance that we can provide it within the push-based model.

Many flow transformations we want to build and run in production want to consume flows in this style and offer determinism guarantees

Could you please describe (or show an implementation of) a few such operators? It would be nice to know exactly what kind of operators we are talking about.

@qwwdfsad (Member)

Also, if you don't mind, it would be really nice if we could continue our discussion on the corresponding issue (#3274) -- it's much easier to keep track of, it's where other people look, and it's what ends up in the changelog. Also, it's more convenient for the sake of historical digging :)

val result = flow.iterate {
    next().also {
        yield()
        // not obvious if this results in a deterministic test?
@qwwdfsad (Member) commented May 30, 2022

We have our own internal mechanism for making such tests explicitly deterministic: expect(num), expectUnreached() and finish(lastNum).
But it's probably not worth changing these tests in the prototype.
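For readers outside the project: expect/expectUnreached/finish live in kotlinx.coroutines' internal TestBase; the idea can be sketched as a small checkpoint counter (illustrative only, not the real helpers):

import java.util.concurrent.atomic.AtomicInteger

// Illustrative stand-in for the expect/expectUnreached/finish checkpoint style:
// each call asserts that it is reached in exactly the expected global order.
class Checkpoints {
    private val step = AtomicInteger(0)

    fun expect(n: Int) {
        val actual = step.incrementAndGet()
        check(actual == n) { "expected to be at step $n, but was at step $actual" }
    }

    fun expectUnreached(): Nothing = error("this code should not be reached")

    fun finish(n: Int) = expect(n) // the last checkpoint in a test
}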
