Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document how reactive transactions work for cancellation in 5.2 and how it will work in 5.3 #25091

Closed
rpuch opened this issue May 16, 2020 · 7 comments
Assignees
Labels
in: data Issues in data modules (jdbc, orm, oxm, tx) type: documentation A documentation task
Milestone

Comments

@rpuch
Copy link

rpuch commented May 16, 2020

Affects: 5.2.6, current master

Briefly: according to my observations, Spring commits a transaction when a 'transactional reactive pipeline' gets cancelled. To the moment of such a cancellation, it could happen that only part of the work inside the transaction has been done, so a commit at that point produces partially-committed results violating the 'atomicity' transactional property.

I've created a project demonstrating the problem: https://github.com/rpuch/spring-commit-on-cancel-problems

    @Test
    void cancelShouldNotLeadToPartialCommit() throws InterruptedException {
        // latch is used to make sure that we cancel the subscription only after the first insert has been done
        CountDownLatch latch = new CountDownLatch(1);

        Disposable disposable = bootService.savePair(collection, latch).subscribe();

        // wait for the first insert to be executed
        latch.await();

        // now cancel the reactive pipeline
        disposable.dispose();

        // Now see what we have in the DB. Atomicity requires that we either see 0 or 2 documents.
        List<Boot> boots = mongoOperations.findAll(Boot.class, collection).collectList().block();

        assertEquals(0, boots.size());
    }

The main (and only) test, PartialCommitsOnCancelTest#cancelShouldNotLeadToPartialCommit(), does the following: it initiates a reactive pipeline having 2 inserts. Both inserts are wrapped in a (declarative) transaction. The code is crafted to make a cancellation exactly between the inserts possible:

    @Transactional
    public Mono<Void> savePair(String collection, CountDownLatch latch) {
        return Mono.defer(() -> {
            Boot left = new Boot();
            left.setKind("left");
            Boot right = new Boot();
            right.setKind("right");

            return mongoOperations.insert(left, collection)
                    // signaling to the test that the first insert has been done and the subscription can be cancelled
                    .then(Mono.fromRunnable(latch::countDown))
                    // do not proceed to the second insert ever
                    .then(Mono.fromRunnable(this::blockForever))
                    .then(mongoOperations.insert(right, collection))
                    .then();
        });
    }

The pipeline is initiated, and after the first insert is done (but before the second one is initiated), the test cancels the pipeline. It then inspects the collection and finds that there is exactly 1 document, which means that the transaction was committed partially.

In the log, I see the following:

2020-05-16 22:13:02.643 DEBUG 1988 --- [           main] o.s.d.m.ReactiveMongoTransactionManager  : Creating new transaction with name [com.example.commitoncancelproblems.BootService.savePair]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT
2020-05-16 22:13:02.667 DEBUG 1988 --- [           main] o.s.d.m.ReactiveMongoTransactionManager  : About to start transaction for session [ClientSessionImpl@5792c08c id = {"id": {"$binary": "1TaU0xPNQpKaJTHdPLS05w==", "$type": "04"}}, causallyConsistent = true, txActive = false, txNumber = 0, error = d != java.lang.Boolean].
2020-05-16 22:13:02.668 DEBUG 1988 --- [           main] o.s.d.m.ReactiveMongoTransactionManager  : Started transaction for session [ClientSessionImpl@5792c08c id = {"id": {"$binary": "1TaU0xPNQpKaJTHdPLS05w==", "$type": "04"}}, causallyConsistent = true, txActive = true, txNumber = 1, error = d != java.lang.Boolean].
2020-05-16 22:13:02.703 DEBUG 1988 --- [           main] o.s.d.m.core.ReactiveMongoTemplate       : Inserting Document containing fields: [kind, _class] in collection: e81205fa_eb5f_4492_8921_ffb3fccd2b76
2020-05-16 22:13:02.757 DEBUG 1988 --- [           main] o.s.d.m.ReactiveMongoTransactionManager  : Initiating transaction commit
2020-05-16 22:13:02.758 DEBUG 1988 --- [           main] o.s.d.m.ReactiveMongoTransactionManager  : About to commit transaction for session [ClientSessionImpl@5792c08c id = {"id": {"$binary": "1TaU0xPNQpKaJTHdPLS05w==", "$type": "04"}}, causallyConsistent = true, txActive = true, txNumber = 1, error = d != java.lang.Boolean].
2020-05-16 22:13:02.767 DEBUG 1988 --- [       Thread-6] o.s.d.m.ReactiveMongoTransactionManager  : About to release Session [ClientSessionImpl@5792c08c id = {"id": {"$binary": "1TaU0xPNQpKaJTHdPLS05w==", "$type": "04"}}, causallyConsistent = true, txActive = false, txNumber = 1, error = d != java.lang.Boolean] after transaction.

So the code is actually run in a transaction. The savePair() pipeline never gets completed successfully, but the transaction gets committed producing the 'partial' result.

Looking at TransactionAspectSupport.ReactiveTransactionSupport#invokeWithinTransaction(), I see the following code:

	return Mono.<Object, ReactiveTransactionInfo>usingWhen(
			Mono.just(it),
			txInfo -> {
				try {
					return (Mono<?>) invocation.proceedWithInvocation();
				}
				catch (Throwable ex) {
					return Mono.error(ex);
				}
			},
			this::commitTransactionAfterReturning,
			(txInfo, err) -> Mono.empty(),
			this::commitTransactionAfterReturning)

The last parameter is onCancel. So it actually commits on cancel. (The same behavior is in TransactionalOperatorImpl; also, spring-data-mongodb's ReactiveMongoTemplate#inTransaction() does the same, but it's a different Spring project).

This looks like a bug to me, but I can hardly believe that this behavior was implemented by mistake. Is it possible that I misunderstood something?

PS. There is an SO question with details and some context: https://stackoverflow.com/questions/61822249/spring-reactive-transaction-gets-committed-on-cancel-producing-partial-commits?noredirect=1#comment109381171_61822249

@jhoeller
Copy link
Contributor

jhoeller commented May 18, 2020

@mp911de @rstoyanchev The affected code dates back to the Reactor Dysprosium RC1 upgrade where we switched to a different usingWhen variant. This seems questionable indeed since we should rather complete the transaction in rollback mode in case of an explicit cancel step, in case any resource operations got written already?

@jhoeller jhoeller added in: data Issues in data modules (jdbc, orm, oxm, tx) type: bug A general bug and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels May 18, 2020
@jhoeller jhoeller added this to the 5.2.7 milestone May 18, 2020
@mp911de
Copy link
Member

mp911de commented May 18, 2020

The code works as designed. Whether commit on cancel is appropriate is an entirely different topic really and depends on the perspective.

Cancellation is a pretty standard signal for operators like Flux.next(), Flux.take(), even Mono.from(Publisher). These operators cancel the upstream subscription after consuming the desired number of items.

Operators like timeout() send the exact same cancellation signal (Subscription.cancel()) if the timeout exceeds. To make it worse, a connection reset (as viewed from a HTTP connection perspective) issues as well a cancellation signal.

From a Subscription perspective, there's no way of distinguishing between these cases, whether cancellation should result in limiting the number of emitted elements or whether the subscription should be canceled because of a failure.

So what's required is two things:

  1. Having a reason or flavor of cancellation (limiting the number of data signals vs. failure indication).
  2. Getting a possibility to await the outcome of cancellation so subsequent operations (see No means to synchronize when cancelling a TransactionalOperator Subscription (Publisher) #23304 for further details).

Cancelation works as per the definition in the Reactive Streams specification. We would require either a Reactor-specific extension or ideally an approach on the Reactive Streams level.

Adding @bsideup and @simonbasle to this thread.

@jhoeller jhoeller added type: documentation A documentation task and removed type: bug A general bug labels May 18, 2020
@jhoeller
Copy link
Contributor

jhoeller commented May 18, 2020

Thanks for the clarification, Mark.

I totally missed that a cancellation signal may also indicate the regular end of consumption of an expected number of elements. There really should be some way of indicating cancel-orderly-consumption from cancel-on-failure here, in particular also for the timeout and connection reset cases. Could we potentially track some additional state along with our transaction context here?

Since this isn't really obvious, we should at least add documentation around this in 5.2.7. If we can revise semantics in alignment with a Reactor extension here, the Spring Framework 5.3 timeframe would be a fine opportunity for it. Let's create a separate ticket there if we decide to revise this.

@bsideup
Copy link

bsideup commented May 18, 2020

Unfortunately, the Reactive Streams' cancellation indeed does not allow us to indicate the reason or even perform an async cancellation.
Even worse - the cancellation happens before an error is passed to the downstream operator, so that it is not possible to use a ThreadLocal with onEachOperator or something to set the reason.

We could have a mutable "cancellation reason" flag in subscriber's Context, but that will require a massive rewrite of Reactor's internals and will only work with Reactor, so if there is any non-Reactor operator in between, it will have the same issue.

I will bring it on the next team's call to discuss further.

@bsideup
Copy link

bsideup commented May 18, 2020

@rpuch
Copy link
Author

rpuch commented May 18, 2020

A comment for application-developers, like me, who need some workaround right now, in hope it may be useful before the issue is sorted out.

Indeed, with 'rollback-on-cancel' policy like this rpuch@95c2872, the following code

shoeService.savePairAndReturnFlux(collection).take(1).blockLast();

with

    @Transactional
    public Flux<Shoe> savePairAndReturnFlux(String collection) {
        return Flux.defer(() -> {
            Shoe left = new Shoe("left");
            Shoe right = new Shoe("right");

            return mongoOperations.insert(List.of(left, right), collection)
                    .thenMany(Flux.just(left, right));
        });
    }

causes a rollback, so nothing is saved.

So it's like this:

  1. The current 'commit on cancel' works well when everything goes well, but permits partial commits on unforeseen cancellations, which, unfortunately, makes Spring reactive transactions kinda unreliable. It is hard to know beforehand what can 'blow up' and cause a malign cancellation.
  2. An alternative, 'rollback on cancel', may render some valid code (using take() and friends, for example) unusable with reactive transactions (they will never commit). But if such operators are not used downstream from a transactional operation, this looks safer (of course, given that you have integration tests making sure that your code actually leaves the intended changes in the database) because you are protected from atomicity breaches.

Please correct me if I'm wrong.

@mp911de
Copy link
Member

mp911de commented May 22, 2020

Team decision:

The current behavior (commit on cancel) was added to not interfere with accidental cancellation due to operators such as Flux.take(), Flux.next(), Flux.single().

We figured also that there are several use-cases where commit on cancel is not appropriate:

  • Returning a Mono from a transaction: Cancellation would let one expect that the transaction rolls back since no element/completion was emitted yet.
  • Behavior of protective operators such as Flux.timeout(): A timeout protects the application and a cancellation in flight might lead to partial commits. Timeout operators used together with repeat(…) might lead to duplicate work being applies.
  • Cancellation on request failures: In SSE scenarios, closing a client connection results typically in a cancellation of the response stream. Unintended disconnects cannot be distinguished from intended ones so cancel on commit rather adds to surprising transactional behavior.
  • Cancellation at will directly impacts transaction progress and creates a non-deterministic behavior.

For Spring Framework 5.2, we're going to document the current behavior to explain semantics.

For 5.3, we're going to change the behavior to rollback on cancel. By flipping semantics of cancel signals we create a reliable and deterministic outcome. Cancellations by protective means or as consequence of an error lead to a rollback. These arrangements are typically expensive to test. Although cancellations caused by operators such as Flux.take() and Flux.next() cannot be distinguished from errorneous cancellations and lead as well to a rollback, these arrangements are easily testable with unit or integration tests as the transaction is expected to be rolled back.

We might consider adding an operator to enable commit on cancel semantics if these are cases worth supporting.

@rstoyanchev rstoyanchev changed the title Reactive transactions get committed on cancel producing partial commits Document how reactive transactions work for cancellation in 5.2 and how it will work in 5.3 Jun 3, 2020
FelixFly pushed a commit to FelixFly/spring-framework that referenced this issue Aug 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: data Issues in data modules (jdbc, orm, oxm, tx) type: documentation A documentation task
Projects
None yet
Development

No branches or pull requests

6 participants