Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancel before rollback may cause wrong transaction to be canceled with pgbouncer #430

Closed
aaronjensen opened this issue Feb 12, 2022 · 8 comments · Fixed by #431
Closed

Comments

@aaronjensen
Copy link

aaronjensen commented Feb 12, 2022

Re: d2451b1 and #390

If a cancel happens due to an error it is possible that the cancel will take effect on the next transaction that is run, rather than the current.

See: pgbouncer/pgbouncer#684

It's unclear to me if this is a pgbouncer bug or not. I cannot reproduce it when connecting directly to pg, so I would imagine so, but since this is a recent change to pg I wanted to report it here for visibility.

I don't have a shareable repro narrowed yet, but this is how I can reproduce it locally:

    input_message = Controls::Message.example

    reservation_stream_name = Controls::StreamName.random

    Messaging::Postgres::Write.(input_message, reservation_stream_name)

    write = Reservation::Write.build

    a = Thread.new do
      20.times do
        # Open a new connection and make a query
        MessageStore::Postgres::Get.(reservation_stream_name, position: 1, batch_size: 1)
      end
    end

    b = Thread.new do
      20.times do
        # Open a new connection and make a query
        MessageStore::Postgres::Get.(reservation_stream_name, position: 1, batch_size: 1)
      end
    end

    # Open a new connection and invoke a server function that raises, which
    # ultimately is caught in ruby
    write.(input_message, reservation_stream_name)

    context "Reserved Message" do
      comment "Stream Name: #{reservation_stream_name}"

      message_data = MessageStore::Postgres::Get.(reservation_stream_name, position: 1, batch_size: 1)
      a.join
      b.join

      test "Not written" do
        assert(message_data.empty?)
      end
    end

One of the MessageStore::Postgres::Get's will occasionally fail with:

pg-1.3.1/lib/pg/basic_type_registry.rb:114:in `exec': ERROR:  canceling statement due to user request (PG::QueryCanceled)
@aaronjensen aaronjensen changed the title Cancel before rollback causes incompatibility with pgbouncer Cancel before rollback may cause wrong transaction to be canceled with pgbouncer Feb 12, 2022
@larskanis
Copy link
Collaborator

This is probably a pgbouncer bug. Ruby-pg sends the cancel request on a second connection, but it waits until the request has been processed by the PostgreSQL server. Not till then the ROLLBACK command is executed.

From reading the code I would estimate, that pgbouncer doesn't process the cancel request before it indicates it's completion. Therefore the next commands are already processed before the cancel request has been dispatched.

I'll try to reproduce this issue with a docker image for pgbouncer.

@aaronjensen
Copy link
Author

Thank you for the quick response. There is a response on the pgbouncer issue as well with a possible explanation: pgbouncer/pgbouncer#684 (comment)

larskanis added a commit to larskanis/ruby-pg that referenced this issue Feb 12, 2022
This avoids sending a cancel request if there is no active query running.

In case of a failing SQL statement, the transaction_status is PQTRANS_INERROR.
The previous code sent a cancel request in this case although the query is known to be aborted.

In case of ruby code that raised an error, the transaction_status is PQTRANS_INTRANS.
Also in this case there is no use of sending a cancel request.

The cancellation of queries in case of exceptions was introduced by ged#391 .
Now we cancel more conservative, only in case of a running query.

The cancellation can cause issues with pgbouncer, which releases a connection after a SQL error was raised.
It then dispatched the cancel to the next SQL command.
This change should solve this incompatibility.

Fixes ged#430
@larskanis
Copy link
Collaborator

I think it makes sense to cancel a query only in case it is active. This is implemented in #431.

@aaronjensen
Copy link
Author

Wow, that was fast. I’ll test with my repro shortly.

@aaronjensen
Copy link
Author

One question, is there still a very small chance that the transaction aborts immediately after the check? Maybe that’s a small enough window to not be a problem, but any action based on a query is theoretically subject to race conditions.

@aaronjensen
Copy link
Author

I can confirm that I can no longer reproduce the issue I was seeing with #431.

@larskanis
Copy link
Collaborator

One question, is there still a very small chance that the transaction aborts immediately after the check?

In case of a single exception within the query, as described in this issue, I don't think so. Pgbouncer probably releases the connection and forwards the error to the client application. But now, that #431 is merged, the client application doesn't send a cancel after it received the message. So there is no issue.

However if there are two simultaneous errors, one in the client and one in the query, then such a race can happen, I think. If the client cancels the query after the connection is already released by pgbouncer, but the error message wasn't sent to the client yet, the cancel can possibly be passed to another query command.

@aaronjensen
Copy link
Author

Thank you again @larskanis

the only thing I can think of for the extant race condition would be to allow disabling the cancel. It’s such a narrow case that I’m not worried about it personally though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants