Skip to content

Commit

Permalink
[#18192] YSQL: Make pggate use a best-effort approach for Abort Trans…
Browse files Browse the repository at this point in the history
…action

Summary:
**Background**
Postgres has a small stack (of size 5) to hold error records in re-entrant error scenarios.
When any error occurs during the execution of a transaction, the error is pushed into this stack, and postgres attempts to perform transaction error recovery.
As part of this recovery, any active transaction and sub-transaction is aborted. If an error occurs during these abort operations, they are further pushed onto the stack,
and recovery from this error recovery is attempted, leading to a recursive loop.
In YugabyteDB, aborting a transaction requires an RPC to the local tserver which introduces additional modes of failure.
Failure to communicate with the tserver during transaction error recovery can cause this recursive loop of errors to overflow the error stack and result in a PANIC.
This PANIC is innocuous, because DocDB automatically aborts the transaction after a period of inactivity.

**Fix**
This revision makes the AbortTransaction flow a best-effort approach so that errors from this flow are handled and not propagated further.
The flow is as follows:
 - If a DML transaction is sought to be aborted (enclosing DDL transaction will also be aborted) via `YBCAbortTransaction`:
   - Two FinishTransaction RPCs with commit = false are sent to the tserver, first for the DDL, second for the DML.
   - Irrespective of the success/failure of the RPCs, the transaction state in pggate is cleared.
   - The status of the RPC is propagated to the pg layer.
   - In case of any errors, pg closes the backend connection connection.

The above flow is also used as part of PG error recovery to abort any ongoing transaction (DDL or DML or both) and clear any transaction state via `YBCAbortTransaction`:

- If a DDL transaction is sought to be aborted via `YBResetDdlState` (ie. enclosing DML transaction
   does not need to be aborted)
  - A FinishTransaction RPC with commit = false is sent to the local tserver to abort the DDL transaction.
  - Irrespective of the success/failure of the above RPC, the local DDL transaction state in pggate is cleared.
  - The status of the RPC is propagated to the pg layer.
  - In case of any errors, the PG error recovery flow is invoked to abort any enclosing DML transaction.
Jira: DB-7215

Test Plan:
This revision does not introduce new functionality; it only simplifies existing flows.
Testing against a Jenkins run should be sufficient.

Reviewers: pjain

Reviewed By: pjain

Subscribers: smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D34725
  • Loading branch information
karthik-ramanathan-3006 committed May 13, 2024
1 parent 166ef51 commit fcbdb09
Show file tree
Hide file tree
Showing 8 changed files with 70 additions and 36 deletions.
28 changes: 24 additions & 4 deletions src/postgres/src/backend/utils/misc/pg_yb_utils.c
Original file line number Diff line number Diff line change
Expand Up @@ -946,17 +946,37 @@ YBCCommitTransaction()
if (!IsYugaByteEnabled())
return;

HandleYBStatus(YBCPgCommitTransaction());
HandleYBStatus(YBCPgCommitPlainTransaction());
}

void
YBCAbortTransaction()
{
if (!IsYugaByteEnabled())
if (!IsYugaByteEnabled() || !YBTransactionsEnabled())
return;

if (YBTransactionsEnabled())
HandleYBStatus(YBCPgAbortTransaction());
/*
* If a DDL operation during a DDL txn fails, the txn will be aborted before
* we get here. However if there are failures afterwards (i.e. during
* COMMIT or catalog version increment), then we might get here as part of
* top level error recovery in PostgresMain() with the DDL txn state still
* set in pggate. Clean it up in that case.
*/
YBCStatus status = YBCPgClearSeparateDdlTxnMode();

/*
* Aborting a transaction is likely to fail only when there are issues
* communicating with the tserver. Close the backend connection in such
* scenarios to avoid a recursive loop of aborting again and again as part
* of error handling in PostgresMain() because of the error faced during
* abort.
*/
if (unlikely(status))
elog(FATAL, "Failed to abort DDL transaction: %s", YBCMessageAsCString(status));

status = YBCPgAbortPlainTransaction();
if (unlikely(status))
elog(FATAL, "Failed to abort DML transaction: %s", YBCMessageAsCString(status));
}

void
Expand Down
46 changes: 30 additions & 16 deletions src/yb/yql/pggate/pg_txn_manager.cc
Original file line number Diff line number Diff line change
Expand Up @@ -164,7 +164,8 @@ PgTxnManager::PgTxnManager(
PgTxnManager::~PgTxnManager() {
// Abort the transaction before the transaction manager gets destroyed.
WARN_NOT_OK(AbortTransaction(), "Failed to abort transaction in dtor");
WARN_NOT_OK(ExitSeparateDdlTxnModeWithAbort(), "Failed to abort DDL transaction in dtor");
WARN_NOT_OK(AbortPlainTransaction(), "Failed to abort DML transaction in dtor");
}
Status PgTxnManager::BeginTransaction(int64_t start_time) {
Expand Down Expand Up @@ -360,16 +361,21 @@ void PgTxnManager::SetActiveSubTransactionId(SubTransactionId id) {
active_sub_transaction_id_ = id;
}
Status PgTxnManager::CommitTransaction() {
return FinishTransaction(Commit::kTrue);
Status PgTxnManager::CommitPlainTransaction() {
return FinishPlainTransaction(Commit::kTrue);
}
Status PgTxnManager::FinishTransaction(Commit commit) {
// If a DDL operation during a DDL txn fails the txn will be aborted before we get here.
// However if there are failures afterwards (i.e. during COMMIT or catalog version increment),
// then we might get here with a ddl_txn_. Clean it up in that case.
if (IsDdlMode() && !commit) {
RETURN_NOT_OK(ExitSeparateDdlTxnModeWithAbort());
Status PgTxnManager::AbortPlainTransaction() {
return FinishPlainTransaction(Commit::kFalse);
}
Status PgTxnManager::FinishPlainTransaction(Commit commit) {
if (PREDICT_FALSE(IsDdlMode())) {
// GH #22353 - A DML txn must be aborted or committed only when there is no active DDL txn
// (ie. after any active DDL txn has itself committed or aborted). Silently ignoring this
// scenario may lead to errors in the future. Convert this to an SCHECK once the GH issue is
// resolved.
LOG(WARNING) << "Received DML txn commit with DDL txn state active";
}
if (!txn_in_progress_) {
Expand All @@ -383,17 +389,17 @@ Status PgTxnManager::FinishTransaction(Commit commit) {
return Status::OK();
}
// Aborting a transaction is on a best-effort basis. In the event that the abort RPC to the
// tserver fails, we simply reset the transaction state here and return any error back to the
// caller. In the event that the tserver recovers, it will eventually expire the transaction due
// to inactivity.
VLOG_TXN_STATE(2) << (commit ? "Committing" : "Aborting") << " transaction.";
Status status = client_->FinishTransaction(commit);
VLOG_TXN_STATE(2) << "Transaction " << (commit ? "commit" : "abort") << " status: " << status;
ResetTxnAndSession();
return status;
}
Status PgTxnManager::AbortTransaction() {
return FinishTransaction(Commit::kFalse);
}
void PgTxnManager::ResetTxnAndSession() {
txn_in_progress_ = false;
isolation_level_ = IsolationLevel::NON_TRANSACTIONAL;
Expand Down Expand Up @@ -437,9 +443,17 @@ Status PgTxnManager::ExitSeparateDdlTxnMode(const std::optional<DdlCommitInfo>&
if (commit_info && commit_info->is_silent_altering) {
ddl_mode->silently_altered_db = commit_info->db_oid;
}
RETURN_NOT_OK(client_->FinishTransaction(commit_info ? Commit::kTrue : Commit::kFalse, ddl_mode));
ddl_mode_.reset();
return Status::OK();
Commit commit = commit_info ? Commit::kTrue : Commit::kFalse;
Status status = client_->FinishTransaction(commit, ddl_mode);
WARN_NOT_OK(status, Format("Failed to $0 DDL transaction", commit ? "commit" : "abort"));
if (PREDICT_TRUE(status.ok() || !commit)) {
// In case of an abort, reset the DDL mode here as we may later re-enter this function and retry
// the abort as part of transaction error recovery if the status is not ok.
ddl_mode_.reset();
}
return status;
}
void PgTxnManager::SetDdlHasSyscatalogChanges() {
Expand Down
6 changes: 3 additions & 3 deletions src/yb/yql/pggate/pg_txn_manager.h
Original file line number Diff line number Diff line change
Expand Up @@ -61,8 +61,8 @@ class PgTxnManager : public RefCountedThreadSafe<PgTxnManager> {
Status RestartReadPoint();
bool IsRestartReadPointRequested();
void SetActiveSubTransactionId(SubTransactionId id);
Status CommitTransaction();
Status AbortTransaction();
Status CommitPlainTransaction();
Status AbortPlainTransaction();
Status SetPgIsolationLevel(int isolation);
PgIsolationLevel GetPgIsolationLevel();
Status SetReadOnly(bool read_only);
Expand Down Expand Up @@ -108,7 +108,7 @@ class PgTxnManager : public RefCountedThreadSafe<PgTxnManager> {

std::string TxnStateDebugStr() const;

Status FinishTransaction(Commit commit);
Status FinishPlainTransaction(Commit commit);

void IncTxnSerialNo();

Expand Down
8 changes: 4 additions & 4 deletions src/yb/yql/pggate/pggate.cc
Original file line number Diff line number Diff line change
Expand Up @@ -2115,16 +2115,16 @@ bool PgApiImpl::IsRestartReadPointRequested() {
return pg_txn_manager_->IsRestartReadPointRequested();
}

Status PgApiImpl::CommitTransaction() {
Status PgApiImpl::CommitPlainTransaction() {
DCHECK(pg_session_->explicit_row_lock_buffer().IsEmpty());
pg_session_->InvalidateForeignKeyReferenceCache();
RETURN_NOT_OK(pg_session_->FlushBufferedOperations());
return pg_txn_manager_->CommitTransaction();
return pg_txn_manager_->CommitPlainTransaction();
}

Status PgApiImpl::AbortTransaction() {
Status PgApiImpl::AbortPlainTransaction() {
ClearSessionState();
return pg_txn_manager_->AbortTransaction();
return pg_txn_manager_->AbortPlainTransaction();
}

Status PgApiImpl::SetTransactionIsolationLevel(int isolation) {
Expand Down
4 changes: 2 additions & 2 deletions src/yb/yql/pggate/pggate.h
Original file line number Diff line number Diff line change
Expand Up @@ -647,8 +647,8 @@ class PgApiImpl {
Status ResetTransactionReadPoint();
Status RestartReadPoint();
bool IsRestartReadPointRequested();
Status CommitTransaction();
Status AbortTransaction();
Status CommitPlainTransaction();
Status AbortPlainTransaction();
Status SetTransactionIsolationLevel(int isolation);
Status SetTransactionReadOnly(bool read_only);
Status SetTransactionDeferrable(bool deferrable);
Expand Down
2 changes: 1 addition & 1 deletion src/yb/yql/pggate/test/pggate_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -240,7 +240,7 @@ void PggateTest::BeginTransaction() {
}

void PggateTest::CommitTransaction() {
CHECK_YBC_STATUS(YBCPgCommitTransaction());
CHECK_YBC_STATUS(YBCPgCommitPlainTransaction());
}

void PggateTest::ExecCreateTableTransaction(YBCPgStatement pg_stmt) {
Expand Down
8 changes: 4 additions & 4 deletions src/yb/yql/pggate/ybc_pggate.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1689,12 +1689,12 @@ bool YBCIsRestartReadPointRequested() {
return pgapi->IsRestartReadPointRequested();
}

YBCStatus YBCPgCommitTransaction() {
return ToYBCStatus(pgapi->CommitTransaction());
YBCStatus YBCPgCommitPlainTransaction() {
return ToYBCStatus(pgapi->CommitPlainTransaction());
}

YBCStatus YBCPgAbortTransaction() {
return ToYBCStatus(pgapi->AbortTransaction());
YBCStatus YBCPgAbortPlainTransaction() {
return ToYBCStatus(pgapi->AbortPlainTransaction());
}

YBCStatus YBCPgSetTransactionIsolationLevel(int isolation) {
Expand Down
4 changes: 2 additions & 2 deletions src/yb/yql/pggate/ybc_pggate.h
Original file line number Diff line number Diff line change
Expand Up @@ -659,8 +659,8 @@ YBCStatus YBCPgRestartTransaction();
YBCStatus YBCPgResetTransactionReadPoint();
YBCStatus YBCPgRestartReadPoint();
bool YBCIsRestartReadPointRequested();
YBCStatus YBCPgCommitTransaction();
YBCStatus YBCPgAbortTransaction();
YBCStatus YBCPgCommitPlainTransaction();
YBCStatus YBCPgAbortPlainTransaction();
YBCStatus YBCPgSetTransactionIsolationLevel(int isolation);
YBCStatus YBCPgSetTransactionReadOnly(bool read_only);
YBCStatus YBCPgSetTransactionDeferrable(bool deferrable);
Expand Down

0 comments on commit fcbdb09

Please sign in to comment.