
PIP XX: Transaction coordinator loading mechanism

Lari Hotari edited this page Feb 13, 2024 · 1 revision
  • Status: Proposal
  • Authors: Bo Cong, Penghui Li
  • Pull Request:
  • Mailing List discussion:
  • Release:

Motivation

Currently, the transaction coordinator is loaded through NamespaceBundleOwnershipListener. When a bundle is loaded, NamespaceBundleOwnershipListener.onLoad() runs; if loading the transaction coordinator then fails, the load is not retried, which leaves that transaction coordinator unavailable. We therefore need to change this implementation so that the transaction coordinator remains usable.

Approach

We could keep using NamespaceBundleOwnershipListener and retry on failure, but that requires several guarantees. When a bundle is loaded, NamespaceBundleOwnershipListener.onLoad() must be guaranteed to run. Under concurrency, only one transaction coordinator may be loading for a given transaction coordinator ID. Unload must also be handled correctly, so that load and unload never run at the same time. These guarantees are very difficult to provide, and they would force the bundle onLoad and the transaction coordinator load to be synchronous. So we use another solution.

When the tcClient does a lookup and finds the broker which owns the topic, it sends a new command named CommandTcClientConnect. When the broker receives CommandTcClientConnect, it loads the transaction coordinator. When do we unload the transaction coordinator? We can use the managedLedger fence mechanism: when the coordinator fails to append to the transaction log in the managedLedger and the response is ManagedLedgerFenceException, we remove the coordinator from this broker. When the transaction coordinator topic is unloaded, the broker runs internalUnloadTransactionCoordinator; the client disconnects from this coordinator and then resends CommandTcClientConnect.
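The broker-side flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the proposed implementation: the class TcLoadOnConnectSketch, its methods, the FencedException stand-in, and the use of a String in place of a coordinator instance are all hypothetical names introduced for the example. It shows a connect request triggering at most one load per tc_id, a failed load being retried on the next connect, and a fence error removing the coordinator.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Hypothetical sketch: class and method names are illustrative, not Pulsar's.
final class TcLoadOnConnectSketch {

    // Stand-in for the fence exception raised by the managed ledger.
    static final class FencedException extends RuntimeException {
        FencedException(String message) {
            super(message);
        }
    }

    // One entry per tcId; the value is the in-flight or completed load.
    private final ConcurrentHashMap<Long, CompletableFuture<String>> coordinators =
            new ConcurrentHashMap<>();

    // Assumed loader that only starts the asynchronous load;
    // a String stands in for the coordinator instance.
    private final LongFunction<CompletableFuture<String>> loader;

    TcLoadOnConnectSketch(LongFunction<CompletableFuture<String>> loader) {
        this.loader = loader;
    }

    // Called when the broker receives CommandTcClientConnect for tcId.
    CompletableFuture<String> handleTcClientConnect(long tcId) {
        // computeIfAbsent ensures concurrent connects share a single load.
        CompletableFuture<String> load =
                coordinators.computeIfAbsent(tcId, id -> loader.apply(id));
        // A failed load is dropped so that the next connect retries it.
        load.whenComplete((tc, error) -> {
            if (error != null) {
                coordinators.remove(tcId, load);
            }
        });
        return load;
    }

    // Called when appending to the transaction log fails.
    void onAppendFailure(long tcId, Throwable error) {
        // Only a fence error means this broker no longer owns the coordinator.
        if (error instanceof FencedException) {
            coordinators.remove(tcId);
        }
    }

    // True only if a load for tcId finished successfully and is still registered.
    boolean isLoaded(long tcId) {
        CompletableFuture<String> load = coordinators.get(tcId);
        return load != null && load.isDone() && !load.isCompletedExceptionally();
    }
}
```

In this sketch a client that was disconnected after a fence error simply calls handleTcClientConnect again on whichever broker the lookup returns, which mirrors the resend of CommandTcClientConnect described above.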

Protocol change

CommandTcClientConnect

message CommandTcClientConnect {
    required uint64 request_id = 1;
    required uint64 tc_id = 2 [default = 0];
}

Currently CommandTcClientConnect has two fields, request_id and tc_id; a client name can be added later. Then, when a tcClient reconnects, we could abort all transactions opened under that client name, which prevents programs from stalling on long transaction timeouts.

Compatibility

We have changed the transaction coordinator load mechanism and added CommandTcClientConnect, which will cause old clients to be unable to connect to new brokers, and new clients to be unable to use old brokers.

Therefore, we propose a transitional solution: the original load and unload mechanism is not deleted. When the listener registered through addNamespaceBundleOwnershipListener handles a bundle load or unload, we still load and unload the transaction coordinator. This applies to deployments that already used transactions before this change. If you want to upgrade the client or broker, you must upgrade the broker first. If you upgrade the client first, transactions will be unavailable.

Note

  1. The transaction client reconnects when a transaction operation (open, commit, or abort) fails with a TransactionCoordinatorNotFound response.
  2. It must be ensured that, within a broker, at most one load or unload operation is in progress at a time for a given transaction coordinator ID.
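One possible way to satisfy the second note is to queue load and unload operations per transaction coordinator ID, so that each one starts only after the previous one for the same ID has finished. The sketch below is an assumption made for illustration, not the mechanism the proposal commits to; PerTcIdSerializer and submit are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: serializes load/unload operations per coordinator ID.
final class PerTcIdSerializer {

    // Tail of the operation chain for each tcId.
    // Entries are never removed here; a real implementation would clean up.
    private final ConcurrentHashMap<Long, CompletableFuture<Void>> tails =
            new ConcurrentHashMap<>();

    // Runs op after every earlier operation submitted for the same tcId has
    // completed. Operations for different tcIds do not wait for each other.
    CompletableFuture<Void> submit(long tcId, Supplier<CompletableFuture<Void>> op) {
        CompletableFuture<Void> next = new CompletableFuture<>();
        // put() atomically swaps in the new tail and returns the previous one.
        CompletableFuture<Void> previous = tails.put(tcId, next);

        // A failure of the previous operation must not block later ones.
        CompletableFuture<Void> ready = previous == null
                ? CompletableFuture.<Void>completedFuture(null)
                : previous.exceptionally(error -> null);

        ready.thenCompose(ignored -> op.get())
                .whenComplete((value, error) -> {
                    if (error != null) {
                        next.completeExceptionally(error);
                    } else {
                        next.complete(null);
                    }
                });
        return next;
    }
}
```

With this shape, an unload submitted while a load for the same ID is still running only starts once that load has completed, so the two never overlap.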