Skip to content
This repository has been archived by the owner on Feb 26, 2024. It is now read-only.

fix: nonce generation race conditions #3498

Draft
wants to merge 21 commits into
base: develop
Choose a base branch
from

Conversation

MicaiahReid
Copy link
Contributor

@MicaiahReid MicaiahReid commented Aug 4, 2022

#2489 documents a race condition in the transaction pool when nonces are automatically being assigned by Ganache. This ended up running pretty deep in the causes of the race condition.

Check inProgress queue for highest nonce

Before this change, we would look at the pending pool first to assign a highest nonce. This is good. The pending transactions are up next to be run, so if we fetched the account's nonce from the chain instead, we'd be assigning this transaction a nonce that is about to be out of date.

However, we wouldn't check the inProgress pool. When a transaction is run in the miner, it is removed from the pending pool and added to the inProgress pool. However, at this point, the block hasn't been saved so the account's nonce hasn't been updated yet. The pending pool won't contain transactions from this origin, and the account nonce is out of date.

To resolve this, I've added the #getLatestInProgressFromOrigin function, which will search the inProgress queue for transactions from this origin and get the one with the highest nonce. This lead to changing the shape of the inProgress pool a bit - previously it was a Set<TypedTransaction>, which we'd have to loop over to get all transactions from this origin. Now it is Map<string, Set<TypedTransaction>> so that we can reduce the number of transactions to look through - now we only check the transactions from this origin.

As a note, we don't care about the inProgress pool if there are transactions in the pending pool. The pending pool will have the account's highest nonces, followed by the inProgress pool, and finally the account. So that's the priority we put on finding the nonce.

Use Semaphore to queue same-origin transactions

If transactions come into the pool quickly enough, it's possible that multiple transactions are looking to have a nonce generated before any of them have been added to the pool. All of these transactions will have the same pool conditions - the same (or no) transactions in the pending and inProgress pools, so they will all be assigned the same nonce. This will cause all but the first to be mined to throw with an incorrect nonce error.

To solve this, we now use a semaphore to queue transactions from the same origin. Unique-origins transactions can still be prepared in parallel, but same-origin transactions are prepared in series.

Cache account balance with inProgress transactions

Regardless of the above enhancement, we'd still have to make a slow trip to the database to get the account's balance. This is to verify that the sender has sufficient funds to pay for the transaction. This async call was causing a lot of race conditions, so I did my best to minimize them.

Now, when a transaction is run in the miner, we fetch the account balance (which is still in memory from running the transaction) and store it with the transaction in the inProgress pool. This lead to further modifying the shape of the inProgress pool - from Map<string, Set<TypedTransaction>> to Map<string, Set<{transaction: TypedTransaction, originBalance: Quantity}>>

Fixes #2489 and Fixes #3794


it("rejects transactions whose gasLimit is greater than the block gas limit", async () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This big block is just rearranging to prevent subsequent tests from relying on previous ones being run (.only was broken before this)

found.serialized.toString(),
futureNonceTx.serialized.toString()
);
describe("nonce generation and validation", async () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the new describe block. I don't know what github is doing here with this diff, I'd just look at this new describe block in your editor instead of here.

return await this._prepareTransaction(transaction, secretKey);
} finally {
// free up the semaphore once we finish
queueForOrigin.leave();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I think we could use the semaphore's available() function to see if there's any transaction waiting to be prepared in this semaphore. If not, we could dispose of this semaphore. But then, if we get another transaction from that origin we have to make a whole new semaphore for it.

So currently we don't drain the originsQueue. I'm interested in hearing y'all's thoughts on if we should drain it.

It's easy to imagine someone sending a transaction at regular intervals from the same origin, and this interval being slower than it takes to queue up the previous transaction. In that case, we'd be making a new semaphore for every transaction, which is why I decided against draining.

However, someone could also send a bunch of transactions, each from a unique origin. In that case, this originsQueue will just keep getting wider and wider without taking advantage of its purpose.

@@ -280,23 +293,51 @@ export default class TransactionPool extends Emittery<{ drain: undefined }> {
// origin's heap we are executable.
transactionPlacement = TriageOption.Executable;
}
// we've gotten the account's nonce the synchronous way (by looking at
// the pending queue), but we still need to look up the account's balance
const { balance } = await this.#blockchain.accounts.getNonceAndBalance(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case that we get the nonce from the pending transaction pool, we still need to fetch the account's balance. I've moved this to be after we find the nonce so that finding the nonce is all synchronous.

);
});

/**
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "in series" and the next "in parallel" its are new - review these.

@MicaiahReid MicaiahReid changed the title fix: generate correct nonce in race conditions fix: nonce generation race conditions Aug 9, 2022
@MicaiahReid MicaiahReid marked this pull request as ready for review August 9, 2022 15:12
@MicaiahReid MicaiahReid changed the title fix: nonce generation race conditions nonce generation race conditions Aug 16, 2022
@MicaiahReid MicaiahReid changed the title nonce generation race conditions fix: nonce generation race conditions Aug 16, 2022
@MicaiahReid MicaiahReid marked this pull request as draft August 17, 2022 16:18
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
Status: Stalled
1 participant