
UnitOfWork should run UPDATES and DELETES in specific order to avoid ShareLocks on concurrent db transactions during commit() #11345

Open
d-ph opened this issue Mar 8, 2024 · 1 comment

Comments

@d-ph

d-ph commented Mar 8, 2024

Feature Request

Q A
New Feature no
RFC yes
BC Break no

Summary

I would like to propose that UnitOfWork::commit() run UPDATEs (and ideally DELETEs too) in a deterministic order, so that the risk of running into deadlocks on "waiting for ShareLock on transaction X" in even moderately concurrent situations is vastly reduced. Concretely, the UPDATE statements should be ordered alphabetically by the FQCN of the entity being updated, and then ascending by the entity id.

In other words, the UnitOfWork should take pending updates like the first list below and execute them in the order of the second list (a minimal sketch of such an ordering follows the two lists):

// contents of $this->entityUpdates

MyApp\Entity\User, id: 10
MyApp\Entity\Order, id: 1
MyApp\Entity\Job, id: 3
MyApp\Entity\Job, id: 2
MyApp\Entity\Order\Voucher, id: 1
MyApp\Entity\Job, id: 1
MyApp\Entity\Order\Voucher, id: 2
// contents of $this->entityUpdates, ordered by FQCN and ids

MyApp\Entity\Job, id: 1
MyApp\Entity\Job, id: 2
MyApp\Entity\Job, id: 3
MyApp\Entity\Order, id: 1
MyApp\Entity\Order\Voucher, id: 1
MyApp\Entity\Order\Voucher, id: 2
MyApp\Entity\User, id: 10
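
A minimal sketch of the proposed ordering (this is not Doctrine's actual internals; the orderEntityUpdates() helper and the getId() accessor are assumptions for illustration only):

/**
 * Order pending entity updates by FQCN first, then by id, so that every
 * concurrent commit() acquires its row locks in the same global order.
 *
 * @param object[] $entityUpdates Entities scheduled for UPDATE.
 * @return object[] The same entities, in a deterministic order.
 */
function orderEntityUpdates(array $entityUpdates): array
{
    usort($entityUpdates, static function (object $a, object $b): int {
        // Primary sort key: the entity class name (FQCN), alphabetically.
        $byClass = strcmp(get_class($a), get_class($b));
        if ($byClass !== 0) {
            return $byClass;
        }

        // Secondary sort key: the entity id, ascending (assumes a simple
        // single-column integer id exposed via getId()).
        return $a->getId() <=> $b->getId();
    });

    return $entityUpdates;
}

Whatever the exact implementation inside the UnitOfWork would look like, the important property is only that all concurrent commit() calls acquire their row locks in the same global order.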

Ordering the statements as described above greatly reduces the chance of the following error (error log from Postgres, but reportedly MySQL/MariaDB can run into the equivalent problem as well):

An exception occurred while executing a query: SQLSTATE[40P01]: Deadlock detected: 7 ERROR: deadlock detected
DETAIL: Process 6258 waits for ShareLock on transaction 10320; blocked by process 6259.
Process 6259 waits for ShareLock on transaction 10319; blocked by process 6258.
HINT: See server log for query details.
CONTEXT: while updating tuple (70,3) in relation "users"

Reproduction of the problem using a real-world scenario

The problem can be reproduced by simulating a "long flush" with sleep(), as follows:

public function index(Request $request)
{
  $case = $request->query->get('case', 'a');

  /** @var EntityManager $em */
  $em = $this->managerRegistry->getManager();
  $em->wrapInTransaction(function () use ($em, $case) {
      $dbal = $em->getConnection();

      // Each case locks one row first: case 'a' the "users" row, case 'b' the "order" row.
      if ($case === 'a') {
          $dbal->executeStatement('UPDATE "users" SET points_balance = 100 WHERE id = 1');
      } else {
          $dbal->executeStatement('UPDATE "order" SET total_voucher_discount = 200 WHERE id = 1');
      }

      $em->flush();

      // Simulate a "long flush" so that both concurrent transactions overlap.
      if ($case === 'a') {
          sleep(4);
      }

      // Each case now tries to update the row the other case already holds a write lock on.
      if ($case === 'a') {
          $dbal->executeStatement('UPDATE "order" SET total_voucher_discount = 100 WHERE id = 1');
      } else {
          $dbal->executeStatement('UPDATE "users" SET points_balance = 200 WHERE id = 1');
      }

      $em->flush();
  });

  return new Response("done, case '{$case}'");
}

Running the code in two browser tabs, one with the case query parameter set to 'a' and one with it set to 'b', leads to a situation where both tabs run for 4 seconds: "tab a" sleeps for 4 seconds, while "tab b" is blocked by Postgres when it attempts to run the "UPDATE users SET 200" statement. Then one of the tabs crashes and the other succeeds. The crash happens because "tab a" wakes up and attempts to run "UPDATE order SET 100" on a row that is already locked for writing by the db transaction owned by "tab b". Postgres detects the deadlock and immediately terminates one of the conflicting db transactions.

This is also explained in the Postgres documentation (though, again, the problem is not Postgres-specific): https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-DEADLOCKS

Note that deadlocks can also occur as the result of row-level locks (and thus, they can occur even if explicit locking is not used) (...) Transaction one attempts to acquire a row-level lock on the specified row, but it cannot: transaction two already holds such a lock. So it waits for transaction two to complete. Thus, transaction one is blocked on transaction two, and transaction two is blocked on transaction one: a deadlock condition. PostgreSQL will detect this situation and abort one of the transactions. (...) The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order. In the example above, if both transactions had updated the rows in the same order, no deadlock would have occurred.
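
To make the quoted advice concrete, here is a minimal sketch (reusing $em and the tables from the reproduction above; purely illustrative) of what "updating the rows in the same order" looks like. Both concurrent code paths touch the "order" row before the "users" row, so the second transaction simply waits for the first one instead of deadlocking:

$em->wrapInTransaction(function () use ($em) {
    $dbal = $em->getConnection();

    // Every concurrent code path updates the "order" row first ...
    $dbal->executeStatement('UPDATE "order" SET total_voucher_discount = 100 WHERE id = 1');

    // ... and the "users" row second. Consistent lock order => no deadlock.
    $dbal->executeStatement('UPDATE "users" SET points_balance = 100 WHERE id = 1');
});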

The severity of this problem

This problem is bound to happen sooner rather than later, once a dev starts using cron scripts or implements a webhook event consumer for a 3rd-party API. The same rows in the database (especially "user" or "order" rows) may then be UPDATEd concurrently: for example, a website user makes changes in some sort of "My Account" panel while, at the same time, a webhook event received from a 3rd-party API attempts to UPDATE the same user record.

Note that this is not only about updating fields that are indexed or that are primary/foreign keys. Updating even a "plain & boring" text field locks the whole row for writing for as long as the db transaction lasts (and the UnitOfWork runs commit() inside a db transaction, for a very good reason).

Also, note that currently the order of UPDATEs issued by UnitOfWork::commit() is determined by the order in which entities are hydrated in the app (see the illustrative snippet below). This makes the problem quite difficult to reproduce reliably in complex codebases, but the db error logs don't lie, and the "ShareLock deadlocks" do happen.
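
Purely illustrative (the entity classes, setters and ids are assumptions): if the UPDATE order follows hydration order, two request handlers that load the same two entities in opposite orders end up acquiring the same row locks in opposite orders, which is exactly the deadlock-prone pattern described above.

// Request A: hydrates User first, then Order.
$user  = $em->find(User::class, 1);
$order = $em->find(Order::class, 1);
$user->setPointsBalance(100);
$order->setTotalVoucherDiscount(100);
$em->flush(); // UPDATE "users" ..., then UPDATE "order" ...

// Request B (running concurrently): hydrates Order first, then User.
$order = $em->find(Order::class, 1);
$user  = $em->find(User::class, 1);
$order->setTotalVoucherDiscount(200);
$user->setPointsBalance(200);
$em->flush(); // UPDATE "order" ..., then UPDATE "users" ...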

Further notes

As mentioned in the summary, DELETE statements are likely subject to the same problem and would therefore also benefit from a similar ordering. However, I do see how DELETEs would be harder to reorder than UPDATEs, because DELETEs already have an ordering imposed on them. I wouldn't say it's impossible to "order them even further", but ordering UPDATEs alone would already reduce the problem significantly.

I would love to hear your comments on this proposal.

Thank you.

@greg0ire
Member

greg0ire commented Mar 9, 2024

I don't see an issue with this proposal except for DELETE statements, where this cannot be enforced in all situations. Cc @mpdude
