[BUG]: serializable behavior is broken #2057

Open
al8n opened this issue Apr 25, 2024 · 5 comments
Labels
kind/bug Something is broken.

Comments

@al8n

al8n commented Apr 25, 2024

Hi! Currently, the transaction model cannot handle this kind of write skew: https://wiki.postgresql.org/wiki/SSI#Intersecting_Data.

What steps will reproduce the bug?

// https://wiki.postgresql.org/wiki/SSI#Intersecting_Data
func TestTxnWriteSkew2(t *testing.T) {
	runBadgerTest(t, nil, func(t *testing.T, db *DB) {
		// Setup
		txn := db.NewTransaction(true)
		defer txn.Discard()
		txn.SetEntry(NewEntry([]byte("a1"), []byte("10")))
		txn.SetEntry(NewEntry([]byte("a2"), []byte("20")))
		txn.SetEntry(NewEntry([]byte("b1"), []byte("100")))
		txn.SetEntry(NewEntry([]byte("b2"), []byte("200")))
		require.NoError(t, txn.Commit())

		txn1 := db.NewTransaction(true)
		defer txn1.Discard()

		itr := txn1.NewIterator(DefaultIteratorOptions)
		sum := 0
		{
			for itr.Rewind(); itr.Valid(); itr.Next() {
				if itr.Item().Key()[0] == 'a' {
					a, _ := itr.Item().ValueCopy(nil)
					val, _ := strconv.ParseUint(string(a), 10, 64)
					sum += int(val)
				}
			}
			itr.Close()
		}
		
		txn1.SetEntry(NewEntry([]byte("b3"), []byte("30")))

		txn2 := db.NewTransaction(true)
		defer txn2.Discard()
		{
			itr = txn2.NewIterator(DefaultIteratorOptions)
			sum = 0
			for itr.Rewind(); itr.Valid(); itr.Next() {
				if itr.Item().Key()[0] == 'b' {
					a, _ := itr.Item().ValueCopy(nil)
					val, _ := strconv.ParseUint(string(a), 10, 64)
					sum += int(val)
				}
			}
			itr.Close()
		}
		
		txn2.SetEntry(NewEntry([]byte("a3"), []byte("300")))

		// Each transaction has modified what the other transaction would have read.
		// If both were allowed to commit, this would break serializable behavior,
		// because if they were run one at a time, one of the transactions would have seen the INSERT the other committed.
		// We wait for a successful COMMIT of one of the transactions before we roll anything back,
		// though, to ensure progress and prevent thrashing.
		require.NoError(t, txn2.Commit())
		require.Error(t, txn1.Commit())
	})
}

Error trace:

Error Trace:	txn_test.go:386
        	            				db_test.go:164
        	            				txn_test.go:335
        	Error:      	Received unexpected error:
        	            	Trying to commit a discarded txn
        	            	github.com/dgraph-io/badger/v4.(*Txn).commitPrecheck
        	            		/Users/al/Developer/source-learner/badger/txn.go:623
        	            	github.com/dgraph-io/badger/v4.(*Txn).Commit
        	            		/Users/al/Developer/source-learner/badger/txn.go:670
        	            	github.com/dgraph-io/badger/v4.TestTxnWriteSkew2.func1
        	            		/Users/al/Developer/source-learner/badger/txn_test.go:386
        	            	github.com/dgraph-io/badger/v4.runBadgerTest
        	            		/Users/al/Developer/source-learner/badger/db_test.go:164
        	            	github.com/dgraph-io/badger/v4.TestTxnWriteSkew2
        	            		/Users/al/Developer/source-learner/badger/txn_test.go:335
        	            	testing.tRunner
        	            		/opt/homebrew/Cellar/go/1.20.1/libexec/src/testing/testing.go:1576
        	            	runtime.goexit
        	            		/opt/homebrew/Cellar/go/1.20.1/libexec/src/runtime/asm_arm64.s:1172
        	Test:       	TestTxnWriteSkew2
@al8n al8n added the kind/bug Something is broken. label Apr 25, 2024
@harshil-goel
Contributor

We provide only SSI: Serializable Snapshot Isolation. Did you turn managed mode off? By default, we don't do that. Only in managed mode do we provide this SSI.

@al8n
Author

al8n commented Apr 25, 2024

We provide only SSI: Serializable Snapshot Isolation. Did you turn managed mode off? By default, we don't do that. Only in managed mode do we provide this SSI.

Hi, I have not turned managed mode off. I am just using the default database options that are used for testing.

@harshil-goel
Contributor

Can you try it once after turning managed mode off?

@al8n
Author

al8n commented Apr 26, 2024

Can you try it once after turning managed mode off?

Even with managed mode turned off, the test case still fails. Could you try it, in case I missed something? It seems the transaction model is plain OCC rather than true SSI.

@al8n
Author

al8n commented Apr 26, 2024

I think this is because, at commit time, badger only checks whether two transactions have a read-write conflict on the same keys: here txn1 reads [a1, a2, b1, b2] and writes [b3], while txn2 reads [a1, a2, b1, b2] and writes [a3]. The current transaction implementation sees no problem, because neither transaction writes a key that appears in the other's read set.

In some situations that is fine, but here the decision in txn1 to write [b3] and in txn2 to write [a3] is based on the sum of the current values in the database. Once txn2 commits, that sum changes, so the condition under which txn1 decided to write [b3] no longer holds, and with respect to SSI the commit of txn1 should fail.
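
To make the failure concrete, here is a tiny, self-contained model of a purely key-based read/write-set check. This is not badger's code; the names (txn, readSet, writeSet, hasConflict) are my own. It only shows that, with the sets above, neither commit is flagged, because a3 and b3 were never read by the other transaction.

package main

import "fmt"

// txn is a toy transaction: the keys it read and the keys it wrote.
// Illustrative names only; these are not badger's types.
type txn struct {
	readSet  map[string]bool
	writeSet map[string]bool
}

// hasConflict mimics a key-level check: t conflicts only if some key it read
// was written by one of the concurrently committed transactions.
func hasConflict(t txn, committed []txn) bool {
	for _, c := range committed {
		for k := range t.readSet {
			if c.writeSet[k] {
				return true
			}
		}
	}
	return false
}

func main() {
	reads := map[string]bool{"a1": true, "a2": true, "b1": true, "b2": true}
	txn1 := txn{readSet: reads, writeSet: map[string]bool{"b3": true}}
	txn2 := txn{readSet: reads, writeSet: map[string]bool{"a3": true}}

	// txn2 commits first; txn1 then validates against it and passes,
	// because a3 is a brand new key that txn1 never read.
	fmt.Println(hasConflict(txn1, []txn{txn2})) // false: the write skew goes undetected
}

The crucial detail is that a3 and b3 did not exist when the reads happened, so no purely key-based read set can ever contain them; catching this requires tracking the ranges or predicates that were scanned, not just the keys that were found.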

In the provided example:

  • Transaction 1 reads certain values and, based on that information, decides to write to a new key (b3).
  • Transaction 2 also reads the same set of values and writes to a different new key (a3).

The essence of the problem is that each transaction's decision to write is based on a snapshot of the database that becomes outdated once the other transaction commits. This is precisely the kind of issue that Serializable Snapshot Isolation (SSI) aims to prevent.

Why might the current model in BadgerDB miss this?

BadgerDB’s OCC checks for read-write conflicts by examining whether any of the keys a transaction read were modified before it commits. However, this model does not account for logical dependencies, where the decision to write a new key is based on the values of the read keys rather than on the keys themselves. This situation can lead to anomalies where:

  • The total values read by both transactions are used to make decisions.
  • Each transaction writes to a new key that does not exist in the other's read or write set.

Enhancing the model to handle this case

To handle this scenario in BadgerDB, the conflict detection mechanism would need to be sophisticated enough to understand these indirect dependencies. Here are some conceptual approaches to extending the BadgerDB model:

  • Track Logical Dependencies: Extend the transaction metadata to include not just the keys read and written, but also a summary or digest of how the read data influences the write operations. This could be as simple as a hash or checksum of read values that influence writes.
  • Check Logical Consistency: Before committing a transaction, check whether the logical basis for any write operation (e.g., a sum of values) has been altered by recently committed transactions. This could involve re-calculating the sums or checksums against the current data and comparing them with what was calculated when the transaction started (a minimal sketch of this idea follows this list).
  • Implement Dependency Graphs: Use a graph structure to track dependencies between transactions. This structure would help identify cycles or conflicts based on logical dependencies, not just direct key accesses.
  • Use a More Granular Locking or Tagging System: Each transaction could "tag" the logical conditions under which it operates (like the sum of certain values). Other transactions would then check these tags before committing their changes.
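
Here is a minimal sketch of the second bullet above (checking logical consistency at commit time), and only a sketch: store, txnSketch, commit and validate are hypothetical names, not badger APIs. Each transaction records a validation closure over the data its writes depend on, and the store re-evaluates that closure against the latest committed state before applying the writes.

package main

import "fmt"

// store is a toy in-memory key/value map standing in for the database.
type store struct {
	data map[string]int
}

// txnSketch is a hypothetical transaction: the keys it wants to write and a
// predicate capturing the logical basis for those writes.
type txnSketch struct {
	writes   map[string]int
	validate func(latest map[string]int) bool // re-checked at commit time
}

// commit re-validates the transaction's logical basis against the latest
// committed state and only then applies its writes.
func (s *store) commit(t txnSketch) error {
	if t.validate != nil && !t.validate(s.data) {
		return fmt.Errorf("conflict: the values this txn depended on have changed")
	}
	for k, v := range t.writes {
		s.data[k] = v
	}
	return nil
}

// sumWithPrefix adds up the values of all keys starting with the given byte.
func sumWithPrefix(m map[string]int, prefix byte) int {
	total := 0
	for k, v := range m {
		if len(k) > 0 && k[0] == prefix {
			total += v
		}
	}
	return total
}

func main() {
	s := &store{data: map[string]int{"a1": 10, "a2": 20, "b1": 100, "b2": 200}}

	// txn1 summed the a* keys (30) and decided to write b3 based on that.
	txn1 := txnSketch{
		writes:   map[string]int{"b3": 30},
		validate: func(latest map[string]int) bool { return sumWithPrefix(latest, 'a') == 30 },
	}
	// txn2 summed the b* keys (300) and decided to write a3 based on that.
	txn2 := txnSketch{
		writes:   map[string]int{"a3": 300},
		validate: func(latest map[string]int) bool { return sumWithPrefix(latest, 'b') == 300 },
	}

	fmt.Println(s.commit(txn2)) // <nil>: the b* sum is still 300 when txn2 commits
	fmt.Println(s.commit(txn1)) // error: txn2 inserted a3, so the a* sum is no longer 30
}

With the scenario from the test, txn2 commits cleanly and txn1's commit is rejected, because the a* sum it depended on has changed from 30 to 330. The obvious cost is that the application has to declare what it depended on; doing that transparently is what range/predicate tracking in full SSI implementations (e.g. PostgreSQL's SIREAD locks) automates.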

In conclusion, this is a classic problem in concurrent transaction systems where two transactions don't directly write to the same data items but their operations are logically interdependent based on the read data. When such dependencies exist, it's possible for transactions to commit changes that, while individually valid, result in inconsistencies when combined.

Please correct me if I am wrong @harshil-goel, thanks!
