
trie: Fix concurrent map access on trie.dirties #23674

Closed · wants to merge 1 commit

Conversation

@piersy (Contributor) commented Oct 1, 2021

The dirties map of trie.Database was not read-protected inside the
commit function. Since the trie.Database is called from many goroutines,
this sometimes led to concurrent map read and write errors.

The solution is to read-lock over trie.dirties in commit.

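For illustration, a minimal, self-contained sketch of the shape of the change (the actual diff is shown in the review discussion below; the Database type here is a simplified stand-in, not the real geth type):

package main

import (
	"fmt"
	"sync"
)

// Database is a simplified stand-in for trie.Database.
type Database struct {
	lock    sync.RWMutex
	dirties map[string]string
}

// commit previously read db.dirties without holding the lock, so other
// goroutines mutating the map could race with the read. Guarding the
// read with the read lock is the shape of the fix.
func (db *Database) commit(hash string) error {
	db.lock.RLock()
	node, ok := db.dirties[hash]
	db.lock.RUnlock()
	if !ok {
		return nil
	}
	fmt.Println("committing", node)
	return nil
}

func main() {
	db := &Database{dirties: map[string]string{"h1": "node1"}}
	_ = db.commit("h1")
}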
@holiman (Contributor) commented Oct 1, 2021

The commit caller should already hold the lock (the docstring says "commit is the private locked version of Commit."). If so, this PR introduces a double-lock.
Do you actually have a stack trace from a read/write error? E.g. running with the race detector?
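(For reference, a self-contained sketch, not geth code, of why such a double-lock would be fatal if the caller held the write lock: Go's sync.RWMutex is not reentrant, so the nested RLock blocks forever and the runtime aborts with "fatal error: all goroutines are asleep - deadlock!".)

package main

import "sync"

func main() {
	var lock sync.RWMutex

	lock.Lock()  // the caller (Commit) holds the write lock...
	lock.RLock() // ...so re-acquiring the read lock blocks forever
	lock.RUnlock()
	lock.Unlock()
}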

@piersy (Contributor, Author) commented Oct 4, 2021

Hi @holiman, I do have a stack trace; it's from https://github.com/celo-org/celo-blockchain (a fork), but I think it is still representative of the go-ethereum behaviour. I've added it below in case that helps.

So I'm not sure whether the comment or the implementation of Commit is at fault here, but if you look at Commit you can see that it does not acquire the lock before calling commit:

func (db *Database) Commit(node common.Hash, report bool, callback func(common.Hash)) error {
	// Create a database batch to flush persistent data out. It is important that
	// outside code doesn't see an inconsistent state (referenced data removed from
	// memory cache during commit but not yet in persistent storage). This is ensured
	// by only uncaching existing data when the database write finalizes.
	start := time.Now()
	batch := db.diskdb.NewBatch()
	// Move all of the accumulated preimages into a write batch
	if db.preimages != nil {
		rawdb.WritePreimages(batch, db.preimages)
		// Since we're going to replay trie node writes into the clean cache, flush out
		// any batched pre-images before continuing.
		if err := batch.Write(); err != nil {
			return err
		}
		batch.Reset()
	}
	// Move the trie itself into the batch, flushing if enough data is accumulated
	nodes, storage := len(db.dirties), db.dirtiesSize
	uncacher := &cleaner{db}
	if err := db.commit(node, batch, uncacher, callback); err != nil {
		log.Error("Failed to commit trie from trie database", "err", err)
		return err
	}

fatal error: concurrent map read and map write

goroutine 19213 [running]:
runtime.throw(0x1404e87, 0x21)
	/usr/local/go/src/runtime/panic.go:1117 +0x72 fp=0xc006986ef8 sp=0xc006986ec8 pc=0x4def12
runtime.mapaccess2(0x12ab400, 0xc012f84360, 0xc006986fa0, 0x20, 0xc01a5ad5c0)
	/usr/local/go/src/runtime/map.go:469 +0x255 fp=0xc006986f38 sp=0xc006986ef8 pc=0x4b5ff5
github.com/celo-org/celo-blockchain/trie.(*Database).commit(0xc012938f70, 0xb4437b13ba81435f, 0xa8bb0a7fd9a4ecb4, 0xa5876800b1d37e3a, 0xbcd94e42d5940f2f, 0x1707288, 0xc01a5fe540, 0xc015b1e0f0, 0x0, 0x0, ...)
	/home/piersy/projects/celo-blockchain/trie/database.go:781 +0x9f fp=0xc006987030 sp=0xc006986f38 pc=0x7715df
github.com/celo-org/celo-blockchain/trie.(*Database).commit.func1(0xb4437b13ba81435f, 0xa8bb0a7fd9a4ecb4, 0xa5876800b1d37e3a, 0xbcd94e42d5940f2f)
	/home/piersy/projects/celo-blockchain/trie/database.go:788 +0x85 fp=0xc0069870a0 sp=0xc006987030 pc=0x782a85
github.com/celo-org/celo-blockchain/trie.forGatherChildren(0x16f9d70, 0xc018c37ed8, 0xc006987520)
	/home/piersy/projects/celo-blockchain/trie/database.go:209 +0x1f8 fp=0xc006987230 sp=0xc0069870a0 pc=0x76c218
github.com/celo-org/celo-blockchain/trie.forGatherChildren(0x16f9d98, 0xc019096240, 0xc006987520)
	/home/piersy/projects/celo-blockchain/trie/database.go:206 +0xe5 fp=0xc0069873c0 sp=0xc006987230 pc=0x76c105
github.com/celo-org/celo-blockchain/trie.(*cachedNode).forChilds(0xc0170e3f80, 0xc006987520)
	/home/piersy/projects/celo-blockchain/trie/database.go:194 +0x110 fp=0xc006987470 sp=0xc0069873c0 pc=0x76c010
github.com/celo-org/celo-blockchain/trie.(*Database).commit(0xc012938f70, 0x5230d078f0e96fa3, 0xafc9ae992a2ca510, 0x29bdd84110a0a1c5, 0x3a45b8c51e4c1dd3, 0x1707288, 0xc01a5fe540, 0xc015b1e0f0, 0x0, 0x41, ...)
	/home/piersy/projects/celo-blockchain/trie/database.go:786 +0x16e fp=0xc006987568 sp=0xc006987470 pc=0x7716ae
github.com/celo-org/celo-blockchain/trie.(*Database).Commit(0xc012938f70, 0x5230d078f0e96fa3, 0xafc9ae992a2ca510, 0x29bdd84110a0a1c5, 0x3a45b8c51e4c1dd3, 0x1, 0x0, 0x0, 0x0)
	/home/piersy/projects/celo-blockchain/trie/database.go:740 +0x4d1 fp=0xc006987778 sp=0xc006987568 pc=0x770af1
github.com/celo-org/celo-blockchain/core.(*BlockChain).Stop(0xc016113200)
	/home/piersy/projects/celo-blockchain/core/blockchain.go:1021 +0x685 fp=0xc006987a78 sp=0xc006987778 pc=0xb8a245
github.com/celo-org/celo-blockchain/eth.(*Ethereum).Stop(0xc00f184b00, 0xc01a2feb40, 0x4acb85)
	/home/piersy/projects/celo-blockchain/eth/backend.go:592 +0x125 fp=0xc006987ac8 sp=0xc006987a78 pc=0xda0805
github.com/celo-org/celo-blockchain/node.(*Node).stopServices(0xc00f1ef790, 0xc012d3c890, 0x1, 0x1, 0xb00c0b, 0xc00fff1920)
	/home/piersy/projects/celo-blockchain/node/node.go:309 +0xc2 fp=0xc006987b48 sp=0xc006987ac8 pc=0xd66f62
github.com/celo-org/celo-blockchain/node.(*Node).Close(0xc00f1ef790, 0x0, 0x0)
	/home/piersy/projects/celo-blockchain/node/node.go:221 +0x147 fp=0xc006987be8 sp=0xc006987b48 pc=0xd66167
github.com/celo-org/celo-blockchain/test.(*Node).Close(0xc0004fbe40, 0x0, 0x0)
	/home/piersy/projects/celo-blockchain/test/node.go:317 +0x465 fp=0xc006987d20 sp=0xc006987be8 pc=0xdf7345
github.com/celo-org/celo-blockchain/test.Network.Shutdown(0xc01c6091a0, 0x4, 0x4)
	/home/piersy/projects/celo-blockchain/test/node.go:510 +0x6b fp=0xc006987dd8 sp=0xc006987d20 pc=0xdf814b
github.com/celo-org/celo-blockchain/e2e_test_test.TestStartStopValidators(0xc00ff05b00)
	/home/piersy/projects/celo-blockchain/e2e_test/e2e_test.go:171 +0x122d fp=0xc006987f80 sp=0xc006987dd8 pc=0xdfde6d
testing.tRunner(0xc00ff05b00, 0x15fa3a8)
	/usr/local/go/src/testing/testing.go:1193 +0xef fp=0xc006987fd0 sp=0xc006987f80 pc=0x5ce42f
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc006987fd8 sp=0xc006987fd0 pc=0x519201
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1238 +0x2b3

@holiman (Contributor) commented Oct 4, 2021

Indeed, the read of dirties during Commit/commit is not RLock-protected!

@holiman (Contributor) left a review comment:

I think this is correct. @karalabe PTAL

@fjl (Contributor) commented Oct 8, 2021

I also think it's correct, but gotta say the locking looks a bit messed up in trie.Database. The method comment of commit says it's the private 'locked' version of Commit, but that's not true at all. It only really takes the lock while actually writing to the database.

IMHO we should just take the lock once in Commit, it would remove any doubts about correctness here. The lock becoming available intermittently during Commit doesn't seem like a feature to me.
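A toy sketch of that alternative, assuming db.lock is the mutex guarding db.dirties (simplified stand-in types, not the real geth code):

package main

import "sync"

// Database is a simplified stand-in for trie.Database.
type Database struct {
	lock    sync.RWMutex
	dirties map[string]int
}

// Commit takes the lock once up front, so the lock is never released
// intermittently while a commit is in progress.
func (db *Database) Commit(hash string) {
	db.lock.Lock()
	defer db.lock.Unlock()
	db.commit(hash)
}

// commit assumes the caller holds db.lock, matching its docstring
// ("the private locked version of Commit").
func (db *Database) commit(hash string) {
	if _, ok := db.dirties[hash]; !ok {
		return
	}
	delete(db.dirties, hash)
}

func main() {
	db := &Database{dirties: map[string]int{"h": 1}}
	db.Commit("h")
}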

@gballet (Member) commented on lines +759 to +765:

db.lock.RLock()
node, ok := db.dirties[hash]
if !ok {
	db.lock.RUnlock()
	return nil
}
db.lock.RUnlock()

Suggested change:

db.lock.RLock()
node, ok := db.dirties[hash]
db.lock.RUnlock()
if !ok {
	return nil
}

@piersy (Contributor, Author) replied:

Thank you for this @gballet 👍

@rjl493456442 (Member) commented:

Btw, the dirty read access in the Cap function is also not read-lock protected. If we eventually decide to fix this, please fix Cap too.

@karalabe (Member) commented:

This PR doesn't seem right to me.

The trie.Database is meant to be "mutated" sequentially: insert trie nodes, commit; insert trie nodes, commit. This is even mentioned in the type docs:

// Note, the trie Database is **not** thread safe in its mutations, but it **is**
// thread safe in providing individual, independent node access. The rationale
// behind this split design is to provide read access to RPC handlers and sync
// servers even while the trie is executing expensive garbage collection.

If your code crashes, it means that this requirement is violated somewhere. Your crash dump seems to be only half of the dump; ideally it should contain the other goroutine that accessed the map at the same time. Perhaps there's something wrong with the shutdown and it doesn't wait for block imports to finish? Is this custom code or stock Geth?

@piersy (Contributor, Author) commented Oct 12, 2021

Hi @karalabe, the stack trace was from https://github.com/celo-org/celo-blockchain (a geth fork).

Unfortunately Go doesn't guarantee showing both sides of a concurrent map access error in the stack trace (see here), so the stack I previously posted contains only one side.

It may be that the changes in celo-blockchain are violating the intended use of trie.Database, but I can also see some places in the geth code where I think violations are occurring.

For example, here commitLoop is executed in its own goroutine:

go-ethereum/trie/trie.go

Lines 527 to 555 in 633e7ef

func (t *Trie) Commit(onleaf LeafCallback) (common.Hash, int, error) {
	if t.db == nil {
		panic("commit called on trie with nil database")
	}
	if t.root == nil {
		return emptyRoot, 0, nil
	}
	// Derive the hash for all dirty nodes first. We hold the assumption
	// in the following procedure that all nodes are hashed.
	rootHash := t.Hash()
	h := newCommitter()
	defer returnCommitterToPool(h)
	// Do a quick check if we really need to commit, before we spin
	// up goroutines. This can happen e.g. if we load a trie for reading storage
	// values, but don't write to it.
	if _, dirty := t.root.cache(); !dirty {
		return rootHash, 0, nil
	}
	var wg sync.WaitGroup
	if onleaf != nil {
		h.onleaf = onleaf
		h.leafCh = make(chan *leaf, leafChanSize)
		wg.Add(1)
		go func() {
			defer wg.Done()
			h.commitLoop(t.db)
		}()
	}

And then here, in BlockChain.Stop, trie.Database.Commit is called:

go-ethereum/core/blockchain.go

Lines 1012 to 1065 in 633e7ef

func (bc *BlockChain) Stop() {
	if !atomic.CompareAndSwapInt32(&bc.running, 0, 1) {
		return
	}
	// Unsubscribe all subscriptions registered from blockchain.
	bc.scope.Close()
	// Signal shutdown to all goroutines.
	close(bc.quit)
	bc.StopInsert()
	// Now wait for all chain modifications to end and persistent goroutines to exit.
	//
	// Note: Close waits for the mutex to become available, i.e. any running chain
	// modification will have exited when Close returns. Since we also called StopInsert,
	// the mutex should become available quickly. It cannot be taken again after Close has
	// returned.
	bc.chainmu.Close()
	bc.wg.Wait()
	// Ensure that the entirety of the state snapshot is journalled to disk.
	var snapBase common.Hash
	if bc.snaps != nil {
		var err error
		if snapBase, err = bc.snaps.Journal(bc.CurrentBlock().Root()); err != nil {
			log.Error("Failed to journal state snapshot", "err", err)
		}
	}
	// Ensure the state of a recent block is also stored to disk before exiting.
	// We're writing three different states to catch different restart scenarios:
	//  - HEAD:     So we don't need to reprocess any blocks in the general case
	//  - HEAD-1:   So we don't do large reorgs if our HEAD becomes an uncle
	//  - HEAD-127: So we have a hard limit on the number of blocks reexecuted
	if !bc.cacheConfig.TrieDirtyDisabled {
		triedb := bc.stateCache.TrieDB()
		for _, offset := range []uint64{0, 1, TriesInMemory - 1} {
			if number := bc.CurrentBlock().NumberU64(); number > offset {
				recent := bc.GetBlockByNumber(number - offset)
				log.Info("Writing cached state to disk", "block", recent.Number(), "hash", recent.Hash(), "root", recent.Root())
				if err := triedb.Commit(recent.Root(), true, nil); err != nil {
					log.Error("Failed to commit recent state trie", "err", err)
				}
			}
		}
		if snapBase != (common.Hash{}) {
			log.Info("Writing snapshot state to disk", "root", snapBase)
			if err := triedb.Commit(snapBase, true, nil); err != nil {
				log.Error("Failed to commit recent state trie", "err", err)
			}
		}

And BlockChain.Stop can be called as a result of geth receiving a SIGTERM signal:

func StartNode(ctx *cli.Context, stack *node.Node) {
	if err := stack.Start(); err != nil {
		Fatalf("Error starting protocol stack: %v", err)
	}
	go func() {
		sigc := make(chan os.Signal, 1)
		signal.Notify(sigc, syscall.SIGINT, syscall.SIGTERM)
		defer signal.Stop(sigc)

		minFreeDiskSpace := ethconfig.Defaults.TrieDirtyCache
		if ctx.GlobalIsSet(MinFreeDiskSpaceFlag.Name) {
			minFreeDiskSpace = ctx.GlobalInt(MinFreeDiskSpaceFlag.Name)
		} else if ctx.GlobalIsSet(CacheFlag.Name) || ctx.GlobalIsSet(CacheGCFlag.Name) {
			minFreeDiskSpace = ctx.GlobalInt(CacheFlag.Name) * ctx.GlobalInt(CacheGCFlag.Name) / 100
		}
		if minFreeDiskSpace > 0 {
			go monitorFreeDiskSpace(sigc, stack.InstanceDir(), uint64(minFreeDiskSpace)*1024*1024)
		}

		<-sigc
		log.Info("Got interrupt, shutting down...")
		go stack.Close()
		for i := 10; i > 0; i-- {
			<-sigc
			if i > 1 {
				log.Warn("Already shutting down, interrupt more to panic.", "times", i-1)
			}
		}
		debug.Exit() // ensure trace and CPU profile data is flushed.
		debug.LoudPanic("boom")
	}()
}

So it seems that these two goroutines could be concurrently accessing the trie.Database: commitLoop locks over its call to trie.Database.insert, but trie.Database.Commit does not lock before trying to read the dirties map. This seems to be one way that this violation could occur in the current geth code.
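
A self-contained sketch (not geth code) of that race shape: one goroutine writes a shared map under the lock while another reads it without taking the lock. Locking only one side does not make the access safe, and go run -race flags it reliably:

package main

import "sync"

func main() {
	var lock sync.RWMutex
	dirties := make(map[int]int)

	go func() { // stands in for commitLoop: writes under the lock
		for i := 0; i < 1000; i++ {
			lock.Lock()
			dirties[i] = i
			lock.Unlock()
		}
	}()
	for i := 0; i < 1000; i++ { // stands in for commit: unprotected read
		_ = dirties[i]
	}
}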

Also I just noticed this ticket:

@fjl (Contributor) commented Oct 12, 2021

Regarding BlockChain.Stop, we have already merged another PR that should provide the proper exclusion between updates in Stop and other chain mutations.

@piersy if you found this race during your testing with celo, could you retry with the changes in go-ethereum master? This issue may already be fixed due to the improved synchronization in BlockChain.

@piersy (Contributor, Author) commented Oct 13, 2021

Hi @fjl I'm assuming you meant #22853, I merged that change into https://github.com/celo-org/celo-blockchain and it does seem to have solved this issue, thank you :)

However, it seems that this problem could arise again: all it would take is for some code to access some combination of trie.Database or trie.Trie methods concurrently. Given the number of places where trie.Database and trie.Trie are accessed, it doesn't seem straightforward to verify that the usage will be safe. What do you think about just making trie.Database thread safe by locking over its mutating methods and read-locking over its non-mutating methods?

@karalabe (Member) commented:

Given the number of places that trie.Database or trie.Trie are accessed it doesn't seem straightforward to verify that the usage will be safe.

The only requirement is that you can't execute transactions and commit the state concurrently. I think that's a legit requirement: one mutates the state while the other is saving it. There's no possible scenario where this can meaningfully happen outside of a programming error, so IMHO let it crash; it's a bug that needs fixing.

You also can't just call commit in between two inserts, as the ref counts would be invalid (trie node insertion goes from the bottom upwards, so you can end up with dangling references until all the trie nodes are inserted). Calling commit in between - even if lock protected - would corrupt the garbage collector's internal state and lead to losing trie nodes from memory and eventually from disk.
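
A toy sketch of that point (not geth code; the reference counting here is heavily simplified): insertion is bottom-up, so a child sits with zero references until its parent arrives, and a GC pass in that window collects a node that is still needed:

package main

import "fmt"

type node struct{ refs int }

type db struct{ nodes map[string]*node }

// insert adds a node and bumps the refcounts of its children; parents
// arrive after their children, so children are temporarily unreferenced.
func (d *db) insert(key string, children ...string) {
	d.nodes[key] = &node{}
	for _, c := range children {
		d.nodes[c].refs++
	}
}

// gc drops every unreferenced node, mimicking the garbage collection
// that runs as part of a commit.
func (d *db) gc() {
	for key, n := range d.nodes {
		if n.refs == 0 {
			delete(d.nodes, key)
		}
	}
}

func main() {
	d := &db{nodes: make(map[string]*node)}
	d.insert("leaf") // bottom-up: the child goes in first, refs == 0
	d.gc()           // a commit here treats "leaf" as garbage...
	fmt.Println(d.nodes["leaf"]) // <nil>: the still-needed node is gone
	// d.insert("root", "leaf") would now panic: the child was lost.
}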

@karalabe (Member) added:

If this issue happens in Geth, sure, we need to fix it, but the fix isn't in the trie database, it's at the callsite.

@piersy (Contributor, Author) commented Oct 13, 2021

Hi @karalabe thank you for the clarification.

I'm also in favour of crashing on programming errors, but where possible I try to ensure that programming errors are very easy to detect. In this case I ran hundreds of iterations of a test before I encountered the concurrent map modification error. In the case of geth, it seems that the programming error responsible for the concurrent map access has existed since at least the 18th of May, when #22892 was raised.

So I'm wondering: is this code safe when it is executed concurrently but the concurrent map error is not hit?
Also, can you think of a way to make this programming error more prominent, so that it does not go unnoticed for a long time?

@fjl (Contributor) commented Oct 13, 2021

While it would be nice to detect misuse of trie.Database, it isn't really possible. We don't want to add additional locking because the database is really not meant to be mutated concurrently.

I am closing this PR now because it will not be merged in this form. We are very thankful that you pushed us toward fixing the shutdown races with your end-to-end test. Please do let us know if you find similar issues in the future!

@fjl fjl closed this Oct 13, 2021