
fatal error: sync: Unlock of unlocked RWMutex #1660

Closed
darren opened this issue Nov 12, 2020 · 9 comments · Fixed by #1813

Comments

darren commented Nov 12, 2020

2020-11-12 02:56:09.377 [INF] CHAN: FORK: Block 0000000000000000000821678efe7f53c388fd2c530ef46e6dd2d2f369c346cd forks the chain at height 656476/block 000000000000000000103c23bc9a56b6c08340f973fab41c7dfd6535700ad8f4, but does not cause a reorganize
2020-11-12 02:56:09.442 [INF] CHAN: REORGANIZE: Block 0000000000000000000aaf4126ad8beb7f643462d5111cf89068c2f25f5742b1 is causing a reorganize.
fatal error: sync: Unlock of unlocked RWMutex

goroutine 19 [running]:
runtime.throw(0xb0eb6e, 0x20)
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/panic.go:1116 +0x72 fp=0xc025388f30 sp=0xc025388f00 pc=0x437392
sync.throw(0xb0eb6e, 0x20)
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/panic.go:1102 +0x35 fp=0xc025388f50 sp=0xc025388f30 pc=0x46a3b5
sync.(*RWMutex).Unlock(0xc0006f2c7c)
        /usr/local/Cellar/go/1.15.2/libexec/src/sync/rwmutex.go:129 +0xc6 fp=0xc025388f88 sp=0xc025388f50 pc=0x4877e6
runtime.call32(0x0, 0xc55ed8, 0xc01c49a0e8, 0x800000008)
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/asm_amd64.s:540 +0x3e fp=0xc025388fb8 sp=0xc025388f88 pc=0x46d99e
runtime.reflectcallSave(0xc0253890e8, 0xc55ed8, 0xc01c49a0e8, 0x8)
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/panic.go:881 +0x58 fp=0xc025388fe8 sp=0xc025388fb8 pc=0x436a78
runtime.runOpenDeferFrame(0xc000269200, 0xc01c49a0a0, 0x0)
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/panic.go:855 +0x2cd fp=0xc025389078 sp=0xc025388fe8 pc=0x43692d
panic(0xaa9d20, 0xc002554020)
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/panic.go:969 +0x175 fp=0xc025389130 sp=0xc025389078 pc=0x436cb5
runtime.goPanicIndex(0xffffffffffffffff, 0x19)
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/panic.go:88 +0xa5 fp=0xc025389178 sp=0xc025389130 pc=0x434ac5
github.com/btcsuite/btcd/mempool.(*FeeEstimator).RegisterBlock(0xc0000d8000, 0xc0058652d0, 0x0, 0x0)
        /private/tmp/btcd/mempool/estimatefee.go:283 +0xcd2 fp=0xc025389510 sp=0xc025389178 pc=0x8e3b32
github.com/btcsuite/btcd/netsync.(*SyncManager).handleBlockchainNotification(0xc00002c0c0, 0xc0093a32e0)
        /private/tmp/btcd/netsync/manager.go:1436 +0x205 fp=0xc0253895e0 sp=0xc025389510 pc=0x93b2e5
github.com/btcsuite/btcd/netsync.(*SyncManager).handleBlockchainNotification-fm(0xc0093a32e0)
        /private/tmp/btcd/netsync/manager.go:1389 +0x34 fp=0xc025389600 sp=0xc0253895e0 pc=0x93c954
github.com/btcsuite/btcd/blockchain.(*BlockChain).sendNotification(0xc0006f2c00, 0x1, 0xaaaee0, 0xc0058652d0)
        /private/tmp/btcd/blockchain/notifications.go:78 +0xaf fp=0xc025389648 sp=0xc025389600 pc=0x83224f
github.com/btcsuite/btcd/blockchain.(*BlockChain).connectBlock(0xc0006f2c00, 0xc00e0589a0, 0xc0058652d0, 0xc026ab3410, 0xc03bcaa000, 0x14cc, 0x14cc, 0x0, 0x0)
        /private/tmp/btcd/blockchain/chain.go:665 +0x3da fp=0xc0253896d0 sp=0xc025389648 pc=0x821afa
github.com/btcsuite/btcd/blockchain.(*BlockChain).reorganizeChain(0xc0006f2c00, 0xc031f07950, 0xc031f07860, 0xc015640980, 0x1)
        /private/tmp/btcd/blockchain/chain.go:1040 +0x14eb fp=0xc025389908 sp=0xc0253896d0 pc=0x82384b
github.com/btcsuite/btcd/blockchain.(*BlockChain).connectBestChain(0xc0006f2c00, 0xc00e059650, 0xc00f02ec40, 0x0, 0x0, 0x0, 0x0)
        /private/tmp/btcd/blockchain/chain.go:1194 +0xa68 fp=0xc025389a38 sp=0xc025389908 pc=0x824968
github.com/btcsuite/btcd/blockchain.(*BlockChain).maybeAcceptBlock(0xc0006f2c00, 0xc00f02ec40, 0xc000000000, 0xc00000a888, 0xc00a1a01f0, 0xc017f01da0)
        /private/tmp/btcd/blockchain/accept.go:79 +0x23c fp=0xc025389ad8 sp=0xc025389a38 pc=0x81f03c
github.com/btcsuite/btcd/blockchain.(*BlockChain).processOrphans(0xc0006f2c00, 0xc048128000, 0xffffffff00000000, 0x0, 0x0)
        /private/tmp/btcd/blockchain/process.go:118 +0x28e fp=0xc025389bb8 sp=0xc025389ad8 pc=0x8326ce
github.com/btcsuite/btcd/blockchain.(*BlockChain).ProcessBlock(0xc0006f2c00, 0xc0001868c0, 0xc000000000, 0xc024a80000, 0x0, 0x0)
        /private/tmp/btcd/blockchain/process.go:236 +0x305 fp=0xc025389cd0 sp=0xc025389bb8 pc=0x832b45
github.com/btcsuite/btcd/netsync.(*SyncManager).handleBlockMsg(0xc00002c0c0, 0xc0024f1860)
        /private/tmp/btcd/netsync/manager.go:710 +0x219 fp=0xc025389eb0 sp=0xc025389cd0 pc=0x936ff9
github.com/btcsuite/btcd/netsync.(*SyncManager).blockHandler(0xc00002c0c0)
        /private/tmp/btcd/netsync/manager.go:1325 +0x4fb fp=0xc025389fd8 sp=0xc025389eb0 pc=0x93aebb
runtime.goexit()
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc025389fe0 sp=0xc025389fd8 pc=0x46f221
created by github.com/btcsuite/btcd/netsync.(*SyncManager).Start
        /private/tmp/btcd/netsync/manager.go:1565 +0xd7
goroutine 1 [chan receive, 36896 minutes]:

Full log:
btcd_crash.log

@onyb onyb added the bug label Nov 12, 2020
onyb (Collaborator) commented Nov 12, 2020

It seems to be because of yesterday's chainsplit at height 656476: https://twitter.com/BitMEXResearch/status/1326651647871873024. The btcd node on forkmonitor is down too.

Is anyone already looking into this? @Roasbeef @jakesyl

jakesylvestre (Collaborator) commented Nov 13, 2020

Yep - I'm constructing a test to replicate now.

It looks like this happened before on testnet in #1492. That case appeared to be slightly different, since that node would not work on restart, whereas here it does (at least on our @xplorfin nodes; screenshot omitted).

Then a few solutions were suggested in that thread (screenshot omitted).

Roasbeef (Member) commented Dec 2, 2020

In the past, I traced it down to an actual panic in one of the slices used in the re-org/rollback code in the fee estimator. When Go unwinds the stack due to a panic, it can trigger deferred calls which themselves panic and obscure the true issue. So the mutex code itself is fine; the real problem is an incorrect assumption w.r.t. re-org handling in the fee estimator itself.

A first step would be to write a new unit test for the estimator that triggers a series of re-orgs within it. Note that we have unit tests for re-orgs elsewhere, but they're written at the level of the blockchain package. When I dug into this issue last, it was triggered by a re-org larger than estimateFeeDepth, which is relatively common on testnet.

Roasbeef (Member) commented Dec 2, 2020

Glanced at the code again, and in this case, I think the blocksToConfirm computation is to blame. If a transaction is observed at height N but, after the re-org, ends up included at height N-1, then blocksToConfirm will end up being negative here, since o.observed > o.mined.

Roasbeef (Member) commented Dec 2, 2020

I think this would also apply to RegisterBlock (referenced in this new panic trace) if a transaction ends up confirming in a height lower than when we first "observed" it.

Roasbeef (Member) commented Dec 2, 2020

If we look at the trace itself, we see line 283 referenced:

github.com/btcsuite/btcd/mempool.(*FeeEstimator).RegisterBlock(0xc0000d8000, 0xc0058652d0, 0x0, 0x0)
        /private/tmp/btcd/mempool/estimatefee.go:283 +0xcd2 fp=0xc025389510 sp=0xc025389178 pc=0x8e3b32

This corresponds to:

if replacementCounts[blocksToConfirm] == int(ef.maxReplacements) {
    continue
}

Which is likely where the panic is being triggered in this instance. So the root culprit here is a naive assumption w.r.t. post-re-org transaction inclusion in Bitcoin. This could make for a fun interview question ;)

There's a check earlier that tries to catch something like this:

// This shouldn't happen but check just in case to avoid
// an out-of-bounds array index later.
if blocksToConfirm >= estimateFeeDepth {
    continue
}

But it doesn't factor in the other type of OOB error in Go: a negative index.

kcalvinalvin added a commit to kcalvinalvin/utreexod that referenced this issue Dec 21, 2021
Fixes a negative index bug that makes the node crash on chain
reorganizations.  The bug is detailed in
github.com/btcsuite/btcd/issues/1660.

A better design than just skipping the transaction would make
the fee estimator more accurate, and that should be implemented
at a later date.
kcalvinalvin (Collaborator) commented
The bug still exists and the fix is known. What's stopping the fix from getting into master?

Roasbeef (Member) commented Feb 19, 2022 via email

kcalvinalvin added a commit to kcalvinalvin/btcd that referenced this issue Feb 21, 2022
Fixes a negative index bug that makes the node crash on chain
reorganizations.  The bug is detailed in
github.com/btcsuite/btcd/issues/1660.

A better design than just skipping the transaction would make
the fee estimator more accurate, and that should be implemented
at a later date.
kcalvinalvin (Collaborator) commented
I've opened PR #1813 with the fix ported from utreexod.
