
fatal error: sync: Unlock of unlocked RWMutex #1660

Closed
darren opened this issue Nov 12, 2020 · 9 comments · Fixed by #1813

Comments

darren commented Nov 12, 2020

2020-11-12 02:56:09.377 [INF] CHAN: FORK: Block 0000000000000000000821678efe7f53c388fd2c530ef46e6dd2d2f369c346cd forks the chain at height 656476/block 000000000000000000103c23bc9a56b6c08340f973fab41c7dfd6535700ad8f4, but does not cause a reorganize
2020-11-12 02:56:09.442 [INF] CHAN: REORGANIZE: Block 0000000000000000000aaf4126ad8beb7f643462d5111cf89068c2f25f5742b1 is causing a reorganize.
fatal error: sync: Unlock of unlocked RWMutex

goroutine 19 [running]:
runtime.throw(0xb0eb6e, 0x20)
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/panic.go:1116 +0x72 fp=0xc025388f30 sp=0xc025388f00 pc=0x437392
sync.throw(0xb0eb6e, 0x20)
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/panic.go:1102 +0x35 fp=0xc025388f50 sp=0xc025388f30 pc=0x46a3b5
sync.(*RWMutex).Unlock(0xc0006f2c7c)
        /usr/local/Cellar/go/1.15.2/libexec/src/sync/rwmutex.go:129 +0xc6 fp=0xc025388f88 sp=0xc025388f50 pc=0x4877e6
runtime.call32(0x0, 0xc55ed8, 0xc01c49a0e8, 0x800000008)
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/asm_amd64.s:540 +0x3e fp=0xc025388fb8 sp=0xc025388f88 pc=0x46d99e
runtime.reflectcallSave(0xc0253890e8, 0xc55ed8, 0xc01c49a0e8, 0x8)
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/panic.go:881 +0x58 fp=0xc025388fe8 sp=0xc025388fb8 pc=0x436a78
runtime.runOpenDeferFrame(0xc000269200, 0xc01c49a0a0, 0x0)
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/panic.go:855 +0x2cd fp=0xc025389078 sp=0xc025388fe8 pc=0x43692d
panic(0xaa9d20, 0xc002554020)
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/panic.go:969 +0x175 fp=0xc025389130 sp=0xc025389078 pc=0x436cb5
runtime.goPanicIndex(0xffffffffffffffff, 0x19)
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/panic.go:88 +0xa5 fp=0xc025389178 sp=0xc025389130 pc=0x434ac5
github.com/btcsuite/btcd/mempool.(*FeeEstimator).RegisterBlock(0xc0000d8000, 0xc0058652d0, 0x0, 0x0)
        /private/tmp/btcd/mempool/estimatefee.go:283 +0xcd2 fp=0xc025389510 sp=0xc025389178 pc=0x8e3b32
github.com/btcsuite/btcd/netsync.(*SyncManager).handleBlockchainNotification(0xc00002c0c0, 0xc0093a32e0)
        /private/tmp/btcd/netsync/manager.go:1436 +0x205 fp=0xc0253895e0 sp=0xc025389510 pc=0x93b2e5
github.com/btcsuite/btcd/netsync.(*SyncManager).handleBlockchainNotification-fm(0xc0093a32e0)
        /private/tmp/btcd/netsync/manager.go:1389 +0x34 fp=0xc025389600 sp=0xc0253895e0 pc=0x93c954
github.com/btcsuite/btcd/blockchain.(*BlockChain).sendNotification(0xc0006f2c00, 0x1, 0xaaaee0, 0xc0058652d0)
        /private/tmp/btcd/blockchain/notifications.go:78 +0xaf fp=0xc025389648 sp=0xc025389600 pc=0x83224f
github.com/btcsuite/btcd/blockchain.(*BlockChain).connectBlock(0xc0006f2c00, 0xc00e0589a0, 0xc0058652d0, 0xc026ab3410, 0xc03bcaa000, 0x14cc, 0x14cc, 0x0, 0x0)
        /private/tmp/btcd/blockchain/chain.go:665 +0x3da fp=0xc0253896d0 sp=0xc025389648 pc=0x821afa
github.com/btcsuite/btcd/blockchain.(*BlockChain).reorganizeChain(0xc0006f2c00, 0xc031f07950, 0xc031f07860, 0xc015640980, 0x1)
        /private/tmp/btcd/blockchain/chain.go:1040 +0x14eb fp=0xc025389908 sp=0xc0253896d0 pc=0x82384b
github.com/btcsuite/btcd/blockchain.(*BlockChain).connectBestChain(0xc0006f2c00, 0xc00e059650, 0xc00f02ec40, 0x0, 0x0, 0x0, 0x0)
        /private/tmp/btcd/blockchain/chain.go:1194 +0xa68 fp=0xc025389a38 sp=0xc025389908 pc=0x824968
github.com/btcsuite/btcd/blockchain.(*BlockChain).maybeAcceptBlock(0xc0006f2c00, 0xc00f02ec40, 0xc000000000, 0xc00000a888, 0xc00a1a01f0, 0xc017f01da0)
        /private/tmp/btcd/blockchain/accept.go:79 +0x23c fp=0xc025389ad8 sp=0xc025389a38 pc=0x81f03c
github.com/btcsuite/btcd/blockchain.(*BlockChain).processOrphans(0xc0006f2c00, 0xc048128000, 0xffffffff00000000, 0x0, 0x0)
        /private/tmp/btcd/blockchain/process.go:118 +0x28e fp=0xc025389bb8 sp=0xc025389ad8 pc=0x8326ce
github.com/btcsuite/btcd/blockchain.(*BlockChain).ProcessBlock(0xc0006f2c00, 0xc0001868c0, 0xc000000000, 0xc024a80000, 0x0, 0x0)
        /private/tmp/btcd/blockchain/process.go:236 +0x305 fp=0xc025389cd0 sp=0xc025389bb8 pc=0x832b45
github.com/btcsuite/btcd/netsync.(*SyncManager).handleBlockMsg(0xc00002c0c0, 0xc0024f1860)
        /private/tmp/btcd/netsync/manager.go:710 +0x219 fp=0xc025389eb0 sp=0xc025389cd0 pc=0x936ff9
github.com/btcsuite/btcd/netsync.(*SyncManager).blockHandler(0xc00002c0c0)
        /private/tmp/btcd/netsync/manager.go:1325 +0x4fb fp=0xc025389fd8 sp=0xc025389eb0 pc=0x93aebb
runtime.goexit()
        /usr/local/Cellar/go/1.15.2/libexec/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc025389fe0 sp=0xc025389fd8 pc=0x46f221
created by github.com/btcsuite/btcd/netsync.(*SyncManager).Start
        /private/tmp/btcd/netsync/manager.go:1565 +0xd7
goroutine 1 [chan receive, 36896 minutes]:

Full log:
btcd_crash.log

@onyb onyb added the bug label Nov 12, 2020
onyb (Collaborator) commented Nov 12, 2020

It seems to be because of yesterday's chainsplit at height 656476: https://twitter.com/BitMEXResearch/status/1326651647871873024. The btcd node on forkmonitor is down too.

Is anyone already looking into this? @Roasbeef @jakesyl

jakesylvestre (Collaborator) commented Nov 13, 2020

Yep - I'm constructing a test to replicate now.

It looks like this happened before on testnet in #1492. That case appeared to be slightly different, since that node would not work on restart, whereas here it does (at least on our @xplorfin nodes; screenshot omitted).

Then a few solutions were suggested in that thread (screenshot omitted).

Roasbeef (Member) commented Dec 2, 2020

In the past, I traced it down to an actual panic in one of the slices used in the re-org/rollback code in the fee estimator. When Go unwinds the stack due to a panic, it can trigger deferred calls which themselves panic and obscure the true issue. So the mutex code itself is fine; the real problem is an incorrect assumption w.r.t. re-org handling in the fee estimator itself.

A first step would be to write a new unit test for the estimator that triggers a series of re-orgs within it. Note that we have unit tests for re-orgs elsewhere, but they're written at the level of the blockchain package. When I dug into this issue last, it was triggered by a re-org larger than estimateFeeDepth, which is relatively common on testnet.

Roasbeef (Member) commented Dec 2, 2020

Glanced at the code again, and in this case, I think the blocksToConfirm computation is to blame. If a transaction is observed at height N but, after the re-org, ends up included at height N-1, then blocksToConfirm will end up being negative here, since o.observed > o.mined.

Roasbeef (Member) commented Dec 2, 2020

I think this would also apply to RegisterBlock (referenced in this new panic trace) if a transaction ends up confirming in a height lower than when we first "observed" it.

Roasbeef (Member) commented Dec 2, 2020

If we look at the trace itself, we see line 283 referenced:

github.com/btcsuite/btcd/mempool.(*FeeEstimator).RegisterBlock(0xc0000d8000, 0xc0058652d0, 0x0, 0x0)
        /private/tmp/btcd/mempool/estimatefee.go:283 +0xcd2 fp=0xc025389510 sp=0xc025389178 pc=0x8e3b32

This corresponds to:

if replacementCounts[blocksToConfirm] == int(ef.maxReplacements) {
    continue
}

Which is likely where the panic is being triggered in this instance. So the root culprit here is a naive assumption w.r.t. post-re-org transaction inclusion in Bitcoin. This could make for a fun interview question ;)

There's a check earlier that tries to catch something like this:

// This shouldn't happen but check just in case to avoid
// an out-of-bounds array index later.
if blocksToConfirm >= estimateFeeDepth {
    continue
}

But it doesn't factor in the other type of OOB error in Go: a negative index.

kcalvinalvin added a commit to kcalvinalvin/utreexod that referenced this issue Dec 21, 2021
Fixes a negative index bug that makes the node crash on chain
reorganizations.  The bug is detailed in
github.com/btcsuite/btcd/issues/1660.

A better design than just skipping the transaction would make
the fee estimator more accurate, and that should be implemented
at a later date.
kcalvinalvin (Collaborator) commented
The bug still exists and the fix is known. What's stopping the fix from getting into master?

Roasbeef (Member) commented Feb 19, 2022 via email

kcalvinalvin added a commit to kcalvinalvin/btcd that referenced this issue Feb 21, 2022
Fixes a negative index bug that makes the node crash on chain
reorganizations.  The bug is detailed in
github.com/btcsuite/btcd/issues/1660.

A better design than just skipping the transaction would make
the fee estimator more accurate, and that should be implemented
at a later date.
kcalvinalvin (Collaborator) commented
I've opened PR #1813 with the fix ported from utreexod.
