fatal error: sync: Unlock of unlocked RWMutex #1660
Comments
It seems to be because of yesterday's chainsplit at height 656476: https://twitter.com/BitMEXResearch/status/1326651647871873024. The btcd node on forkmonitor is down too. |
Yep - I'm constructing a test to replicate now. It looks like this happened before on testnet in #1492. That case appeared to be slightly different, since the node was not working on restart, whereas here it is (at least on our @xplorfin nodes): Then a few solutions were suggested:
|
In the past, I traced it down to an actual panic in one of the slices used in the re-org/rollback code in the fee estimator. When Go unwinds the stack after a panic, it runs deferred functions, and those can themselves fail in a way that obscures the true issue. So the mutex error is a symptom rather than the cause: there's an incorrect assumption w.r.t. re-org handling in the fee estimator itself. A first step would be to write a new unit test for the estimator that triggers a series of re-orgs within it. Note that we have unit tests for re-orgs elsewhere, but they're written at the level of the |
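To illustrate how a deferred unlock can hide the original panic, here is a minimal sketch. It is not btcd code: `toyLock`, `process`, and `lastPanic` are hypothetical names, and the pattern of releasing the lock mid-function is only an assumption about how such a failure could arise. The real `sync.RWMutex` raises an unrecoverable fatal error on a bad `Unlock`, so the toy lock panics instead to make the effect observable.

```go
package main

import "fmt"

// toyLock is a hypothetical stand-in for sync.RWMutex. It panics on a bad
// Unlock (the real RWMutex aborts the process with a fatal error instead).
type toyLock struct{ locked bool }

func (l *toyLock) Lock() { l.locked = true }

func (l *toyLock) Unlock() {
	if !l.locked {
		panic("unlock of unlocked lock")
	}
	l.locked = false
}

// process takes the lock with a deferred Unlock, then temporarily releases
// it around some work. If the work panics while the lock is released, the
// deferred Unlock runs against an unlocked lock.
func process(l *toyLock, counts []int, idx int) {
	l.Lock()
	defer l.Unlock()

	l.Unlock()
	counts[idx]++ // panics when idx is negative or too large
	l.Lock()
}

// lastPanic runs process and returns the panic value that recover sees.
func lastPanic(idx int) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	process(&toyLock{}, make([]int, 4), idx)
	return "no panic"
}

func main() {
	fmt.Println(lastPanic(1))  // valid index: nothing goes wrong
	fmt.Println(lastPanic(-1)) // reports the lock complaint, not the index error
}
```

With a negative index, the value that surfaces is the lock complaint raised by the deferred call rather than the index error that started the unwinding, which is the kind of misdirection described above.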
Glanced at the code again, and in this case, I think the |
I think this would also apply to |
If we look at the trace itself, we see line
This corresponds to:

    if replacementCounts[blocksToConfirm] == int(ef.maxReplacements) {
        continue
    }

Which is where the panic is likely being triggered in this instance. So the root culprit here is a naive assumption w.r.t. post re-org transaction inclusion in Bitcoin. This could make for a fun interview question ;)

There's a check earlier that tries to catch something like this:

    // This shouldn't happen but check just in case to avoid
    // an out-of-bounds array index later.
    if blocksToConfirm >= estimateFeeDepth {
        continue
    }

But it doesn't factor in the other type of OOB error in Go: a negative index. |
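A minimal sketch of a check that covers both ends of the range follows. It is not the btcd implementation: `validDepth` and `tally` are hypothetical helpers, the value 25 for `estimateFeeDepth` is assumed, and the subtraction used to derive `blocksToConfirm` is an assumption about how a transaction observed at a height above the connected block (as can happen after a re-org) ends up with a negative depth.

```go
package main

import "fmt"

// estimateFeeDepth reuses the name from the thread; the value is assumed.
const estimateFeeDepth = 25

// validDepth reports whether blocksToConfirm is a usable index, rejecting
// negative values as well as values at or beyond estimateFeeDepth.
func validDepth(blocksToConfirm int32) bool {
	return blocksToConfirm >= 0 && blocksToConfirm < estimateFeeDepth
}

// tally counts transactions per confirmation depth and skips any whose
// depth falls outside the valid range (hypothetical helper).
func tally(blockHeight int32, observedHeights []int32) (counts [estimateFeeDepth]int, skipped int) {
	for _, observed := range observedHeights {
		// Assumed derivation: negative when observed >= blockHeight.
		blocksToConfirm := blockHeight - observed - 1
		if !validDepth(blocksToConfirm) {
			skipped++
			continue
		}
		counts[blocksToConfirm]++
	}
	return counts, skipped
}

func main() {
	// Heights 103 and 10 give depths of -4 and 89; both are skipped.
	counts, skipped := tally(100, []int32{99, 95, 103, 10})
	fmt.Println(counts[0], counts[4], skipped) // prints "1 1 2"
}
```

Skipping out-of-range entries only avoids the crash. It does not make the estimate any more accurate for transactions affected by the re-org.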
Fixes a negative index bug that makes the node crash on chain reorganizations. The bug is detailed in github.com/btcsuite/btcd/issues/1660. A better design than just skipping the transaction would make the fee estimator more accurate, and that should be implemented at a later date.
The bug still exists and the fix is known. What's stopping the fix from getting into master? |
A pull request, care to port over your fix from utreexod?
…On Sat, Feb 19, 2022, 12:10 PM Calvin Kim ***@***.***> wrote:
The bug still exists and the fix is known. What's stopping the fix from
getting into master?
|
Fixes a negative index bug that makes the node crash on chain reorganizations. The bug is detailed in github.com/btcsuite/btcd/issues/1660. A better design than just skipping the transaction would make the fee estimator more accurate, and that should be implemented at a later date.
I've made a PR #1813 with the fix ported from utreexod |
Full log:
btcd_crash.log