routing/localchans: fix nested db tx #5643

joostjager · 2021-08-19T15:32:33Z

This PR fixes a bug where a db tx is opened within another db tx.

bhandras · 2021-08-19T15:54:47Z

routing/localchans/manager.go

@@ -76,20 +76,34 @@ func (r *Manager) UpdatePolicy(newSchema routing.ChannelPolicy,
 			return nil
 		}

+		// Add edge to a list. We can't yet apply the new policy,
+		// because the updateEdge method will open a new db transaction.
+		edges = append(edges, discovery.EdgeWithInfo{


I think edges should be initialized on first access. We can't be sure if ForAllOutgoingChannels is ever retried. Even though all of this is going to be ram cached soon.

Hmm yes interesting. But it can't be fixed here I think? It seems that the deeper graph method needs a reset function passed in as well. In fact every graph method that accepts a callback needs a reset function?

There may be non-graph db methods too with a callback. We are accidentally stumbling into this, but perhaps there is more unsafe usage of external variables. The pattern to store items in an external list is quite common.

I do want to re-emphasize my earlier comments btcsuite/btcwallet#757 (comment) and #5484 (comment). I still think that switching to the serializable locking model is a risk that doesn't need to be taken at this time.

I'd just add a flag and check it? If false => create the array, set flag true.

Edit: yeah, that won't help, need a closure, it's pretty simple to add.

There may be non-graph db methods too with a callback. We are accidentally stumbling into this, but perhaps there is more unsafe usage of external variables. The pattern to store items in an external list is quite common.

Also re: accidentally stumbling: I think during the fully remote PR we discussed that we need to make sure all DB transactions are in their closed forms and retries are handled. It may have been forgotten somehow as there was too many other threads of dicussions. Maybe worth merging this PR with #5642 and doing these fixes there.

@guggero what do you think? I am not sure if it is a good idea to leak details of the database locking model into the graph interface.

Is there an issue in the repo to check whether all transactions are in their closed forms? I had an idea that could help with that: modify the code to artificially always run the code inside each transaction twice. It is then as if everything needs a retry. Itests, bench, etc, should still work.

Add a tx kvdb.RTx parameter to the FetchChannel method of the channel DB.

That is a great suggestion. The whole thing will be retried then with a fresh slice.

What I don't understand is: Why did this never lead to a deadlock with bbolt?

Good question. I will look into it.

~~That the deadlock doesn't occur with bbolt almost makes me think that somehow two different locks are being used. As if there are two instances of something.~~

This doesn't seem to be the case.

~~I am now looking at indeed the type of lock that postgres uses. Maybe you are right @guggero that it isn't exactly the same.~~

Explanation why getting an RLock twice can lead to a deadlock: https://github.com/sasha-s/go-deadlock

The same problem exists for bbolt. It just doesn't show up in the itest because timing is different. For a recursive read lock to go wrong, there needs to be a write that comes in between. By adding a sleep(5) inside the ForAllOutgoingChannels callback, the issue reproduces with bbolt on master.

bhandras · 2021-08-19T15:57:43Z

Nice fix, I remember seeing this bug once before during the initial etcd port but totally forgot about it since we were using mixed local/remote mode.

guggero

Nice investigation and fix! We'll need to think of a way to detect and fix other such cases, going to track this in #5557.

This particular case looks good to me 💯

chanbackup/backup.go

bhandras

LGTM, nice to have this fixed! 👍

One minor thing I'd like to suggest is to squash the two commits and separate the release note update as it's a bit easier to follow the changes that way.

joostjager · 2021-08-23T12:49:44Z

squash the two commits

I just separated them this morning :)

I thought it would be easier for you to read if the non-functional refactor is on its own.

bhandras · 2021-08-24T11:40:16Z

Needs rebase.

Adds an optional tx parameter to ForAllOutgoingChannels and FetchChannel so that data can be queried within the context of an existing database transaction.

This commit fixes a bug where a db tx is opened within another db tx.

joostjager · 2021-08-24T11:44:03Z

Rebased

guggero · 2021-08-24T15:42:25Z

Going to merge this as both itest failures are known flakes.

joostjager mentioned this pull request Aug 19, 2021

kvdb: add postgres #5366

Merged

bhandras reviewed Aug 19, 2021

View reviewed changes

joostjager force-pushed the updatepolicy-nested-tx branch from 358befd to 241c32f Compare August 19, 2021 16:26

joostjager mentioned this pull request Aug 20, 2021

kvdb: concurrency issue tracker #5557

Open

3 tasks

joostjager force-pushed the updatepolicy-nested-tx branch 3 times, most recently from 46de99b to ffa7b04 Compare August 23, 2021 10:19

guggero approved these changes Aug 23, 2021

View reviewed changes

chanbackup/backup.go Show resolved Hide resolved

joostjager force-pushed the updatepolicy-nested-tx branch 3 times, most recently from bb4008c to a5446f0 Compare August 23, 2021 11:43

bhandras approved these changes Aug 23, 2021

View reviewed changes

joostjager added 2 commits August 24, 2021 13:43

channeldb+routing: add tx parameter

3f77577

Adds an optional tx parameter to ForAllOutgoingChannels and FetchChannel so that data can be queried within the context of an existing database transaction.

routing/localchans: fix nested db tx

bd07b5f

This commit fixes a bug where a db tx is opened within another db tx.

joostjager force-pushed the updatepolicy-nested-tx branch from a5446f0 to bd07b5f Compare August 24, 2021 11:43

guggero merged commit 93d12cd into lightningnetwork:master Aug 24, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

routing/localchans: fix nested db tx #5643

routing/localchans: fix nested db tx #5643

joostjager commented Aug 19, 2021

bhandras Aug 19, 2021

joostjager Aug 19, 2021 •

edited

bhandras Aug 19, 2021 •

edited

bhandras Aug 19, 2021 •

edited

joostjager Aug 20, 2021

joostjager Aug 20, 2021

joostjager Aug 20, 2021 •

edited

joostjager Aug 20, 2021 •

edited

joostjager Aug 20, 2021

joostjager Aug 20, 2021

bhandras commented Aug 19, 2021 •

edited

guggero left a comment

bhandras left a comment

joostjager commented Aug 23, 2021 •

edited

bhandras commented Aug 24, 2021 •

edited

joostjager commented Aug 24, 2021

guggero commented Aug 24, 2021

routing/localchans: fix nested db tx #5643

routing/localchans: fix nested db tx #5643

Conversation

joostjager commented Aug 19, 2021

bhandras Aug 19, 2021

Choose a reason for hiding this comment

joostjager Aug 19, 2021 • edited

Choose a reason for hiding this comment

bhandras Aug 19, 2021 • edited

Choose a reason for hiding this comment

bhandras Aug 19, 2021 • edited

Choose a reason for hiding this comment

joostjager Aug 20, 2021

Choose a reason for hiding this comment

joostjager Aug 20, 2021

Choose a reason for hiding this comment

joostjager Aug 20, 2021 • edited

Choose a reason for hiding this comment

joostjager Aug 20, 2021 • edited

Choose a reason for hiding this comment

joostjager Aug 20, 2021

Choose a reason for hiding this comment

joostjager Aug 20, 2021

Choose a reason for hiding this comment

bhandras commented Aug 19, 2021 • edited

guggero left a comment

Choose a reason for hiding this comment

bhandras left a comment

Choose a reason for hiding this comment

joostjager commented Aug 23, 2021 • edited

bhandras commented Aug 24, 2021 • edited

joostjager commented Aug 24, 2021

guggero commented Aug 24, 2021

joostjager Aug 19, 2021 •

edited

bhandras Aug 19, 2021 •

edited

bhandras Aug 19, 2021 •

edited

joostjager Aug 20, 2021 •

edited

joostjager Aug 20, 2021 •

edited

bhandras commented Aug 19, 2021 •

edited

joostjager commented Aug 23, 2021 •

edited

bhandras commented Aug 24, 2021 •

edited