Optimize ChannelMonitor persistence on block connections. #2966

Draft · wants to merge 2 commits into main
Conversation

G8XSU (Contributor) commented Mar 25, 2024

Currently, every block connection triggers the persistence of all
ChannelMonitors with an updated best_block. This approach poses
challenges for large node operators managing thousands of channels.
Furthermore, it leads to a thundering herd problem
(https://en.wikipedia.org/wiki/Thundering_herd_problem), overwhelming
the storage with simultaneous requests.

To address this issue, we now persist ChannelMonitors at a
regular cadence, spreading their persistence across blocks to
mitigate spikes in write operations.
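A minimal sketch of the cadence-based approach described above, assuming a hypothetical `should_persist` helper and a `partition_factor` of 50 (the names and the default are assumptions, not the PR's actual code): each monitor is assigned a stable slot from a hash of its channel id, and is persisted only on blocks whose height falls in that slot, so writes spread evenly across `partition_factor` consecutive blocks.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical partition check: a ChannelMonitor is persisted on a given
/// block only when the block height modulo `partition_factor` matches the
/// monitor's hash-derived slot.
fn should_persist(channel_id: &[u8; 32], block_height: u32, partition_factor: u32) -> bool {
    let mut hasher = DefaultHasher::new();
    channel_id.hash(&mut hasher);
    let slot = (hasher.finish() % partition_factor as u64) as u32;
    block_height % partition_factor == slot
}

fn main() {
    let channel_id = [7u8; 32];
    let partition_factor = 50u32;
    // Over any window of `partition_factor` blocks, each monitor is
    // persisted exactly once rather than on every block connection.
    let writes = (0..partition_factor)
        .filter(|h| should_persist(&channel_id, *h, partition_factor))
        .count();
    assert_eq!(writes, 1);
}
```

With thousands of channels, hashing spreads the monitors roughly uniformly over the slots, so each block connection writes only about `num_channels / partition_factor` monitors instead of all of them.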

Tasks:

  • Don't pause events for chainsync persistence (#2957) and base this on it.
  • Concept/Approach Ack
  • Decide a good default for partition_factor
  • Maybe we can make partition_factor user-configurable. (Can also do this later, depends on our default value)
  • Write more tests for persistence with partition_factor.

Closes #2647

We used to wait on ChannelMonitor persistence to avoid
duplicate payment events. But duplicates can still occur when the
ChannelMonitor has handed an event to the ChannelManager and we did not
persist the ChannelManager after event handling.
Clients are expected to receive duplicate payment events and should handle them
in an idempotent manner. Removing this hold-up of events simplifies
the logic and makes it easier to avoid persisting ChannelMonitors on every block connect.
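The idempotent handling expected of clients can be sketched as follows (a minimal illustration with assumed names, not LDK's actual event API): the client records which payment hashes it has already processed, so a redelivered event is a no-op.

```rust
use std::collections::HashSet;

/// Hypothetical client-side handler: duplicate payment events are made
/// harmless by tracking which payment hashes were already processed.
struct PaymentHandler {
    seen: HashSet<[u8; 32]>,
}

impl PaymentHandler {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Returns true only the first time a given payment hash is handled;
    /// replays of the same event return false and are ignored.
    fn handle_payment_claimed(&mut self, payment_hash: [u8; 32]) -> bool {
        self.seen.insert(payment_hash)
    }
}

fn main() {
    let mut handler = PaymentHandler::new();
    let hash = [1u8; 32];
    assert!(handler.handle_payment_claimed(hash)); // first delivery: processed
    assert!(!handler.handle_payment_claimed(hash)); // duplicate: ignored
}
```

In practice the "seen" set would be persisted alongside the client's own payment records, so idempotency survives restarts.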
wpaulino (Contributor) commented Apr 8, 2024

Is this something we might want to consider not doing on mobile? If the fee estimator is broken and we're not persisting the most recent feerate we tried within the OnchainTxHandler, we won't be able to RBF onchain claims properly.

TheBlueMatt (Collaborator) commented
I guess we should/could consider always persisting if there are pending claims (e.g., the channel has been closed but still has balances to claim)? Alternatively, we could always persist if we have fewer than 5 channels.
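The two suggestions above could be combined into a persistence policy along these lines (a sketch with assumed names; the threshold of 5 channels is taken from the comment, and `due_this_block` stands for the cadence check from the PR):

```rust
/// Hypothetical policy: always persist a monitor on block connect when it
/// has pending on-chain claims, or when the node is small enough that
/// spreading writes gains nothing; otherwise wait for the monitor's slot.
fn persist_on_block(has_pending_claims: bool, num_channels: usize, due_this_block: bool) -> bool {
    has_pending_claims || num_channels < 5 || due_this_block
}

fn main() {
    // Pending claims: persist regardless of the cadence.
    assert!(persist_on_block(true, 1000, false));
    // Small node (e.g. mobile): persist on every block.
    assert!(persist_on_block(false, 3, false));
    // Large node with nothing pending: wait for this monitor's slot.
    assert!(!persist_on_block(false, 1000, false));
}
```

This would address the mobile concern, since a node with few channels (or any monitor with pending claims) keeps the current every-block behavior, including the latest feerate tried by the OnchainTxHandler.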

Successfully merging this pull request may close these issues.

[Persistence] Don't persist ALL channel_monitors on every bitcoin block connection.