Optimize ChannelMonitor persistence on block connections. #2966

Draft · wants to merge 2 commits into main
Conversation

G8XSU (Contributor) commented Mar 25, 2024

Currently, every block connection triggers the persistence of all
ChannelMonitors with an updated best_block. This approach poses
challenges for large node operators managing thousands of channels.
Furthermore, it leads to a thundering herd problem
(https://en.wikipedia.org/wiki/Thundering_herd_problem), overwhelming
the storage with simultaneous requests.

To address this issue, we now persist ChannelMonitors at a
regular cadence, spreading their persistence across blocks to
mitigate spikes in write operations.
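A minimal sketch of the cadence-based approach described above, assuming a hypothetical `should_persist` helper and a `partition_factor` of 50 (the names and the default are assumptions, not the PR's actual code): each monitor is assigned a stable slot from a hash of its channel id, and is persisted only on blocks whose height falls in that slot, so writes spread evenly across `partition_factor` consecutive blocks.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical partition check: a ChannelMonitor is persisted on a given
/// block only when the block height modulo `partition_factor` matches the
/// monitor's hash-derived slot.
fn should_persist(channel_id: &[u8; 32], block_height: u32, partition_factor: u32) -> bool {
    let mut hasher = DefaultHasher::new();
    channel_id.hash(&mut hasher);
    let slot = (hasher.finish() % partition_factor as u64) as u32;
    block_height % partition_factor == slot
}

fn main() {
    let channel_id = [7u8; 32];
    let partition_factor = 50u32;
    // Over any window of `partition_factor` blocks, each monitor is
    // persisted exactly once rather than on every block connection.
    let writes = (0..partition_factor)
        .filter(|h| should_persist(&channel_id, *h, partition_factor))
        .count();
    assert_eq!(writes, 1);
}
```

With thousands of channels, hashing spreads the monitors roughly uniformly over the slots, so each block connection writes only about `num_channels / partition_factor` monitors instead of all of them.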

Tasks:

  • Don't pause events for chainsync persistence (#2957) and base this on it.
  • Concept/Approach Ack
  • Decide a good default for partition_factor
  • Maybe we can make partition_factor user-configurable. (Can also do this later, depends on our default value)
  • Write more tests for persistence with partition_factor.

Closes #2647

We used to wait on ChannelMonitor persistence to avoid
duplicate payment events. But duplicates can still occur when the
ChannelMonitor has handed an event to the ChannelManager and we did not
persist the ChannelManager after event handling.
Clients are expected to receive duplicate payment events and should handle them
in an idempotent manner. Removing this hold-up of events simplifies
the logic and makes it easier to avoid persisting ChannelMonitors on every block connect.
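The idempotent handling expected of clients can be sketched as follows (a minimal illustration with assumed names, not LDK's actual event API): the client records which payment hashes it has already processed, so a redelivered event is a no-op.

```rust
use std::collections::HashSet;

/// Hypothetical client-side handler: duplicate payment events are made
/// harmless by tracking which payment hashes were already processed.
struct PaymentHandler {
    seen: HashSet<[u8; 32]>,
}

impl PaymentHandler {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Returns true only the first time a given payment hash is handled;
    /// replays of the same event return false and are ignored.
    fn handle_payment_claimed(&mut self, payment_hash: [u8; 32]) -> bool {
        self.seen.insert(payment_hash)
    }
}

fn main() {
    let mut handler = PaymentHandler::new();
    let hash = [1u8; 32];
    assert!(handler.handle_payment_claimed(hash)); // first delivery: processed
    assert!(!handler.handle_payment_claimed(hash)); // duplicate: ignored
}
```

In practice the "seen" set would be persisted alongside the client's own payment records, so idempotency survives restarts.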
wpaulino (Contributor) commented Apr 8, 2024

Is this something we might want to consider not doing on mobile? If the fee estimator is broken and we're not persisting the most recent feerate we tried within the OnchainTxHandler, we won't be able to RBF onchain claims properly.

TheBlueMatt (Collaborator) commented
I guess we should/could consider always persisting if there are pending claims (e.g., the channel has been closed but still has balances to claim)? Alternatively, we could always persist if we have fewer than 5 channels.
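The two suggestions above could be combined into a persistence policy along these lines (a sketch with assumed names; the threshold of 5 channels is taken from the comment, and `due_this_block` stands for the cadence check from the PR):

```rust
/// Hypothetical policy: always persist a monitor on block connect when it
/// has pending on-chain claims, or when the node is small enough that
/// spreading writes gains nothing; otherwise wait for the monitor's slot.
fn persist_on_block(has_pending_claims: bool, num_channels: usize, due_this_block: bool) -> bool {
    has_pending_claims || num_channels < 5 || due_this_block
}

fn main() {
    // Pending claims: persist regardless of the cadence.
    assert!(persist_on_block(true, 1000, false));
    // Small node (e.g. mobile): persist on every block.
    assert!(persist_on_block(false, 3, false));
    // Large node with nothing pending: wait for this monitor's slot.
    assert!(!persist_on_block(false, 1000, false));
}
```

This would address the mobile concern, since a node with few channels (or any monitor with pending claims) keeps the current every-block behavior, including the latest feerate tried by the OnchainTxHandler.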

Successfully merging this pull request may close these issues.

[Persistence] Don't persist ALL channel_monitors on every bitcoin block connection.