
Stop reading monitors when persisting in updating persister #2706

Closed
TheBlueMatt opened this issue Nov 4, 2023 · 14 comments · Fixed by #2779

@TheBlueMatt (Collaborator)

Turns out this line is brutalizing our heap and leading to fragmentation; it needs to go away:

let maybe_old_monitor = self.read_monitor(&monitor_name);
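
Roughly, the pattern in question (a minimal sketch with stand-in types, not the actual MUP code): the stored monitor is read back and fully deserialized on the cleanup path purely to learn which per-update keys can now be deleted, and it's that full deserialization that produces the burst of small allocations.

```rust
/// Stand-in for a deserialized ChannelMonitor; the real struct owns many
/// small heap-backed fields (maps, vecs, scripts, ...).
struct StoredMonitor {
    latest_update_id: u64,
}

/// Shape of the cleanup path in question (exact range bounds illustrative).
/// The closures stand in for KVStore-backed read/remove operations.
fn cleanup_stale_updates(
    read_and_deserialize_old_monitor: impl Fn() -> Option<StoredMonitor>,
    mut remove_update: impl FnMut(u64),
    new_update_id: u64,
) {
    // Allocation-heavy step: one big Vec<u8> read plus a burst of small
    // allocations while deserializing, all just to find the old update id.
    if let Some(old_monitor) = read_and_deserialize_old_monitor() {
        for update_id in old_monitor.latest_update_id..new_update_id {
            remove_update(update_id);
        }
    }
}
```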

@domZippilli (Contributor) commented Nov 5, 2023

cc: @G8XSU

I decided to try this out real quick, and it passes the unit tests in the file. I wouldn't do it exactly like this (I'd probably at least get the TLV macro to work), but I'd imagine this allocates very little beyond the monitor read's Vec<u8>?

18ef772
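
Roughly, the shape of that idea (a hypothetical helper, not the actual contents of 18ef772; where the update id sits depends on the monitor's serialization format):

```rust
// Hypothetical helper: parse only the latest update id out of the serialized
// monitor bytes instead of deserializing the entire ChannelMonitor.
fn latest_update_id_from_bytes(bytes: &[u8]) -> Option<u64> {
    // PLACEHOLDER_OFFSET stands in for whatever prefix/TLV walk is needed to
    // reach the update id in the real serialization format.
    const PLACEHOLDER_OFFSET: usize = 0;
    let id_bytes = bytes.get(PLACEHOLDER_OFFSET..PLACEHOLDER_OFFSET + 8)?;
    Some(u64::from_be_bytes(id_bytes.try_into().ok()?))
}
```

The only allocation left would be the read's short-lived Vec<u8>, rather than the many small allocations a full deserialization produces.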

@tnull (Contributor) commented Nov 6, 2023

@domZippilli Wouldn't it be easier for the MUP to store some (conservatively updated and persisted) tracking state mapping the monitor id to latest update id? This might allow us to not read the stored monitor during persist at all?
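
For illustration, a minimal sketch of that tracking-state idea, assuming we are free to add a small map to the persister (names are illustrative, not existing LDK API):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Maps a monitor's storage key to the highest update id known to be
/// persisted for it.
struct UpdateIdTracker {
    latest: Mutex<HashMap<String, u64>>,
}

impl UpdateIdTracker {
    /// Record a newly persisted update id and return the previous high-water
    /// mark, which bounds the stale update keys that may now be deleted
    /// without re-reading the stored monitor.
    fn record(&self, monitor_key: &str, update_id: u64) -> Option<u64> {
        let mut map = self.latest.lock().unwrap();
        map.insert(monitor_key.to_owned(), update_id)
    }
}
```

If the map is stale or missing (e.g. after a restart), the worst case is some redundant or skipped deletes, which a later cleanup pass could catch up on.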

@TheBlueMatt (Collaborator, Author)

I mean if we can avoid additional state that'd be ideal - additional state means we have to check that it's consistent with the existing state and potentially handle inconsistency between them.

@domZippilli (Contributor) commented Nov 6, 2023

Yeah, there was a monitor_id → u64 map in here at some point in the process. When we had the discussions on the RFC I think there was a general preference to avoid duplicate state in the MUP, which is how we ended up reading it from storage like this. It'd be ideal, I suppose, to read just the required bytes, but ranged reads in KVStore seem like asking a lot.

@G8XSU (Contributor) commented Nov 6, 2023

Between issuing the redundant deletes and reading the monitor_update_id bytes, I prefer reading the update_id bytes (after reading the full monitor from storage; this avoids allocating for some of the bigger things in that struct, which is our main pain point). This is because issuing unnecessary delete calls over the network is significant overhead (at least for backends other than fs_store).

And this can be especially problematic if the consolidation threshold (maximum_pending_updates) is something like 1000 or more.

@tnull (Contributor) commented Nov 7, 2023

> I mean if we can avoid additional state that'd be ideal - additional state means we have to check that it's consistent with the existing state and potentially handle inconsistency between them.

Yes, we would need to implement it in a way that would never break anything if it got out of date. I think worst case we would issue a bunch of redundant/nop delete operations to catch up.

> Yeah, there was a monitor_id → u64 map in here at some point in the process. When we had the discussions on the RFC I think there was a general preference to avoid duplicate state in the MUP, which is how we ended up reading it from storage like this.

Yeah, I think I also had suggested going with some tracking state in the PR originally. There definitely is a trade-off between performance and robustness/complexity here. If we now find that these CMU reads are substantially increasing heap fragmentation, we might need to reconsider. However, we might still think the superfluous reads are not worth introducing the additional complexity. I guess that depends on how bad the 'brutalizing' really is.

> Between issuing the redundant deletes and reading the monitor_update_id bytes, I prefer reading the update_id bytes (after reading the full monitor from storage; this avoids allocating for some of the bigger things in that struct, which is our main pain point). This is because issuing unnecessary delete calls over the network is significant overhead (at least for backends other than fs_store).

I'm not sure I'm following here? Reading the full monitor will allocate and return a Vec the size of the serialized monitor. Yes, by reading just the required number of bytes we'll avoid additional allocations; however, I imagine reading the monitor is already our main pain point, as it's a huge chunk of allocated data without clean boundaries?

@TheBlueMatt (Collaborator, Author)

Tentatively assigning to @G8XSU

@TheBlueMatt (Collaborator, Author)

Doing a monitor read for deletes will mean reading the full monitor bytes, which does kinda suck, whereas not doing a monitor read will just mean 100 deletes. In cases where users have a real need for the ChannelMonitorUpdatePersister, they're doing a lot of updates and regularly have to issue 100 deletes at a time either way; we're just adding an extra up-to-100 deletes per block. When it's per-block-per-monitor that may add up, but once we land #2647 that'll go down to something like 100 deletes per block, which should be well within what any user of the ChannelMonitorUpdatePersister can tolerate (since they see that many at once all the time).

> however, I imagine reading the monitor is already our main pain point as it's a huge chunk of allocated data without clean boundaries?

I don't think this is really the case. One huge allocation followed by it being deallocated should be mostly tolerable, especially if we're talking about something substantially larger than a page or two, where it's just gonna get its own special handling by allocating a few pages. That will still lead to memory bloat since those pages are unlikely to be returned to the OS, but at least it won't be a huge amount of fragmentation that we can never reuse.

So, all that said, I'm okay with either solution, but marginally prefer to just issue the deletes and move on, because it feels simpler than trying to figure out partial reading. I don't think there's a huge performance argument for either, which generally means I'd prefer to avoid yet more allocations, which, even if they don't create more fragmentation, do mean we use yet more memory.

@G8XSU (Contributor) commented Dec 1, 2023

There are 3 possible approaches:

1. Read monitor bytes
   As Matt pointed out earlier, a big allocation that is immediately followed by a dealloc shouldn't be a big concern; our main worry was fragmentation due to the series of small allocs after deserializing the full monitor. The problem at hand would be solved.
   • Con:
     • Minor, but we are still reading the full monitor bytes from storage.

2. Issue redundant deletes
   • Delete (current_update_id - maximum_pending_updates)..current_update_id on consolidation (sketched after this comment).
   • Straightforward to implement.
   • Cons:
     • The main concern is the cost of making ~100/1000 service calls. A cloud database might charge for IOPS, and even for service calls which do not result in an actual delete.
     • This is also worrisome for clients who already have the MUP deployed, since it will issue 100 deletes per monitor on every block for now (until #2647). Needs to be in the same release as [Persistence] Don't persist ALL channel_monitors on every bitcoin block connection. #2647.
     • If we compare the cost of reading a monitor vs. ~100/1000 redundant delete calls, the former looks like the better trade-off. In (1), even though we use up some RAM for a minuscule amount of time, we heavily save on IO. RAM is comparatively cheap, and an MB or so for a millisecond should be affordable.

3. Don't clean up when the monitor persistence results from a block update
   • With this, we will only clean up when we reach the consolidation threshold. Currently we clean up on both the consolidation threshold and block-connect.
   • After this, we will almost always have maximum_pending_updates to delete, hence we can implement this together with (2).
   • Independent of #2647.
   • Con:
     • Bigger consolidations.

Overall my preference for implementation would be (3+2) > (1) > (2).

Let me know if everyone is on the same page.
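
A rough sketch of what (3)+(2) amounts to, assuming the existing maximum_pending_updates knob and a simplified remove callback (the shape of the approach, not the eventual implementation):

```rust
// Sketch of (3)+(2): only run cleanup when the consolidation threshold is
// hit, and then blindly delete the whole window of update ids that may have
// been written, without reading the stored monitor. Deletes for keys that
// never existed are harmless no-ops (modulo the per-call cost discussed
// above).
fn cleanup_on_consolidation(
    current_update_id: u64,
    maximum_pending_updates: u64,
    mut remove_update: impl FnMut(u64),
) {
    let start = current_update_id.saturating_sub(maximum_pending_updates);
    for update_id in start..current_update_id {
        remove_update(update_id);
    }
}
```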

@tnull (Contributor) commented Dec 1, 2023

> Overall my preference for implementation would be (3+2) > (1) > (2).
>
> Let me know if everyone is on the same page.

SGTM. I still think we might be able to improve on the laid-out options if we took on some additional complexity for tracking state. However, since that seems to be off the table, (3+2) > (1) > (2) seems reasonable to me. In particular, if we're talking about operation against remote backends, 100 calls might introduce a lot more overhead in terms of latency than just biting the bullet and reading the whole monitor.

@TheBlueMatt (Collaborator, Author)

All sounds good to me.

@G8XSU (Contributor) commented Dec 7, 2023

Minor hiccup: when we are persisting with update_id == CLOSED_CHANNEL_UPDATE_ID, we can't subtract maximum_pending_updates and clean up. We will be left with some monitor_updates (at most maximum_pending_updates).

Option-1: It is fine to leave some updates behind, since we have the clean_stale_updates fn. (Imo this is not good, because then our cleanup logic is guaranteed to miss updates it should clean.)
Option-2: Read the monitor in this case; this is what we are currently doing (see the sketch below). (We end up doing almost everything from (1), (2), and (3) 😓)
Option-3: Do approach (1) instead, as mentioned earlier.
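
To make the hiccup concrete, a sketch of Option-2 with illustrative helper names and bounds (CLOSED_CHANNEL_UPDATE_ID is a u64::MAX sentinel rather than a sequential id, so the usual subtraction doesn't give a meaningful window):

```rust
// Sketch of Option-2: keep the blind-delete window for normal update ids, but
// when persisting with CLOSED_CHANNEL_UPDATE_ID fall back to reading the
// stored monitor to learn where its pending updates actually end. Helper
// names and exact bounds are illustrative.
const CLOSED_CHANNEL_UPDATE_ID: u64 = u64::MAX;

/// Returns an inclusive (first, last) window of update ids to delete.
fn stale_update_window(
    update_id: u64,
    maximum_pending_updates: u64,
    read_stored_latest_update_id: impl Fn() -> Option<u64>,
) -> Option<(u64, u64)> {
    let last = if update_id == CLOSED_CHANNEL_UPDATE_ID {
        // Only in the channel-close case do we still pay for a monitor read.
        read_stored_latest_update_id()?
    } else {
        update_id
    };
    Some((last.saturating_sub(maximum_pending_updates), last))
}
```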

@G8XSU (Contributor) commented Dec 7, 2023

^ Will be going with Option-2

@TheBlueMatt (Collaborator, Author)

That seems fine. Note that there can be multiple CLOSED_CHANNEL_UPDATE_ID monitor updates and all must be persisted (or just the full monitor each time, which seems fine for a closed channel).
