
Stop reading monitors when persisting in updating persister #2706

Closed
TheBlueMatt opened this issue Nov 4, 2023 · 14 comments · Fixed by #2779

@TheBlueMatt (Collaborator)

Turns out this line is brutalizing our heap and leading to fragmentation; it needs to go away:

let maybe_old_monitor = self.read_monitor(&monitor_name);
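
Roughly, the pattern in question (a minimal sketch with stand-in types, not the actual MUP code): the stored monitor is read back and fully deserialized on the cleanup path purely to learn which per-update keys can now be deleted, and it's that full deserialization that produces the burst of small allocations.

```rust
/// Stand-in for a deserialized ChannelMonitor; the real struct owns many
/// small heap-backed fields (maps, vecs, scripts, ...).
struct StoredMonitor {
    latest_update_id: u64,
}

/// Shape of the cleanup path in question (exact range bounds illustrative).
/// The closures stand in for KVStore-backed read/remove operations.
fn cleanup_stale_updates(
    read_and_deserialize_old_monitor: impl Fn() -> Option<StoredMonitor>,
    mut remove_update: impl FnMut(u64),
    new_update_id: u64,
) {
    // Allocation-heavy step: one big Vec<u8> read plus a burst of small
    // allocations while deserializing, all just to find the old update id.
    if let Some(old_monitor) = read_and_deserialize_old_monitor() {
        for update_id in old_monitor.latest_update_id..new_update_id {
            remove_update(update_id);
        }
    }
}
```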

@domZippilli (Contributor) commented Nov 5, 2023

cc: @G8XSU

I decided to try this out real quick, and it passes the unit tests in the file. I wouldn't do it exactly like this (I'd probably at least get the TLV macro to work), but I'd imagine this allocates very little beyond the monitor read's Vec<u8>?

18ef772
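
Roughly, the shape of that idea (a hypothetical helper, not the actual contents of 18ef772; where the update id sits depends on the monitor's serialization format):

```rust
// Hypothetical helper: parse only the latest update id out of the serialized
// monitor bytes instead of deserializing the entire ChannelMonitor.
fn latest_update_id_from_bytes(bytes: &[u8]) -> Option<u64> {
    // PLACEHOLDER_OFFSET stands in for whatever prefix/TLV walk is needed to
    // reach the update id in the real serialization format.
    const PLACEHOLDER_OFFSET: usize = 0;
    let id_bytes = bytes.get(PLACEHOLDER_OFFSET..PLACEHOLDER_OFFSET + 8)?;
    Some(u64::from_be_bytes(id_bytes.try_into().ok()?))
}
```

The only allocation left would be the read's short-lived Vec<u8>, rather than the many small allocations a full deserialization produces.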

@tnull (Contributor) commented Nov 6, 2023

@domZippilli Wouldn't it be easier for the MUP to store some (conservatively updated and persisted) tracking state mapping the monitor id to latest update id? This might allow us to not read the stored monitor during persist at all?
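
For illustration, a minimal sketch of that tracking-state idea, assuming we are free to add a small map to the persister (names are illustrative, not existing LDK API):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Maps a monitor's storage key to the highest update id known to be
/// persisted for it.
struct UpdateIdTracker {
    latest: Mutex<HashMap<String, u64>>,
}

impl UpdateIdTracker {
    /// Record a newly persisted update id and return the previous high-water
    /// mark, which bounds the stale update keys that may now be deleted
    /// without re-reading the stored monitor.
    fn record(&self, monitor_key: &str, update_id: u64) -> Option<u64> {
        let mut map = self.latest.lock().unwrap();
        map.insert(monitor_key.to_owned(), update_id)
    }
}
```

If the map is stale or missing (e.g. after a restart), the worst case is some redundant or skipped deletes, which a later cleanup pass could catch up on.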

@TheBlueMatt (Collaborator, Author)

I mean if we can avoid additional state that'd be ideal - additional state means we have to check that it's consistent with the existing state and potentially handle inconsistency between them.

@domZippilli (Contributor) commented Nov 6, 2023

Yeah, there was a monitor_id → u64 map in here at some point in the process. When we had the discussions on the RFC I think there was a general preference to avoid duplicate state in the MUP, which is how we ended up reading it from storage like this. It'd be ideal, I suppose, to read just the required bytes, but ranged reads in KVStore seem like asking a lot.

@G8XSU (Contributor) commented Nov 6, 2023

Between issuing the redundant deletes and reading the monitor_update_id bytes, I prefer reading the update_id bytes (after reading the full monitor from storage; this avoids allocating for some of the bigger things in that struct, which is our main pain point). This is because issuing unnecessary delete calls over the network is significant overhead (at least for backends other than fs_store).

And this can be especially problematic if the consolidation threshold (maximum_pending_updates) is something like 1000 or more.

@tnull (Contributor) commented Nov 7, 2023

> I mean if we can avoid additional state that'd be ideal - additional state means we have to check that it's consistent with the existing state and potentially handle inconsistency between them.

Yes, we would need to implement it in a way that would never break anything if it got out of date. I think worst case we would issue a bunch of redundant/nop delete operations to catch up.

> Yeah, there was a monitor_id → u64 map in here at some point in the process. When we had the discussions on the RFC I think there was a general preference to avoid duplicate state in the MUP, which is how we ended up reading it from storage like this.

Yeah, I think I also had suggested going with some tracking state in the PR originally. There definitely is a trade-off between performance and robustness/complexity here. If we now find that these CMU reads are substantially increasing heap fragmentation, we might need to reconsider. However, we might still think the superfluous reads are not worth introducing the additional complexity. I guess that depends on how bad the 'brutalizing' really is.

> Between issuing the redundant deletes and reading the monitor_update_id bytes, I prefer reading the update_id bytes (after reading the full monitor from storage; this avoids allocating for some of the bigger things in that struct, which is our main pain point). This is because issuing unnecessary delete calls over the network is significant overhead (at least for backends other than fs_store).

I'm not sure I'm following here? Reading the full monitor will allocate and return a Vec the size of the serialized monitor. Yes, by reading just the required number of bytes we'll avoid additional allocations; however, I imagine reading the monitor is already our main pain point, as it's a huge chunk of allocated data without clean boundaries?

@TheBlueMatt (Collaborator, Author)

Tentatively assigning to @G8XSU

@TheBlueMatt (Collaborator, Author)

Doing a monitor read for deletes will mean reading the full monitor bytes, which does kinda suck, whereas not doing a monitor read will just mean 100 deletes. In cases where users have a real need for the ChannelMonitorUpdatePersister, they're doing a lot of updates and regularly have to issue 100 deletes at a time either way; we're just adding an extra up-to-100 deletes per block. When it's per-block-per-monitor that may add up, but once we land #2647 that'll go down to something like 100 deletes per block, which should be well within what any user of the ChannelMonitorUpdatePersister can tolerate (since they see that many at once all the time).

> however, I imagine reading the monitor is already our main pain point as it's a huge chunk of allocated data without clean boundaries?

I don't think this is really the case. One huge allocation followed by it being deallocated should be mostly tolerable, especially if we're talking about something substantially larger than a page or two, where it's just gonna get its own special handling by allocating a few pages. That will still lead to memory bloat since those pages are unlikely to be returned to the OS, but at least it won't be a huge amount of fragmentation that we can never reuse.

So, all that said, I'm okay with either solution, but marginally prefer to just issue the deletes and move on, because it feels simpler than trying to figure out partial reading. I don't think there's a huge performance argument for either, which generally means I'd prefer to avoid yet more allocations, which, even if they don't create more fragmentation, do mean we use yet more memory.

@G8XSU (Contributor) commented Dec 1, 2023

There are 3 possible approaches:

1. Read monitor bytes
   As Matt pointed out earlier, a big allocation that is immediately followed by a dealloc shouldn't be a big concern; our main worry was fragmentation due to the series of small allocs after deserializing the full monitor. The problem at hand would be solved.
   • Con:
     • Minor, but we are still reading the full monitor bytes from storage.

2. Issue redundant deletes
   • Delete (current_update_id - maximum_pending_updates)..current_update_id on consolidation (sketched after this comment).
   • Straightforward to implement.
   • Cons:
     • The main concern is the cost of making ~100/1000 service calls. A cloud database might charge for IOPS, and even for service calls which do not result in an actual delete.
     • This is also worrisome for clients who already have the MUP deployed, since it will issue 100 deletes per monitor on every block for now (until #2647). Needs to be in the same release as [Persistence] Don't persist ALL channel_monitors on every bitcoin block connection. #2647.
     • If we compare the cost of reading a monitor vs. ~100/1000 redundant delete calls, the former looks like the better trade-off. In (1), even though we use up some RAM for a minuscule amount of time, we heavily save on IO. RAM is comparatively cheap, and an MB or so for a millisecond should be affordable.

3. Don't clean up when the monitor persistence results from a block update
   • With this, we will only clean up when we reach the consolidation threshold. Currently we clean up on both the consolidation threshold and block-connect.
   • After this, we will almost always have maximum_pending_updates to delete, hence we can implement this together with (2).
   • Independent of #2647.
   • Con:
     • Bigger consolidations.

Overall my preference for implementation would be (3+2) > (1) > (2).

Let me know if everyone is on the same page.
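
A rough sketch of what (3)+(2) amounts to, assuming the existing maximum_pending_updates knob and a simplified remove callback (the shape of the approach, not the eventual implementation):

```rust
// Sketch of (3)+(2): only run cleanup when the consolidation threshold is
// hit, and then blindly delete the whole window of update ids that may have
// been written, without reading the stored monitor. Deletes for keys that
// never existed are harmless no-ops (modulo the per-call cost discussed
// above).
fn cleanup_on_consolidation(
    current_update_id: u64,
    maximum_pending_updates: u64,
    mut remove_update: impl FnMut(u64),
) {
    let start = current_update_id.saturating_sub(maximum_pending_updates);
    for update_id in start..current_update_id {
        remove_update(update_id);
    }
}
```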

@tnull (Contributor) commented Dec 1, 2023

> Overall my preference for implementation would be (3+2) > (1) > (2).
>
> Let me know if everyone is on the same page.

SGTM. I still think we might be able to improve on the laid-out options if we took on some additional complexity for tracking state. However, since that seems to be off the table, (3+2) > (1) > (2) seems reasonable to me. In particular, if we're talking about operation against remote backends, 100 calls might introduce a lot more overhead in terms of latency than just biting the bullet and reading the whole monitor.

@TheBlueMatt (Collaborator, Author)

All sounds good to me.

@G8XSU (Contributor) commented Dec 7, 2023

Minor hiccup: when we are persisting with update_id == CLOSED_CHANNEL_UPDATE_ID, we can't subtract maximum_pending_updates and clean up. We will be left with some monitor_updates (at most maximum_pending_updates).

Option-1: It is fine to leave some updates behind, since we have the clean_stale_updates fn. (Imo this is not good, because then our cleanup logic is guaranteed to miss updates it should clean.)
Option-2: Read the monitor in this case; this is what we are currently doing (see the sketch below). (We end up doing almost everything from (1), (2), and (3) 😓)
Option-3: Do approach (1) instead, as mentioned earlier.
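
To make the hiccup concrete, a sketch of Option-2 with illustrative helper names and bounds (CLOSED_CHANNEL_UPDATE_ID is a u64::MAX sentinel rather than a sequential id, so the usual subtraction doesn't give a meaningful window):

```rust
// Sketch of Option-2: keep the blind-delete window for normal update ids, but
// when persisting with CLOSED_CHANNEL_UPDATE_ID fall back to reading the
// stored monitor to learn where its pending updates actually end. Helper
// names and exact bounds are illustrative.
const CLOSED_CHANNEL_UPDATE_ID: u64 = u64::MAX;

/// Returns an inclusive (first, last) window of update ids to delete.
fn stale_update_window(
    update_id: u64,
    maximum_pending_updates: u64,
    read_stored_latest_update_id: impl Fn() -> Option<u64>,
) -> Option<(u64, u64)> {
    let last = if update_id == CLOSED_CHANNEL_UPDATE_ID {
        // Only in the channel-close case do we still pay for a monitor read.
        read_stored_latest_update_id()?
    } else {
        update_id
    };
    Some((last.saturating_sub(maximum_pending_updates), last))
}
```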

@G8XSU (Contributor) commented Dec 7, 2023

^ Will be going with Option-2

@TheBlueMatt (Collaborator, Author)

That seems fine. Note that there can be multiple CLOSED_CHANNEL_UPDATE_ID monitor updates and all must be persisted (or just the full monitor each time, which seems fine for a closed channel).
