
Support intercepting onion messages for offline peers #2973

Merged
merged 9 commits into lightningdevkit:main on May 9, 2024

Conversation

valentinewallace (Contributor, Author)

As part of implementing async payments, we need a way to store onion message forwards on behalf of often-offline next-hop peers. This allows payers to send a held_htlc_available onion message to the mobile recipient, where the recipient's counterparty holds onto this onion message until the recipient comes back online.

Rather than storing onion messages internally in LDK, we generate events on OM interception and later signal to the user to re-inject these onion messages for forwarding.

This implements basic offline peer OM interception support. As a follow-up, we should support indicating to users when an intercepted OM has successfully been forwarded and is safe to delete, see #2950.

Partially addresses #2950, partially addresses #2298.
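A rough sketch of the user-side flow this enables, for illustration only (the `MyEvent` enum, the byte-vector message type, and the commented-out `forward_onion_message` call are stand-ins; the real LDK types and signatures are not shown in this thread):

```rust
use std::collections::HashMap;

// Stand-ins for the two event variants added by this PR; the real events carry
// LDK types rather than raw arrays and byte vectors.
enum MyEvent {
    OnionMessageIntercepted { peer_node_id: [u8; 33], message: Vec<u8> },
    OnionMessagePeerConnected { peer_node_id: [u8; 33] },
}

// Intercepted onion messages the user chose to hold, keyed by next-hop peer.
struct OmStore(HashMap<[u8; 33], Vec<Vec<u8>>>);

fn handle_event(store: &mut OmStore, event: MyEvent) {
    match event {
        // An OM arrived for an offline peer we serve: stash it for later.
        MyEvent::OnionMessageIntercepted { peer_node_id, message } => {
            store.0.entry(peer_node_id).or_default().push(message);
        },
        // The peer reconnected: re-inject everything held for it, e.g. via the
        // messenger's forwarding method in the real API.
        MyEvent::OnionMessagePeerConnected { peer_node_id } => {
            for _message in store.0.remove(&peer_node_id).unwrap_or_default() {
                // onion_messenger.forward_onion_message(_message, peer_node_id)?;
            }
        },
    }
}
```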

@@ -921,6 +921,17 @@ where
		}
		msgs
	}

	fn enqueue_event(&self, event: Event) {
		const MAX_EVENTS_BUFFER_SIZE: usize = (1 << 10) * 256;
Contributor Author

Would like to get some thoughts on this event buffer size (currently 0.34MB). I can also add logging if an event is dropped, though I believe that should be rare.

Collaborator

We should define it as N / ::core::mem::size_of::<Event>(), but 1MiB seems fine? I don't think we're worried about a machine running with this that is super duper memory constrained.
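A minimal sketch of what that could look like, assuming a stand-in `Event` type and an illustrative 1 MiB budget (neither is LDK's actual code):

```rust
// Stand-in with a non-zero inline size; LDK's real Event enum is larger.
struct Event {
    _inline_payload: [u8; 200],
}

// Derive the entry cap from a byte budget and the inline size of the event type.
const TARGET_BUFFER_BYTES: usize = 1 << 20; // ~1 MiB, illustrative
const MAX_EVENTS_BUFFER_SIZE: usize = TARGET_BUFFER_BYTES / ::core::mem::size_of::<Event>();

fn main() {
    // With the stand-in above this prints 5242 (1_048_576 / 200).
    println!("cap: {} events", MAX_EVENTS_BUFFER_SIZE);
}
```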

Contributor

Wouldn't that exclude the majority of the bytes in the onion packet, which are on the heap? Are we trying to limit the number of events or the total size of those events?
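To illustrate the point (schematic only; `InterceptedEvent` is a made-up stand-in, not LDK's event): `size_of` reports only the inline struct size, so a heap-allocated onion payload would not count toward a cap derived from it.

```rust
use core::mem::size_of;

struct InterceptedEvent {
    peer_node_id: [u8; 33],
    hop_data: Vec<u8>, // the bulk of an onion packet lives here, on the heap
}

fn main() {
    let ev = InterceptedEvent { peer_node_id: [2u8; 33], hop_data: vec![0u8; 1300] };
    // The inline size stays the same no matter how large hop_data grows.
    println!("inline: {} bytes, heap payload: {} bytes",
        size_of::<InterceptedEvent>(), ev.hop_data.len());
}
```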

Contributor

@valentinewallace Not sure if this was missed.

Contributor Author

Sorry about that, I thought we were trying to limit the total size of the events. I'm not sure I'm understanding @TheBlueMatt's suggestion :(

Collaborator

Ah, no, @jkczyz is right, I was too used to serialization buffers.

@@ -983,6 +1065,13 @@ where
				e.get_mut().enqueue_message(onion_message);
				log_trace!(logger, "Forwarding an onion message to peer {}", next_node_id);
			},
			_ if self.intercept_oms_for_offline_peers => {
				self.enqueue_event(
Contributor Author

Note: we could avoid introducing a lock order dependency between message_recipients and pending_events by pushing the pending events at the end of the method. I didn't bother but lmk if that seems worth it.
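A sketch of the alternative described above, with placeholder field types (the real members are a peer map and a `Vec<Event>`): collect new events locally while holding the first lock, and only touch `pending_events` after it is released, so no lock order is established between the two mutexes.

```rust
use std::sync::Mutex;

struct Messenger {
    message_recipients: Mutex<Vec<String>>, // placeholder for the real peer map
    pending_events: Mutex<Vec<String>>,     // placeholder for Vec<Event>
}

impl Messenger {
    fn handle_onion_message(&self, msg: String) {
        let mut new_events = Vec::new();
        {
            let _recipients = self.message_recipients.lock().unwrap();
            // ...determine that the next hop is offline and intercept...
            new_events.push(format!("intercepted: {msg}"));
        } // message_recipients lock dropped here
        // pending_events is only locked once the other lock is gone.
        self.pending_events.lock().unwrap().append(&mut new_events);
    }
}
```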

Contributor

I know nobody has responded to your comment, but I think it should be fine as is. Any thoughts, @TheBlueMatt, @jkczyz?

Contributor

Yeah, should be fine.

@TheBlueMatt (Collaborator)

Basically LGTM, but we should figure out how to pipe the events through for default users. Generally our users don't expect to actually hook up to events providers manually, they expect to do it via the BP, so maybe we need to take an Optional OnionMessenger to the BP?

@valentinewallace (Contributor, Author)

> Basically LGTM, but we should figure out how to pipe the events through for default users. Generally our users don't expect to actually hook up to events providers manually, they expect to do it via the BP, so maybe we need to take an Optional OnionMessenger to the BP?

I believe that was already done when we started generating the ConnectionNeeded events, see: https://github.com/lightningdevkit/rust-lightning/blob/main/lightning-background-processor/src/lib.rs#L305

@arik-so (Contributor)

arik-so commented Mar 28, 2024

Looks good to me, too!

@jkczyz self-requested a review on March 28, 2024 at 17:24
@shaavan (Contributor) left a comment

The PR looks great! 🚀
Just a few points I would love to discuss!

Comment on lines 804 to 894
		match message_recipients.entry(*peer_node_id) {
			hash_map::Entry::Occupied(mut e) if e.get().is_connected() => {
				e.get_mut().enqueue_message(message);
				Ok(())
			},
			_ => Err(SendError::InvalidFirstHop(*peer_node_id))
Contributor

I was giving some thought to how we are handling this situation. It might be possible that the peer is known but just not connected at the moment. Sending an InvalidFirstHop error might confuse users, since it reads as "The first hop is not a peer and …". Maybe we could introduce a new error type for cases like this?

Contributor Author

Hmm, why is that confusing? At this stage we don't have any way of knowing whether the peer is known or not. I think "not a peer" captures that pretty well?

Contributor

If we're not sure if the peer's known or not, InvalidFirstHop could work. But in this code snippet, we're checking if the peer's connected with if e.get().is_connected(), not if it's known. So, something like a PeerNotConnected error would make more sense for the error message. Still, I'm cool with whatever you think works best!
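For concreteness, the suggestion amounts to something like the following (hypothetical shape; LDK's real `SendError` variants carry a `PublicKey`, and `PeerNotConnected` does not exist in the codebase):

```rust
// Hypothetical error shape distinguishing "not a peer" from "known but offline".
enum SendError {
    InvalidFirstHop([u8; 33]),  // existing idea: the first hop is not a peer at all
    PeerNotConnected([u8; 33]), // proposed: peer is known but currently disconnected
}
```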

lightning/src/onion_message/messenger.rs — outdated review thread, resolved
Comment on lines 742 to 826
/// LDK will not rate limit how many [`Event::OnionMessageForOfflinePeer`]s
/// are generated, so it is the caller's responsibility to limit how many
Contributor

Technically, that's correct, but we do have a hard cap on the total number of events we can store (enforced by enqueue_event). Do you think it would be a good idea to update the docs to clarify that users can't set an infinitely high cap on the number of events they can receive?

Contributor Author

The events queue should never get close to filling up; if it does, we have bigger problems and the user's offer implementation is probably broken :) Not sure what you mean about users setting a cap on the number of events they can receive.

Contributor

Well, that makes sense! The upper cap enforced by enqueue_event is more of a fail-safe than a hard cap, so it makes sense not to mention it here!

> Not sure what you mean about users setting a cap on the number of events they can receive.

So I was trying to point to this line:

> "it is the caller's responsibility to limit how many..."

But considering the context, I don't think we need to make any changes here.
Thanks for clarifying it! ✌️

@codecov-commenter
codecov-commenter commented Apr 16, 2024

Codecov Report

Attention: Patch coverage is 85.89744%, with 33 lines in your changes missing coverage. Please review.

Project coverage is 91.37%. Comparing base (3b1b0a5) to head (a5ada64).
Report is 130 commits behind head on main.

Files Patch % Lines
lightning/src/events/mod.rs 28.12% 21 Missing and 2 partials ⚠️
lightning/src/onion_message/messenger.rs 93.02% 5 Missing and 1 partial ⚠️
lightning/src/onion_message/packet.rs 0.00% 3 Missing ⚠️
lightning/src/onion_message/functional_tests.rs 99.11% 1 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2973      +/-   ##
==========================================
+ Coverage   89.35%   91.37%   +2.02%     
==========================================
  Files         117      118       +1     
  Lines       96335   109415   +13080     
  Branches    96335   109415   +13080     
==========================================
+ Hits        86080    99980   +13900     
+ Misses       8033     6972    -1061     
- Partials     2222     2463     +241     


		}

		match message_recipients.entry(*peer_node_id) {
			hash_map::Entry::Occupied(mut e) if e.get().is_connected() => {
Contributor

Would it be useful to hit the ConnectionNeeded path, too? Say a peer comes back online, the user gets the OnionMessagePeerConnected event, but the peer goes offline before forward_onion_message is called. When the user calls forward_onion_message, it errors and consumes the OnionMessage, which is now lost.

Contributor Author

Yeah, it should be rare that a peer is disconnected by the time we go to forward, but it is currently possible. As mentioned in the PR description, planning to address this in follow-up per the race condition TODO here: #2950

Contributor

Ah, thanks for pointing that out. How would the ping/pong solution prevent storing data for a non-existent peer? (e.g., when a malicious actor asks us to forward a message to a made up node id)

Contributor Author

Not sure I understand your question, but it's up to the LDK user's event handling code to decide which peers it should store OM forwards for and which should be discarded. If the user decides an OM is worth storing and later calls forward_onion_message when the outbound peer comes back online, then we would go through the ping/pong flow to let them know when the OM is officially safe to delete.
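A sketch of that user-side decision (the served-peer set and storage map are hypothetical; they stand in for whatever persistence the node operator already uses):

```rust
use std::collections::{HashMap, HashSet};

fn handle_intercepted(
    peer_node_id: [u8; 33],
    serialized_om: Vec<u8>,
    served_peers: &HashSet<[u8; 33]>,
    store: &mut HashMap<[u8; 33], Vec<Vec<u8>>>,
) {
    // Only hold onion messages for peers we actually have a relationship with;
    // forwards addressed to unknown or made-up node ids are simply dropped.
    if served_peers.contains(&peer_node_id) {
        store.entry(peer_node_id).or_default().push(serialized_om);
    }
}
```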

Contributor

Oh, I think I misunderstood the proposed solution. I thought LDK would manage whether to forget the message (in OnionMessenger) based on whether the ping/pong completed. But, IIUC, the code in this PR would remain as is and LDK would instead surface some event when a peer comes online and has completed a ping/pong. Is that correct?

So, in the scenario I gave, the user would simply not store the message in the first place?

Contributor Author

> the code in this PR would remain as is and LDK would instead surface some event when a peer comes online and has completed a ping/pong. Is that correct?

> So, in the scenario I gave, the user would simply not store the message in the first place?

Yes to both!

lightning/src/onion_message/messenger.rs — review thread, resolved
lightning/src/onion_message/messenger.rs — outdated review thread, resolved
@@ -175,6 +175,8 @@ where
	message_router: MR,
	offers_handler: OMH,
	custom_handler: CMH,
	intercept_oms_for_offline_peers: bool,
	pending_events: Mutex<Vec<Event>>,
Contributor

Why use a separate queue if these aren't persisted? If we instead use message_recipients for buffering -- with a new variant -- then we can re-use outbound_buffer_full and generate events in the same manner as ConnectionNeeded. Otherwise, we are left with two different systems for buffering messages and generating events.

Contributor Author

Hmm, I believe with this approach an attacker may be able to fragment our heap by continually sending bogus forwards to a new fake peer each time (where each fake forward results in a VecDeque allocation). I think my other reasoning is that because the events generated may all be bogus, they shouldn't be allowed to count towards outbound_buffer_full, since currently all of that traffic is known to be more legitimate (either an outbound OM initiated by the LDK user or a forward to a peer that is connected).
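To illustrate the concern (entirely schematic; the map type and message sizes are made up): if each bogus forward to a previously-unseen node id created its own buffered entry, every attacker message would cost a fresh small allocation.

```rust
use std::collections::{HashMap, VecDeque};

fn main() {
    // Stand-in for a per-peer message buffer keyed by (fake) node id.
    let mut message_recipients: HashMap<u64, VecDeque<Vec<u8>>> = HashMap::new();
    for fake_peer_id in 0u64..10_000 {
        // Each never-to-connect "peer" gets its own VecDeque plus a payload
        // allocation, scattering small allocations across the heap.
        message_recipients
            .entry(fake_peer_id)
            .or_insert_with(VecDeque::new)
            .push_back(vec![0u8; 1300]); // stand-in for an onion packet
    }
    println!("buffered entries for fake peers: {}", message_recipients.len());
}
```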

Contributor

> Hmm, I believe with this approach an attacker may be able to fragment our heap by continually sending bogus forwards to a new fake peer each time (where each fake forward results in a VecDeque allocation).

Yeah, though this seems to be an issue already with onion messages, FWIW. Processing an OnionMessage involves allocating a Vec<u8> for Packet::hop_data.

> I think my other reasoning is that because the events generated may all be bogus, they shouldn't be allowed to count towards outbound_buffer_full, since currently all of that traffic is known to be more legitimate (either an outbound OM initiated by the LDK user or a forward to a peer that is connected).

Right, though this makes me wonder if the user should specify which peers it supports forwarding to. See other comment about the race condition.

Contributor

Discussed offline. Yeah, I suppose specifying which peers support forwarding adds a bit of a burden to the user. They are already doing this when processing events, so there's no sense forcing them to duplicate it here and risk inconsistencies.

@valentinewallace (Contributor, Author)

Addressed feedback.

lightning/src/onion_message/messenger.rs — outdated review thread, resolved

lightning/src/events/mod.rs — outdated review thread, resolved
@valentinewallace (Contributor, Author)

Thanks @jkczyz, addressed feedback.

lightning/src/events/mod.rs — outdated review thread, resolved

Comment on lines +1199 to +1201
				self.enqueue_event(
					Event::OnionMessagePeerConnected { peer_node_id: *their_node_id }
				);
Contributor

Seems like it could be a problem if we enqueue a bunch of Event::OnionMessageForOfflinePeer but end up dropping an Event::OnionMessagePeerConnected for the corresponding peer because of a full buffer. Maybe events for a different peer cause the buffer to fill up, for instance. Should we unconditionally enqueue the latter type of events?
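One possible shape of that suggestion, purely illustrative (the `exempt_from_cap` flag is invented for this sketch and is not part of the PR):

```rust
fn enqueue_event(events: &mut Vec<String>, event: String, exempt_from_cap: bool) {
    const MAX_EVENTS: usize = 1024; // illustrative cap, not LDK's constant
    if exempt_from_cap || events.len() < MAX_EVENTS {
        events.push(event);
    }
    // Otherwise the event is silently dropped; peer-connected events would pass
    // exempt_from_cap = true so they never hit this path.
}
```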

Contributor Author

As mentioned in #2973 (comment), the events queue should never get close to filling up, so I'm not sure it's worth adding special handling for event-dropping cases?

@arik-so (Contributor) left a comment

Looks good to me overall! I was about to ask a question about peers disconnecting prior to the message getting forwarded, but realized you already addressed it: #2973 (comment)

Should be good to squash imo.

@jkczyz (Contributor)

jkczyz commented May 9, 2024

Feel free to squash

@valentinewallace (Contributor, Author)

Squashed.

Comment on lines 1115 to 1121
			_ if self.intercept_messages_for_offline_peers => {
				self.pending_events.lock().unwrap().push(
					Event::OnionMessageIntercepted {
						peer_node_id: next_node_id, message: onion_message
					}
				);
			},
Contributor

Could we log_trace here like the other arms?

Contributor Author

I omitted it because of our past discussions of avoiding adding logs that would allow someone to adversarially blow up our logs on disk (there are some pre-existing ones in this method, ofc). So I'm not sure it's a good idea?

Contributor

Hmm... I thought that was only applicable for higher-level logging like log_error. Will let @TheBlueMatt provide guidance.

Contributor Author

Sure, happy to add it in follow-up too, no reason to hold the PR up.

Comment on lines +1313 to +1325
			&Event::OnionMessageIntercepted { ref peer_node_id, ref message } => {
				37u8.write(writer)?;
				write_tlv_fields!(writer, {
					(0, peer_node_id, required),
					(2, message, required),
				});
			},
			&Event::OnionMessagePeerConnected { ref peer_node_id } => {
				39u8.write(writer)?;
				write_tlv_fields!(writer, {
					(0, peer_node_id, required),
				});
			}
Contributor

IIUC, these are never persisted by LDK, but this lets us check the buffer size. Clients could persist them, I suppose, but would probably want to persist the parts rather than the enum.
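If a client did want to persist these, storing the parts might look roughly like this (a hypothetical client-side record, not anything LDK defines):

```rust
// Persist the pieces of an intercepted-OM event rather than the Event enum itself.
struct StoredInterceptedOm {
    peer_node_id: [u8; 33],       // stand-in for the peer's public key
    onion_message_bytes: Vec<u8>, // the serialized onion message payload
}
```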

@arik-so (Contributor) left a comment

🚢

@arik-so merged commit 8f1dc54 into lightningdevkit:main on May 9, 2024
16 checks passed