Make timer_tick_occurred a second long timer #3065

Open
shaavan wants to merge 2 commits into main

Conversation

@shaavan (Contributor) commented May 14, 2024

This PR updates timer_tick_occurred to operate on a second-long timer, providing finer timing control.

Reasoning:

  1. This PR is a prerequisite for Introduce Retry InvoiceRequest Flow #3010, which adds the ability to retry the InvoiceRequest message within a finite time if we have not received the corresponding BOLT12 Invoice back.
  2. Those retries need to fire faster than a standard minute-long timer allows.
  3. The discussion was between introducing a new counter vs. changing the timer_tick duration (see comment). This PR is an attempt at the second approach.

…ogress

1. This function implicitly assumed expiry after one timer_tick_occurred call.
2. Introduce an explicit timer tick counter so that the TIMER_LIMIT can be set
   to an arbitrary number of ticks (a sketch of the pattern follows below).
3. This addition is utilized in the following commit.
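
For illustration, a minimal sketch of the pattern this commit describes: replacing an implicit one-tick expiry with an explicit counter compared against a configurable limit. All names here are hypothetical, not the actual LDK internals.

```rust
/// Hypothetical limit; with an explicit counter it can be raised to any number of ticks.
const TIMER_LIMIT: u8 = 1;

/// Stand-in for an entry that previously expired on its first timer tick.
struct PendingEntry {
    ticks_elapsed: u8,
}

impl PendingEntry {
    /// Called from each `timer_tick_occurred`; returns true once the entry expires.
    fn tick(&mut self) -> bool {
        self.ticks_elapsed = self.ticks_elapsed.saturating_add(1);
        self.ticks_elapsed >= TIMER_LIMIT
    }
}
```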
@codecov-commenter commented May 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.03%. Comparing base (1890e80) to head (9a344b6).
Report is 11 commits behind head on main.


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3065      +/-   ##
==========================================
+ Coverage   89.82%   90.03%   +0.20%     
==========================================
  Files         116      116              
  Lines       96466    98074    +1608     
  Branches    96466    98074    +1608     
==========================================
+ Hits        86655    88305    +1650     
+ Misses       7264     7244      -20     
+ Partials     2547     2525      -22     


@jkczyz (Contributor) commented May 14, 2024

> This enhancement enhances the system's flexibility and accuracy in timing operations.

This doesn't really say anything. Please clearly state the practical reason for the change.

@@ -2260,15 +2260,15 @@ const CHECK_CLTV_EXPIRY_SANITY: u32 = MIN_CLTV_EXPIRY_DELTA as u32 - LATENCY_GRA
const CHECK_CLTV_EXPIRY_SANITY_2: u32 = MIN_CLTV_EXPIRY_DELTA as u32 - LATENCY_GRACE_PERIOD_BLOCKS - 2*CLTV_CLAIM_BUFFER;

/// The number of ticks of [`ChannelManager::timer_tick_occurred`] until expiry of incomplete MPPs
-pub(crate) const MPP_TIMEOUT_TICKS: u8 = 3;
+pub(crate) const MPP_TIMEOUT_TICKS: u8 = 3 * 60;
Collaborator

Let's define a constant for the ticks per minute or whatever so that we can change it in the future.
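
For illustration, a minimal sketch of the suggested constant, assuming one tick per second; the exact name and placement are up to the PR.

```rust
/// Number of `timer_tick_occurred` calls per minute, assuming one call per second.
pub(crate) const TICKS_PER_MINUTE: u8 = 60;

/// The number of ticks until expiry of incomplete MPPs, i.e. three minutes' worth.
pub(crate) const MPP_TIMEOUT_TICKS: u8 = 3 * TICKS_PER_MINUTE;
```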

@wpaulino (Contributor)

What functionality do we plan to add that will need sub-minute ticks? Also, how do we ensure users migrate to the new duration on their end? This is basically a "silent" breaking change to the API.

@TheBlueMatt (Collaborator)

The motivation here is to retry invoice_requests. For non-mobile nodes, in a world where we can route onion messages, it's probably not useful, since we can send the request along many paths and retrying adds little. On mobile it's kinda useful, especially before we can route, where we may have lost connectivity and just need to retry a second later. Sadly, there's not really a great way to do this sans-std.

@TheBlueMatt (Collaborator)

The alternative options here are basically (a) do it during message handling using std to detect time passing, (b) do it during message handling without time passing, (c) this (maybe with renaming the method to break the API). I'm a bit split between (b) and (c), honestly.

@wpaulino (Contributor)

I see, (b) is probably still not reliable enough for our needs though if we're not receiving messages (pings don't make it through the ChannelManager)? I guess (c) is fine, but have we explored doing this in OnionMessageHandler::release_pending_messages instead?

1. Previously, timer_tick_occurred was designed to be called every minute, so the TIMER_LIMITS were effectively denominated in minutes.
2. However, this made it impossible to express timer limits at sub-minute granularity.
3. This commit updates timer_tick_occurred to be called every second instead of every minute (a sketch of the caller-side change follows below).
4. All TIMER_LIMITS are scaled accordingly to reflect this change.
5. Additionally, a test is updated to ensure successful compilation post-update.
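
Under this commit, the main change for callers is the shorter tick interval. A sketch of a driver loop under that assumption, using tokio, with `tick` standing in for `ChannelManager::timer_tick_occurred`; this is not the background processor's actual code.

```rust
use std::time::Duration;

/// Hypothetical driver: after this commit, the only change for callers is the
/// tick interval, which drops from one minute to one second.
async fn drive_timer(tick: impl Fn()) {
    let mut interval = tokio::time::interval(Duration::from_secs(1)); // previously 60
    loop {
        interval.tick().await;
        tick(); // e.g. channel_manager.timer_tick_occurred()
    }
}
```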
@shaavan (Contributor, Author) commented May 15, 2024

Updated from pr3065.01 to pr3065.02 (diff):
Addressed @TheBlueMatt's comment:

Changes:

  1. Introduce a new constant TICKS_PER_MINUTE and use it as the multiplying factor for the other constants. This keeps the multiplying factor in one place and makes maintenance easier if we change it later.

@TheBlueMatt
I think renaming the function to cause an explicit API failure would be a good idea. However, I haven't implemented it yet; I'd love to confirm it's the right move before I proceed with the name change.
Thanks for the discussion, suggestions, and pointers, everyone.

@jkczyz (Contributor) commented May 15, 2024

> I guess (c) is fine, but have we explored doing this in OnionMessageHandler::release_pending_messages instead?

Hmm... that may work, as ChannelManager can query its PendingOutboundPayments for anything still awaiting an invoice that hasn't timed out. Though when initially sending the message, it will already be in the AwaitingInvoice state. Maybe we only retry if release_pending_messages would return an empty Vec, as a simple way of not retrying before the initial request is sent? Even so, would sufficient time pass between calls to next_onion_message_for_peer before retrying?
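
A rough sketch of the "only retry once the queue has drained" idea being floated here; none of these names are the actual LDK API, it only illustrates the gating.

```rust
use std::sync::Mutex;

/// Hypothetical queue standing in for the pending offers messages.
struct OffersQueue {
    pending_messages: Mutex<Vec<String>>,
}

impl OffersQueue {
    fn release_pending_messages(&self) -> Vec<String> {
        let mut pending = self.pending_messages.lock().unwrap();
        if pending.is_empty() {
            // The initial sends have already drained; anything still awaiting
            // a BOLT12 invoice could be re-enqueued here, subject to rate
            // limiting, e.g.:
            // pending.extend(self.requests_still_awaiting_invoice());
        }
        std::mem::take(&mut *pending)
    }
}
```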

@wpaulino (Contributor)

It gets called in PeerManager::do_attempt_write_data, which can happen every time a ping needs to be sent out (currently every 10 secs via PeerManager::timer_tick_occurred), and on every call to PeerManager::process_events and PeerManager::write_buffer_space_avail. So, it may actually get called too often, but some simple rate limiting per retry request in release_pending_messages would help.

@jkczyz (Contributor) commented May 16, 2024

> It gets called in PeerManager::do_attempt_write_data, which can happen every time a ping needs to be sent out (currently every 10 secs via PeerManager::timer_tick_occurred), and on every call to PeerManager::process_events and PeerManager::write_buffer_space_avail. So, it may actually get called too often, but some simple rate limiting per retry request in release_pending_messages would help.

Yeah... though I'm not sure how we'd do that rate limiting. Would an exponential backoff be fine? Also note that OffersMessageHandler::release_pending_messages is called each time next_onion_message_for_peer is called. So if you have a lot of peers, it will be called more frequently than if you have few. I guess ChannelManager knows how many peers it has, so it could take that into account?

We also may want to consider using different reply paths on retries (and maybe the initial sends for that matter) in case the reply path used is the issue.
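
A self-contained sketch of the kind of per-request exponential backoff raised above; all names are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Tracks when a pending invoice request may next be retried, doubling the
/// wait after each attempt (simple exponential backoff).
struct RetryState {
    next_retry: Instant,
    backoff: Duration,
}

impl RetryState {
    fn new(initial: Duration) -> Self {
        Self { next_retry: Instant::now() + initial, backoff: initial }
    }

    /// Returns true if enough time has passed to retry, and pushes the next
    /// attempt twice as far out.
    fn should_retry(&mut self, now: Instant) -> bool {
        if now < self.next_retry {
            return false;
        }
        self.backoff *= 2;
        self.next_retry = now + self.backoff;
        true
    }
}
```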

@wpaulino (Contributor)

Another alternative could be to extend the onion message handler to signal, when we read a message from the peer, that retries should be queued.

@TheBlueMatt (Collaborator)

ISTM if we want to avoid this we'll want a new method on ChannelMessageHandler indicating that we've received any message from any peer for any handler, which can then trigger the retry here. This avoids any kind of weird API nonsense where the OnionMessenger receives the same signal and then just passes that info to the ChannelManager via the message fetching (to trigger a retry). This would need to be handled delicately to avoid a performance impact; specifically, we'll want to avoid a Mutex here and gate the message retrying on a relaxed AtomicBool read.

The other option, which @wpaulino suggested offline, is to hook deeply into the PeerManager, provide it some kind of flag saying "hey, I'm sending an important message here to peer A, please send A a ping immediately on the same socket and let me know if they responded, indicating they received the message because they responded to a later one", then use that indicator to gate the retry. I'm not sure it's worth that level of complexity, though.
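
A minimal sketch of the relaxed-AtomicBool gating described above, with hypothetical names; a real implementation would keep the flag on the ChannelManager rather than in a static.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Set from the receive path of every message handler; cheap enough to do per
/// message because it is a single relaxed store, with no Mutex involved.
static MESSAGE_RECEIVED: AtomicBool = AtomicBool::new(false);

fn on_any_peer_message() {
    MESSAGE_RECEIVED.store(true, Ordering::Relaxed);
}

/// Called from the retry path: only consider retrying invoice requests if we
/// have heard from some peer since the last check, and clear the flag.
fn should_consider_retry() -> bool {
    MESSAGE_RECEIVED.swap(false, Ordering::Relaxed)
}
```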

@shaavan (Contributor, Author) commented May 24, 2024

Hi @TheBlueMatt,
I’ve been diving into the approach you suggested and I have a few questions I could use your guidance on:

  1. When retrying, should we go through all the PaymentPending messages, or should we prioritize which ones to retry first?
  2. How can we make sure we’re not retrying too frequently?
  3. This method seems great for situations where we switch from offline to online (like coming out of the subway), but do you think it’s as effective for other scenarios, like when the receiver is the one going offline?

Thanks a lot for your guidance!

@TheBlueMatt (Collaborator)

> When retrying, should we go through all the PaymentPending messages, or should we prioritize which ones to retry first?

I don't see why we wouldn't use the normal flow - it tends to get drained very quickly.

> How can we make sure we’re not retrying too frequently?

We should probably only retry once.

> This method seems great for situations where we switch from offline to online (like coming out of the subway), but do you think it’s as effective for other scenarios, like when the receiver is the one going offline?

Indeed, it won't help in those cases; however, I'm not entirely clear on what we can do for them. If the recipient is often offline, they probably shouldn't be using BOLT12 to receive (pre-async-payments, or unless they have a fairly robust notification/store-and-forward mechanism), and if they are at least somewhat robust, trying again one second later seems unlikely to help any more than trying again via a separate path or some small time delta later.
