Ensure we don't ever retry a payment along a just-failed path #1252

TheBlueMatt
Collaborator

If we try to pay a mobile client behind an LSP, it's not strange for
the singular last-hop hint to fail with a Temporary Channel Failure
(indicating the mobile app is not currently open and connected to
the LSP). In this case, we will penalize the last-hop channel but
try again along the same path anyway, because we have no other
path. This changes the retryer to simply refuse to do so, failing
the payment instead.

Fixes #1241.
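
A minimal sketch of the check described above, assuming a retryer that learns the failed channel's SCID and then inspects whatever route it finds next. `Hop`, `Route`, `RetryDecision` and `decide_retry` are hypothetical stand-ins for illustration, not the types or functions in `lightning-invoice`; only the `avoid_scid: Option<u64>` parameter name comes from this PR's diff.

```rust
/// One hop of a path, identified by its short channel ID (SCID).
/// Hypothetical stand-in, not the library's hop type.
#[derive(Clone, Debug, PartialEq)]
pub struct Hop {
    pub short_channel_id: u64,
}

/// A candidate route: one or more paths, each a list of hops.
/// Hypothetical stand-in, not the library's route type.
#[derive(Clone, Debug, PartialEq)]
pub struct Route {
    pub paths: Vec<Vec<Hop>>,
}

/// What the retryer should do with a freshly found route.
#[derive(Debug, PartialEq)]
pub enum RetryDecision {
    /// The route avoids the failed channel, so it can be attempted.
    Retry(Route),
    /// The route reuses the failed channel, so fail the payment instead.
    Abandon,
}

/// Refuses any route that contains the channel which just failed.
pub fn decide_retry(candidate: Route, avoid_scid: Option<u64>) -> RetryDecision {
    let reuses_failed_channel = match avoid_scid {
        Some(scid) => candidate
            .paths
            .iter()
            .any(|path| path.iter().any(|hop| hop.short_channel_id == scid)),
        None => false,
    };
    if reuses_failed_channel {
        RetryDecision::Abandon
    } else {
        RetryDecision::Retry(candidate)
    }
}
```

With a single last-hop hint, the only route the router can return reuses the failed channel, so the sketch abandons the payment rather than retrying the same path.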

@TheBlueMatt TheBlueMatt added this to the 0.1 milestone Jan 18, 2022
@codecov-commenter

codecov-commenter commented Jan 18, 2022

Codecov Report

Merging #1252 (51d9c54) into main (7b6a7bb) will increase coverage by 0.01%.
The diff coverage is 93.18%.


@@            Coverage Diff             @@
##             main    #1252      +/-   ##
==========================================
+ Coverage   90.41%   90.43%   +0.01%     
==========================================
  Files          70       70              
  Lines       38087    38117      +30     
==========================================
+ Hits        34437    34471      +34     
+ Misses       3650     3646       -4     
Impacted Files                          Coverage Δ
lightning-invoice/src/payment.rs        92.96% <93.18%> (+0.20%) ⬆️
lightning/src/ln/functional_tests.rs    97.36% <0.00%> (+0.06%) ⬆️


- fn retry_payment(
-     &self, payment_id: PaymentId, payment_hash: PaymentHash, params: &RouteParameters
+ fn retry_payment(&self, payment_id: PaymentId, payment_hash: PaymentHash,
+     params: &RouteParameters, avoid_scid: Option<u64>
Contributor

I wonder if this would be cleaner if part of the RouteParameters.

Contributor

And avoided directly in find_route, that is.

Collaborator Author

You mean passing it through to the router itself and asking it to completely avoid an SCID? That feels like it's better done via the Score implementer, which I guess is ultimately the problem here - that the Scorer in use in the sample (i.e. our default one) doesn't strictly refuse to pay over a channel that just failed. That said, I do feel like the InvoicePayer should be robust against a braindead scorer, whether it's our own or a user-provided one, so it feels nice to have it here too?

Contributor

Yeah, but also don't put the onus on the event handler to set it and pass it to find_route. Simply have the ChannelManager set it when creating the PaymentPathFailed event. Then it is completely transparent to anyone handling the event.

Collaborator Author

Right, I do feel like "avoid this channel" is really more of a Score thing than a router thing - we have a whole interface for it, and it seems annoying to duplicate that interface here. It's not a lot of code change, but still awkward.

Contributor

Hmm... but this use case is (a) ephemeral as it only applies to a specific payment -- lower payment amounts may be successful for another payment or even the failed path if further split on retry -- and (b) being handled by the caller not the scorer in this PR.

Collaborator Author

Right, in this instance it's strangely dual caller-scorer handling - the scorer de-prioritizes and the caller handles the "oh, this went wrong, we can't do this, the scorer or router is busted" case. I guess there are two more practical questions on behavior that may inform this more:

a) do we want to track this information across payment attempts - if there are two available last-hop hints, do we want to just go back and forth between them until we run out of attempts?
b) do we care about avoiding the path in the router, or are we okay with failing if we find the same path again (i.e. if the scorer is broken or doesn't learn, are we okay just failing the payment vs. making sure the router picks another path)?

Both imply that the data should be in the RouteParameters, I think, if we care about either (I'm not sure we do), but (a) implies it should be in the Payee (to be renamed) not RouteParameters, even.
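
To make question (a) concrete, here is a rough sketch of what carrying the failed channels in per-payment parameters could look like, assuming the router simply skips anything in that set. `PaymentParams`, `failed_scids`, `pick_last_hop` and `pay_with_retries` are hypothetical names for illustration only, not the actual `RouteParameters` or `Payee` API.

```rust
use std::collections::HashSet;

/// Hypothetical per-payment parameters that persist across retry attempts.
#[derive(Default)]
pub struct PaymentParams {
    /// SCIDs that have already failed for this payment.
    pub failed_scids: HashSet<u64>,
}

/// Toy "router": returns the first last-hop hint whose channel has not
/// already failed for this payment.
pub fn pick_last_hop(hints: &[u64], params: &PaymentParams) -> Option<u64> {
    hints
        .iter()
        .copied()
        .find(|scid| !params.failed_scids.contains(scid))
}

/// Tries each hint at most once. `attempt` reports whether sending over the
/// given SCID succeeded. On failure, returns the number of attempts made.
pub fn pay_with_retries(
    hints: &[u64],
    max_attempts: usize,
    mut attempt: impl FnMut(u64) -> bool,
) -> Result<u64, usize> {
    let mut params = PaymentParams::default();
    for tries in 0..max_attempts {
        match pick_last_hop(hints, &params) {
            // Every hint has failed already: stop instead of alternating.
            None => return Err(tries),
            Some(scid) => {
                if attempt(scid) {
                    return Ok(scid);
                }
                params.failed_scids.insert(scid);
            }
        }
    }
    Err(max_attempts)
}
```

Because the set accumulates over attempts, two last-hop hints are each tried once and the payment then fails, rather than bouncing between them until the attempt budget runs out.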

Collaborator Author

Discussed this more offline, sounds like we want to/should go with moving the logic as described here, will do.

@TheBlueMatt
Collaborator Author

I'm gonna put this on ice until #1227 lands, as I don't really want to touch the router until then since it's all a bit in flux.

@TheBlueMatt
Collaborator Author

Superseded by #1600

@TheBlueMatt TheBlueMatt closed this Jul 6, 2022
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Jul 6, 2022
When an HTLC fails, we currently rely on the scorer learning the
failed channel and assigning an infinite (`u64::max_value()`)
penalty to the channel so as to avoid retrying over the exact same
path (if there's only one available path). This is common when
trying to pay a mobile client behind an LSP if the mobile client is
currently offline.

This leads to the scorer being overly conservative in some cases -
returning `u64::max_value()` when a given path hasn't been tried
for a given payment may not be the best decision, even if that
channel failed 50 minutes ago.

By tracking channels which failed on a payment level and explicitly
refusing to route over them we can relax the requirements on the
scorer, allowing it to make different decisions on how to treat
channels that failed relatively recently without causing payments
to retry the same path forever.

Closes lightningdevkit#1241, superseding lightningdevkit#1252.
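
The contrast drawn in this commit message can be sketched roughly as follows, assuming the router filters out channels that already failed for this payment before it compares scorer penalties. `Candidate`, `penalty_msat`, `select_channel` and `previously_failed` are hypothetical names for illustration, not the router's real interface.

```rust
/// Hypothetical candidate channel together with the penalty the scorer
/// assigned to it.
pub struct Candidate {
    pub scid: u64,
    pub penalty_msat: u64,
}

/// Picks the lowest-penalty channel, skipping any channel that already
/// failed for this payment, whatever penalty the scorer currently reports.
pub fn select_channel(candidates: &[Candidate], previously_failed: &[u64]) -> Option<u64> {
    candidates
        .iter()
        .filter(|c| !previously_failed.contains(&c.scid))
        .min_by_key(|c| c.penalty_msat)
        .map(|c| c.scid)
}
```

With the hard filter in place, the scorer can report a finite penalty for a recently failed channel without the payment retrying over it: a lone failed candidate yields `None`, so the scorer no longer has to return `u64::max_value()` to get that outcome.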
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Jul 7, 2022
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Jul 13, 2022
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Jul 14, 2022
When an HTLC fails, we currently rely on the scorer learning the
failed channel and assigning an infinite (`u64::max_value()`)
penalty to the channel so as to avoid retrying over the exact same
path (if there's only one available path). This is common when
trying to pay a mobile client behind an LSP if the mobile client is
currently offline.

This leads to the scorer being overly conservative in some cases -
returning `u64::max_value()` when a given path hasn't been tried
for a given payment may not be the best decision, even if that
channel failed 50 minutes ago.

By tracking channels which failed on a payment part level and
explicitly refusing to route over them we can relax the
requirements on the scorer, allowing it to make different decisions
on how to treat channels that failed relatively recently without
causing payments to retry the same path forever.

This does have the drawback that it could allow two separate parts
of a payment to traverse the same path even though that path just
failed, however this should only occur if the payment is going to
fail anyway, at least as long as the scorer is properly learning.

Closes lightningdevkit#1241, superseding lightningdevkit#1252.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Jul 14, 2022
G8XSU pushed a commit to G8XSU/rust-lightning that referenced this pull request Jul 18, 2022
Development

Successfully merging this pull request may close these issues.

Handle only-last-hop temp failure better