Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid reusing just-failed channels in the router, making the impossibility penalty configurable #1600

Merged

Conversation

TheBlueMatt
Copy link
Collaborator

@TheBlueMatt TheBlueMatt commented Jul 6, 2022

Some users want to keep a very long scorer data half-life to maintain knowledge for a longer period of time. Its somewhat unclear if that's optimal, but it is clear that it can cause the scorer to refuse to build a route when the only available channel failed most recently within the halflife. This is rather unexpected behavior, and the fact that the scorer must behave this way to avoid #1241 is very annoying in that it prevents fixing this.

Here we move the avoidance of just-failed channels into the router itself, allowing us to make the impossibility penalty configurable, which we do as well.

Closes #1241, superseding #1252.

@codecov-commenter
Copy link

codecov-commenter commented Jul 6, 2022

Codecov Report

Merging #1600 (a863778) into main (4e5f74a) will increase coverage by 0.14%.
The diff coverage is 87.50%.

❗ Current head a863778 differs from pull request most recent head 5bff5f9. Consider uploading reports for the commit 5bff5f9 to get more accurate results

@@            Coverage Diff             @@
##             main    #1600      +/-   ##
==========================================
+ Coverage   90.86%   91.00%   +0.14%     
==========================================
  Files          80       80              
  Lines       44437    45502    +1065     
  Branches    44437    45502    +1065     
==========================================
+ Hits        40377    41409    +1032     
- Misses       4060     4093      +33     
Impacted Files Coverage Δ
lightning/src/routing/router.rs 92.45% <77.77%> (+0.06%) ⬆️
lightning/src/routing/scoring.rs 97.57% <95.00%> (+1.50%) ⬆️
lightning/src/ln/channelmanager.rs 84.90% <100.00%> (+0.02%) ⬆️
lightning/src/ln/functional_test_utils.rs 95.24% <100.00%> (+<0.01%) ⬆️
lightning/src/chain/onchaintx.rs 93.98% <0.00%> (-0.93%) ⬇️
lightning/src/util/events.rs 39.25% <0.00%> (-0.29%) ⬇️
lightning/src/ln/functional_tests.rs 96.95% <0.00%> (-0.17%) ⬇️
lightning/src/ln/channel.rs 88.77% <0.00%> (+0.02%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e5f74a...5bff5f9. Read the comment docs.

@tnull tnull self-requested a review July 7, 2022 07:07
Copy link
Contributor

@tnull tnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, generally looks good, just some comments.

lightning/src/routing/router.rs Outdated Show resolved Hide resolved
lightning/src/routing/router.rs Outdated Show resolved Hide resolved
lightning/src/routing/router.rs Outdated Show resolved Hide resolved
lightning/src/routing/router.rs Show resolved Hide resolved
lightning/src/routing/scoring.rs Outdated Show resolved Hide resolved
lightning/src/routing/scoring.rs Outdated Show resolved Hide resolved
lightning/src/routing/scoring.rs Outdated Show resolved Hide resolved
lightning/src/routing/scoring.rs Outdated Show resolved Hide resolved
@TheBlueMatt TheBlueMatt force-pushed the 2022-07-explicit-avoid-retries branch from 848afc3 to e6a114c Compare July 7, 2022 16:06
@TheBlueMatt
Copy link
Collaborator Author

Rebased and addressed feedback.

tnull
tnull previously approved these changes Jul 11, 2022
Copy link
Contributor

@tnull tnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

lightning/src/routing/scoring.rs Show resolved Hide resolved
lightning/src/routing/scoring.rs Show resolved Hide resolved
Comment on lines 1004 to 1056
if contributes_sufficient_value && doesnt_exceed_max_path_length &&
doesnt_exceed_cltv_delta_limit && may_overpay_to_meet_path_minimum_msat {
doesnt_exceed_cltv_delta_limit && !payment_failed_on_this_channel &&
may_overpay_to_meet_path_minimum_msat
{
hit_minimum_limit = true;
} else if contributes_sufficient_value && doesnt_exceed_max_path_length &&
doesnt_exceed_cltv_delta_limit && over_path_minimum_msat {
doesnt_exceed_cltv_delta_limit && over_path_minimum_msat &&
!payment_failed_on_this_channel
{
// Note that low contribution here (limited by available_liquidity_msat)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of checking payment_failed_on_this_channel twice, consider having a leading if expression.

if payment_failed_on_this_channel {
} else if /* ... */ {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good idea, I moved all the existing if conditions to that!

lightning/src/routing/router.rs Show resolved Hide resolved
lightning/src/routing/scoring.rs Outdated Show resolved Hide resolved
Comment on lines +629 to +646
// Equivalent to hitting the else clause below with the amount equal to the effective
// capacity and without any certainty on the liquidity upper bound, plus the
// impossibility penalty.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment needs to be updated now since it's not only when equal to the effective capacity. Seems like the earlier change that added this now removed logic is what caused calculates_log10_without_overflowing_u64_max_value to not exercise the correct code path as mentioned earlier on that test (i.e., it doesn't hit the negative_log10_times_2048 case below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the comment differently - I read the comment to say "the calculation we're doing here is equivalent to..." which is still true, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, you're right!

Copy link
Contributor

@jkczyz jkczyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Largely looks good but one comment.

Comment on lines +629 to +646
// Equivalent to hitting the else clause below with the amount equal to the effective
// capacity and without any certainty on the liquidity upper bound, plus the
// impossibility penalty.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, you're right!

Comment on lines +632 to +649
let negative_log10_times_2048 = NEGATIVE_LOG10_UPPER_BOUND * 2048;
self.combined_penalty_msat(amount_msat, negative_log10_times_2048, params)
.saturating_add(params.considered_impossible_penalty_msat)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to do the max of these two rather than adding? Is the idea that we want this to be >= anything given in the else clause?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that's the idea. I suppose we could do a max. Originally I had it not adding the combined_penalty call at all but that seemed to brittle to deal with so switched to adding. I don't honestly have a strong opinion between adding and max, either way the docs can tell users what's going on, but for max i kinda worry users will acidentally set it too low and get no penalty here, which seems strange?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They would also need to set the other params lower, though, since max would select the combined penalty over the considered_impossible_penalty_msat if the latter were too low. So it's all kinda relative. Don't feel too strongly either though note that max may make debugging a little easier as otherwise you may need to mentally subtract some combined penalty if it is not obvious.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, sure, I just meant it'd be easy for a user to look at the field and think "okay, let me pick something a bit higher than the liquidity offset, forget to multiply by 2 or whatever, and end up with a penalty equal to the liquidity penalty, which seems wrong? I dunno, I'm happy to mentally convert when we're debugging. Unless you feel strongly I'd suggest we leave it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, sure, I just meant it'd be easy for a user to look at the field and think "okay, let me pick something a bit higher than the liquidity offset, forget to multiply by 2 or whatever, and end up with a penalty equal to the liquidity penalty, which seems wrong?

Plus base and amount penalty, FWIW.

I dunno, I'm happy to mentally convert when we're debugging. Unless you feel strongly I'd suggest we leave it.

Sure we can leave it. Though one thing I just realized is that either way now the penalty will be variable across channels depending on the amount. Maybe less so when using max but maybe that's an argument in favor of adding. That way if left to choose only channels exceeding the maximum liquidity, we'd prefer ones that would otherwise be penalized less.

jkczyz
jkczyz previously approved these changes Jul 13, 2022
@TheBlueMatt
Copy link
Collaborator Author

Squashed without further changes.

jkczyz
jkczyz previously approved these changes Jul 13, 2022
lightning/src/routing/scoring.rs Outdated Show resolved Hide resolved
jkczyz
jkczyz previously approved these changes Jul 13, 2022
@TheBlueMatt
Copy link
Collaborator Author

Squashed yet again. Change since yesterday was:

diff --git a/lightning/src/routing/scoring.rs b/lightning/src/routing/scoring.rs
index 6aa19abaa..9fa62d83f 100644
--- a/lightning/src/routing/scoring.rs
+++ b/lightning/src/routing/scoring.rs
@@ -397,7 +397,8 @@ pub struct ProbabilisticScoringParameters {
 	/// current estimate of the channel's available liquidity.
 	///
-	/// Note that in this case the [`liquidity_penalty_multiplier_msat`] and
-	/// [`amount_penalty_multiplier_msat`]-based penalties are still included in the overall
-	/// penalty.
+	/// Note that in this case all other penalties, including the
+	/// [`liquidity_penalty_multiplier_msat`] and [`amount_penalty_multiplier_msat`]-based
+	/// penalties, as well as the [`base_penalty_msat`] and the [`anti_probing_penalty_msat`], if
+	/// applicable, are still included in the overall penalty.
 	///
 	/// If you wish to avoid creating paths with such channels entirely, setting this to a value of
@@ -408,4 +409,6 @@ pub struct ProbabilisticScoringParameters {
 	/// [`liquidity_penalty_multiplier_msat`]: Self::liquidity_penalty_multiplier_msat
 	/// [`amount_penalty_multiplier_msat`]: Self::amount_penalty_multiplier_msat
+	/// [`base_penalty_msat`]: Self::base_penalty_msat
+	/// [`anti_probing_penalty_msat`]: Self::anti_probing_penalty_msat
 	pub considered_impossible_penalty_msat: u64,
 }

jkczyz
jkczyz previously approved these changes Jul 13, 2022
tnull
tnull previously approved these changes Jul 14, 2022
@jkczyz
Copy link
Contributor

jkczyz commented Jul 14, 2022

66ca68a mentions failures on a payment-level, but isn't it really on a path-level? i.e., if two parts of an MPP fail on different channels, retrying one part could retry over the channel failed on the other part?

@TheBlueMatt TheBlueMatt dismissed stale reviews from tnull and jkczyz via 5bff5f9 July 14, 2022 15:59
@TheBlueMatt TheBlueMatt force-pushed the 2022-07-explicit-avoid-retries branch from 8a1b418 to 5bff5f9 Compare July 14, 2022 15:59
@TheBlueMatt
Copy link
Collaborator Author

Oops, right, sorry, rewrote that commit message without changes to the diff, added a note that it this does have the drawback of potentially retrying different parts along the same path, but hopefully the scorer doesn't let that happen unless the payment is gonna fail anyway.

@jkczyz
Copy link
Contributor

jkczyz commented Jul 14, 2022

Oops, right, sorry, rewrote that commit message without changes to the diff, added a note that it this does have the drawback of potentially retrying different parts along the same path, but hopefully the scorer doesn't let that happen unless the payment is gonna fail anyway.

Yeah, same across payments, but as you said hopefully the scorer will learn quickly enough.

jkczyz
jkczyz previously approved these changes Jul 14, 2022
When an HTLC fails, we currently rely on the scorer learning the
failed channel and assigning an infinite (`u64::max_value()`)
penalty to the channel so as to avoid retrying over the exact same
path (if there's only one available path). This is common when
trying to pay a mobile client behind an LSP if the mobile client is
currently offline.

This leads to the scorer being overly conservative in some cases -
returning `u64::max_value()` when a given path hasn't been tried
for a given payment may not be the best decision, even if that
channel failed 50 minutes ago.

By tracking channels which failed on a payment part level and
explicitly refusing to route over them we can relax the
requirements on the scorer, allowing it to make different decisions
on how to treat channels that failed relatively recently without
causing payments to retry the same path forever.

This does have the drawback that it could allow two separate part
of a payment to traverse the same path even though that path just
failed, however this should only occur if the payment is going to
fail anyway, at least as long as the scorer is properly learning.

Closes lightningdevkit#1241, superseding lightningdevkit#1252.
When we consider sending an HTLC over a given channel impossible
due to our current knowledge of the channel's liquidity, we
currently always assign a penalty of `u64::max_value()`. However,
because we now refuse to retry a payment along the same path in
the router itself, we can now make this value configurable. This
allows users to have a relatively high knowledge decay interval
without the side-effect of refusing to try the only available path
in cases where a channel is intermittently available.
In general we should avoid taking paths that we are confident will
not work as much possible, but we should be willing to try each
payment at least once, even if its over a channel that failed
recently. A full Bitcoin penalty for such a channel seems
reasonable - lightning fees are unlikely to ever reach that point
so such channels will be scored much worse than any other potential
path, while still being below `u64::max_value()`.
@TheBlueMatt
Copy link
Collaborator Author

Rebased to address merge conflict.

@TheBlueMatt TheBlueMatt merged commit f75b6cb into lightningdevkit:main Jul 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle only-last-hop temp failure better
4 participants