Router Optimizations #1799

Merged
merged 7 commits into from
Jan 25, 2023

TheBlueMatt
Collaborator

@TheBlueMatt TheBlueMatt commented Oct 25, 2022

After discussion in #1722 we realized the A* stuff these days is entirely useless (thanks ZFR!), so it's best to remove it. While we're at it, this also swaps our stupid BTree lookups for a HashMap, but keeps a sorted keys list for outbound gossip sync.

This gets us halfway to #1473, with a TODO to investigate swapping the BTreeSet for a sorted vec, which I have a strong feeling will be faster (and way more space-efficient!).
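
The TODO above proposes replacing the `BTreeSet` of keys with a sorted `Vec`. Below is a minimal sketch of such a sorted key index, assuming only `K: Ord`. The `SortedKeys` name is hypothetical and this is not what the PR ships; it only illustrates the idea.

```rust
/// Hypothetical sorted-`Vec` key index: keys stay in ascending order so that in-order
/// iteration and "start from key X" remain cheap.
struct SortedKeys<K: Ord> {
    keys: Vec<K>,
}

impl<K: Ord> SortedKeys<K> {
    fn new() -> Self {
        SortedKeys { keys: Vec::new() }
    }

    /// Inserts `key` at its sorted position; returns `false` if it was already present.
    fn insert(&mut self, key: K) -> bool {
        match self.keys.binary_search(&key) {
            Ok(_) => false,
            Err(pos) => {
                self.keys.insert(pos, key);
                true
            }
        }
    }

    /// Removes `key` if present, shifting all later keys down by one slot.
    fn remove(&mut self, key: &K) -> bool {
        match self.keys.binary_search(key) {
            Ok(pos) => {
                self.keys.remove(pos);
                true
            }
            Err(_) => false,
        }
    }

    /// Iterates over every key `>= start` in ascending order.
    fn range_from<'a>(&'a self, start: &K) -> impl Iterator<Item = &'a K> + 'a {
        let first = self.keys.partition_point(|k| k < start);
        self.keys[first..].iter()
    }
}
```

Binary search keeps lookups at O(log n) over one contiguous allocation, which is friendlier to caches than a tree of nodes. The trade-off is that `insert` and `remove` shift the tail of the vector, which is O(n) but typically a single `memmove`.
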

This is totally up-for-grabs - it needs documentation, real commit messages, benchmarks, etc. If no one else does it I'll pick it up eventually but this should be a nice improvement.

Supersedes #1722.

@TheBlueMatt
Collaborator Author

Oh, it would also be cool to add a fuzzer that demonstrates equivalence between the custom map and a `BTreeMap`.
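
A minimal sketch of such a differential check: replay the same operations against a `BTreeMap` and the map under test, then compare results and ordered iteration after every step. `HashIndexedMap`, `run_differential`, and the two-byte operation encoding are hypothetical stand-ins; a real fuzz target would feed the fuzzer's input bytes into something like `run_differential`.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Hypothetical map under test: a `HashMap` for lookups plus a `BTreeSet` of sorted keys.
#[derive(Default)]
struct HashIndexedMap {
    map: HashMap<u8, u8>,
    keys: BTreeSet<u8>,
}

impl HashIndexedMap {
    fn insert(&mut self, k: u8, v: u8) -> Option<u8> {
        self.keys.insert(k);
        self.map.insert(k, v)
    }
    fn remove(&mut self, k: &u8) -> Option<u8> {
        self.keys.remove(k);
        self.map.remove(k)
    }
    /// Key/value pairs in key order, looked up through the hash map.
    fn ordered(&self) -> Vec<(u8, u8)> {
        self.keys.iter().map(|k| (*k, self.map[k])).collect()
    }
}

/// Replays `input` as (opcode, key) byte pairs against both maps and panics on the first
/// divergence, which is what a differential fuzzer reports as a crash.
fn run_differential(input: &[u8]) {
    let mut reference = BTreeMap::new();
    let mut candidate = HashIndexedMap::default();
    for chunk in input.chunks_exact(2) {
        let (op, key) = (chunk[0], chunk[1]);
        match op % 3 {
            0 => assert_eq!(reference.insert(key, op), candidate.insert(key, op)),
            1 => assert_eq!(reference.remove(&key), candidate.remove(&key)),
            _ => assert_eq!(reference.get(&key), candidate.map.get(&key)),
        }
        // Ordered iteration must agree after every step, not only at the end.
        let expected: Vec<(u8, u8)> = reference.iter().map(|(k, v)| (*k, *v)).collect();
        assert_eq!(expected, candidate.ordered());
    }
}
```
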

The previous copy was more than one and a half years old; the
Lightning Network has changed a lot since!

As of this commit, performance on my Xeon W-10885M with an
SK hynix Gold P31 storing a BTRFS volume is as follows:

```
test ln::channelmanager::bench::bench_sends                                  ... bench:   5,896,492 ns/iter (+/- 512,421)
test routing::gossip::benches::read_network_graph                            ... bench: 1,645,740,604 ns/iter (+/- 47,611,514)
test routing::gossip::benches::write_network_graph                           ... bench: 234,870,775 ns/iter (+/- 8,301,775)
test routing::router::benches::generate_mpp_routes_with_probabilistic_scorer ... bench: 166,155,032 ns/iter (+/- 30,206,162)
test routing::router::benches::generate_mpp_routes_with_zero_penalty_scorer  ... bench: 136,843,661 ns/iter (+/- 67,111,218)
test routing::router::benches::generate_routes_with_probabilistic_scorer     ... bench:  52,954,598 ns/iter (+/- 11,360,547)
test routing::router::benches::generate_routes_with_zero_penalty_scorer      ... bench:  37,598,126 ns/iter (+/- 17,262,519)
test bench::bench_sends                                                      ... bench:  37,760,922 ns/iter (+/- 5,179,123)
test bench::bench_reading_full_graph_from_file                               ... bench:      25,615 ns/iter (+/- 1,149)
```
Historically we've had various bugs in keeping the
`lowest_inbound_channel_fees` field in `NodeInfo` up-to-date as we
go. This leaves the A* routing less efficient as it can't prune
hops as aggressively.

In order to get accurate benchmarks, this commit updates the
minimum-inbound-fees field on load. This is not the most efficient
way of doing so, but suffices for fetching benchmarks and will be
removed in the coming commits.

Note that this is *slower* than the non-updating version in the
previous commit. While I haven't dug into this incredibly deeply,
the graph snapshot in use has min-fee info for only 9,618 of
20,818 nodes. Thus, it is my guess that with the graph snapshot
as-is the branch predictor is able to largely remove the A*
heuristic lookups, but with this change it is forced to wait for
A* heuristic map lookups to complete, causing a performance
regression.

```
test routing::router::benches::generate_mpp_routes_with_probabilistic_scorer ... bench: 182,980,059 ns/iter (+/- 32,662,047)
test routing::router::benches::generate_mpp_routes_with_zero_penalty_scorer  ... bench: 151,170,457 ns/iter (+/- 75,351,011)
test routing::router::benches::generate_routes_with_probabilistic_scorer     ... bench:  58,187,277 ns/iter (+/- 11,606,440)
test routing::router::benches::generate_routes_with_zero_penalty_scorer      ... bench:  41,210,193 ns/iter (+/- 18,103,320)
```
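
The commit above recomputes the minimum-inbound-fees field when the graph is loaded. As a hedged illustration of what such a pass might compute, here is a sketch over hypothetical, simplified types: `RoutingFees` is a stand-in using the field names that `compute_fees` reads in this PR, and channels are flattened into `(destination node id, fees)` pairs. Taking the base and proportional minimums independently is what a lower-bound heuristic wants, since their sum never exceeds the fee of any single inbound channel.

```rust
use std::collections::HashMap;

/// Stand-in for a channel direction's forwarding fees (not LDK's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RoutingFees {
    base_msat: u32,
    proportional_millionths: u32,
}

/// For each node, the lowest base fee and the lowest proportional fee over all channel
/// directions leading into it. `inbound` holds hypothetical `(destination, fees)` pairs.
fn lowest_inbound_fees(inbound: &[(u64, RoutingFees)]) -> HashMap<u64, RoutingFees> {
    let mut lowest: HashMap<u64, RoutingFees> = HashMap::new();
    for &(node, fees) in inbound {
        lowest
            .entry(node)
            // The two minimums are tracked independently and may come from different
            // channels; the result is still a lower bound on any one channel's fees.
            .and_modify(|l| {
                l.base_msat = l.base_msat.min(fees.base_msat);
                l.proportional_millionths =
                    l.proportional_millionths.min(fees.proportional_millionths);
            })
            .or_insert(fees);
    }
    lowest
}
```
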
@TheBlueMatt TheBlueMatt force-pushed the 2022-10-heap-nerdsnipe branch 2 times, most recently from 87bc732 to e572ae7 on January 19, 2023 21:31
@codecov-commenter

codecov-commenter commented Jan 19, 2023

Codecov Report

Base: 90.71% // Head: 90.77% // Increases project coverage by +0.06% 🎉

Coverage data is based on head (bde841e) compared to base (153b048).
Patch coverage: 89.43% of modified lines in pull request are covered.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1799      +/-   ##
==========================================
+ Coverage   90.71%   90.77%   +0.06%
==========================================
  Files          97       99       +2
  Lines       50677    51701    +1024
  Branches    50677    51701    +1024
==========================================
+ Hits        45971    46933     +962
- Misses       4706     4768      +62
```

| Impacted Files | Coverage Δ |
|---|---|
| lightning/src/routing/router.rs | 91.15% <52.38%> (+0.23%) ⬆️ |
| lightning/src/util/indexed_map.rs | 96.29% <96.29%> (ø) |
| lightning/src/routing/gossip.rs | 92.05% <100.00%> (-0.11%) ⬇️ |
| lightning/src/ln/inbound_payment.rs | 92.00% <0.00%> (-1.50%) ⬇️ |
| lightning/src/chain/onchaintx.rs | 94.56% <0.00%> (-0.84%) ⬇️ |
| lightning/src/ln/functional_tests.rs | 96.69% <0.00%> (-0.44%) ⬇️ |
| lightning/src/util/ser.rs | 91.41% <0.00%> (-0.30%) ⬇️ |
| lightning/src/util/ser_macros.rs | 86.73% <0.00%> (-0.30%) ⬇️ |
| lightning-invoice/src/utils.rs | 97.62% <0.00%> (-0.15%) ⬇️ |
| lightning-invoice/src/lib.rs | 87.37% <0.00%> (-0.11%) ⬇️ |
| ... and 15 more | |

@TheBlueMatt TheBlueMatt force-pushed the 2022-10-heap-nerdsnipe branch 2 times, most recently from 8682115 to 20bedad on January 19, 2023 21:41
@TheBlueMatt
Collaborator Author

Cleaned up the commit messages, added benchmarks that demonstrate the performance advantages, and added a fuzzer that should catch any issues in the new map implementation.

@TheBlueMatt TheBlueMatt marked this pull request as ready for review January 19, 2023 21:44
@TheBlueMatt TheBlueMatt force-pushed the 2022-10-heap-nerdsnipe branch 3 times, most recently from c1efa29 to 3f91255 on January 19, 2023 21:51
@tnull tnull self-requested a review January 19, 2023 22:03
```rust
}

/// Returns an iterator which iterates over the `key`/`value` pairs in a random order.
pub fn unordered_iter(&self) -> impl Iterator<Item = (&K, &V)> {
```
Contributor

@arik-so arik-so Jan 20, 2023

what's the benefit of a random order? I thought the entire point of this data structure was that the iteration order be deterministic? Should the doc comment be updated to reflect that this actually returns a deterministically sorted list?

Collaborator Author

If you don't need the ordering it's much more efficient. Most uses don't care about the order.

Contributor

But won't it technically always return an ordered version? Considering that, at least in this commit, the underlying structure is a BTreeMap?

Collaborator Author

Not in the next commit :)

Contributor

And that's… a good thing?

Collaborator Author

I'm not sure I get your question? The first commit changes callsites to make explicit whether they're relying on getting things in order or not; the second commit actually changes the backing data structure. Just because something happens to be in order doesn't mean a caller can rely on it if the API contract clearly indicates they can't.

```diff
@@ -18,15 +20,18 @@ use core::ops::RangeBounds;
 /// actually backed by a `HashMap`, with some additional tracking to ensure we can iterate over
 /// keys in the order defined by [`Ord`].
 #[derive(Clone, PartialEq, Eq)]
-pub struct IndexedMap<K: Ord, V> {
-	map: BTreeMap<K, V>,
+pub struct IndexedMap<K: Hash + Ord, V> {
```
Contributor

should this be a separate commit?

Contributor

I think it might make more sense to introduce IndexedMap as the desired type from the beginning.

Collaborator Author

It breaks it up a bit to be a tiny bit easier to review? Makes the commit that adds the new data structure implementation a freestanding commit.

```rust
use crate::utils::test_logger;

// Note that while we take the trees by &mut here
fn check_eq(btree: &BTreeMap<u8, u8>, indexed: &IndexedMap<u8, u8>) {
```
Contributor

this looks super useful. You may wanna add that to an IndexedMap test_util perhaps?

Collaborator Author

There is no IndexedMap test_util? Are you suggesting/requesting additional tests?

Contributor

@wpaulino wpaulino left a comment

Nice! Saw similar improvements on my hardware.

Contributor

@wpaulino wpaulino left a comment

LGTM, feel free to squash

As evidenced by the previous commit, it appears our A* router
does worse than a more naive approach. This isn't super surprising,
as the A* heuristic calculation requires a map lookup, which is
relatively expensive.

```
test routing::router::benches::generate_mpp_routes_with_probabilistic_scorer ... bench: 169,991,943 ns/iter (+/- 30,838,048)
test routing::router::benches::generate_mpp_routes_with_zero_penalty_scorer  ... bench: 122,144,987 ns/iter (+/- 61,708,911)
test routing::router::benches::generate_routes_with_probabilistic_scorer     ... bench:  48,546,068 ns/iter (+/- 10,379,642)
test routing::router::benches::generate_routes_with_zero_penalty_scorer      ... bench:  32,898,557 ns/iter (+/- 14,157,641)
```
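
The commit above attributes the slowdown to the heuristic's map lookup. A generic textbook sketch (not LDK's router; the adjacency-map graph, `u32` node ids and `u64` costs are hypothetical) shows where that lookup lands: A* orders the frontier by cost-so-far plus `heuristic(node)`, so every push evaluates the heuristic once, and passing `|_| 0` turns the same code back into plain Dijkstra.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Cheapest cost from `src` to `dst`. `heuristic` must be admissible and consistent
/// (never overestimates the remaining cost); `|_| 0` gives plain Dijkstra.
fn shortest_path_cost(
    graph: &HashMap<u32, Vec<(u32, u64)>>,
    src: u32,
    dst: u32,
    heuristic: impl Fn(u32) -> u64,
) -> Option<u64> {
    let mut best: HashMap<u32, u64> = HashMap::new();
    let mut heap = BinaryHeap::new();
    best.insert(src, 0);
    // Heap entries: (priority = cost + estimate, cost so far, node), smallest first.
    heap.push(Reverse((heuristic(src), 0u64, src)));
    while let Some(Reverse((_, cost, node))) = heap.pop() {
        if node == dst {
            return Some(cost);
        }
        // Skip stale entries that a cheaper path has already superseded.
        if cost > *best.get(&node).unwrap_or(&u64::MAX) {
            continue;
        }
        for &(next, weight) in graph.get(&node).into_iter().flatten() {
            let next_cost = cost + weight;
            if next_cost < *best.get(&next).unwrap_or(&u64::MAX) {
                best.insert(next, next_cost);
                // The A* step: one heuristic evaluation (in a router, a map lookup) per push.
                heap.push(Reverse((next_cost + heuristic(next), next_cost, next)));
            }
        }
    }
    None
}
```

If each heuristic evaluation costs a hash lookup and prunes little, as the benchmarks above suggest for the real graph, the plain Dijkstra ordering can come out ahead.
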
Contributor

@tnull tnull left a comment

Basically LGTM.

Some questions/suggestions, feel free to squash if you decide not to tackle them.

@TheBlueMatt TheBlueMatt force-pushed the 2022-10-heap-nerdsnipe branch 2 times, most recently from 592d3ee to 78ac11e on January 25, 2023 17:23
@TheBlueMatt
Collaborator Author

Squashed, updated the docs trivially, and added a commit to clean up a few more things in the router:

```diff
$ git diff-tree -U3 158a3f1 2173280f
diff --git a/lightning/src/routing/router.rs b/lightning/src/routing/router.rs
index eb6eede0e..8a18c44ba 100644
--- a/lightning/src/routing/router.rs
+++ b/lightning/src/routing/router.rs
@@ -885,18 +885,11 @@ impl<'a> PaymentPath<'a> {
 	}
 }
 
+#[inline(always)]
+/// Calculate the fees required to route the given amount over a channel with the given fees.
 fn compute_fees(amount_msat: u64, channel_fees: RoutingFees) -> Option<u64> {
-	let proportional_fee_millions =
-		amount_msat.checked_mul(channel_fees.proportional_millionths as u64);
-	if let Some(new_fee) = proportional_fee_millions.and_then(|part| {
-			(channel_fees.base_msat as u64).checked_add(part / 1_000_000) }) {
-
-		Some(new_fee)
-	} else {
-		// This function may be (indirectly) called without any verification,
-		// with channel_fees provided by a caller. We should handle it gracefully.
-		None
-	}
+	amount_msat.checked_mul(channel_fees.proportional_millionths as u64)
+		.and_then(|part| (channel_fees.base_msat as u64).checked_add(part / 1_000_000))
 }
 
 /// The default `features` we assume for a node in a route, when no `features` are known about that
@@ -1289,7 +1282,7 @@ where L::Target: Logger {
 							if !should_process { should_process = true; }
 						}
 
-						if should_process {
+						'processing_node: while should_process {
 							let mut hop_use_fee_msat = 0;
 							let mut total_fee_msat = $next_hops_fee_msat;
 
@@ -1299,7 +1292,7 @@ where L::Target: Logger {
 								match compute_fees(amount_to_transfer_over_msat, $candidate.fees()) {
 									// max_value means we'll always fail
 									// the old_entry.total_fee_msat > total_fee_msat check
-									None => total_fee_msat = u64::max_value(),
+									None => break 'processing_node,
 									Some(fee_msat) => {
 										hop_use_fee_msat = fee_msat;
 										total_fee_msat += hop_use_fee_msat;
@@ -1392,6 +1385,7 @@ where L::Target: Logger {
 									);
 								}
 							}
+							break 'processing_node;
 						}
 					}
 				}
diff --git a/lightning/src/util/indexed_map.rs b/lightning/src/util/indexed_map.rs
index 12c9c9dcd..cccbfe7bc 100644
--- a/lightning/src/util/indexed_map.rs
+++ b/lightning/src/util/indexed_map.rs
@@ -8,7 +8,7 @@ use core::ops::RangeBounds;
 
 /// A map which can be iterated in a deterministic order.
 ///
-/// This would traditionally be accomplished by simply using a `BTreeMap`, however B-Trees
+/// This would traditionally be accomplished by simply using a [`BTreeMap`], however B-Trees
 /// generally have very slow lookups. Because we use a nodes+channels map while finding routes
 /// across the network graph, our network graph backing map must be as performant as possible.
 /// However, because peers expect to sync the network graph from us (and we need to support that
@@ -16,9 +16,11 @@ use core::ops::RangeBounds;
 /// into our outbound message queue), we need an iterable map with a consistent iteration order we
 /// can jump to a starting point on.
 ///
-/// Thus, we have a custom data structure here - its API mimics that of Rust's `BTreeMap`, but is
+/// Thus, we have a custom data structure here - its API mimics that of Rust's [`BTreeMap`], but is
 /// actually backed by a [`HashMap`], with some additional tracking to ensure we can iterate over
 /// keys in the order defined by [`Ord`].
+///
+/// [`BTreeMap`]: alloc::collections::BTreeMap
 #[derive(Clone, Debug, PartialEq, Eq)]
 pub struct IndexedMap<K: Hash + Ord, V> {
 	map: HashMap<K, V>,
```
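
Two idioms in the router diff may be unfamiliar: the closure-chained checked arithmetic in `compute_fees`, and `'processing_node: while should_process { ... break 'processing_node; }`, a loop whose every path ends in `break`, used as a block that can be left early. A minimal sketch of both, with hypothetical free-standing signatures (the fee fields are passed directly, and `process_candidate` is illustrative only, not an LDK function):

```rust
/// Checked fee computation mirroring `compute_fees` in the diff: `None` if the
/// proportional part or the final sum overflows `u64`.
fn compute_fees(amount_msat: u64, base_msat: u32, proportional_millionths: u32) -> Option<u64> {
    amount_msat
        .checked_mul(proportional_millionths as u64)
        .and_then(|part| (base_msat as u64).checked_add(part / 1_000_000))
}

/// Hypothetical candidate-hop step: returns the running total fee if the candidate was
/// processed, or `None` if processing was skipped or abandoned.
fn process_candidate(
    should_process: bool,
    amount_msat: u64,
    base_msat: u32,
    proportional_millionths: u32,
    next_hops_fee_msat: u64,
) -> Option<u64> {
    let mut result = None;
    // A `while` whose body always ends in `break` runs at most once, so it acts like an
    // `if` block that can be exited early from anywhere inside it.
    'processing_node: while should_process {
        let total_fee_msat = match compute_fees(amount_msat, base_msat, proportional_millionths) {
            // On overflow, abandon this candidate instead of carrying a u64::MAX sentinel.
            None => break 'processing_node,
            Some(fee_msat) => next_hops_fee_msat.saturating_add(fee_msat),
        };
        // Further checks could also `break 'processing_node` here.
        result = Some(total_fee_msat);
        break 'processing_node;
    }
    result
}
```

On Rust 1.65 and later, a labeled block expression (`'label: { ... }`, exited with `break 'label value`) expresses the same early exit without a loop.
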

@TheBlueMatt
Collaborator Author

Rewrote the last commit to do more like what @tnull suggested, taking advantage of the CMOVs that `saturating_*` operations compile down to rather than explicit branching.

tnull
tnull previously approved these changes Jan 25, 2023
Contributor

@tnull tnull left a comment

LGTM from my side.

Our network graph has to be iterable in a deterministic order and
with the ability to iterate over a specific range. Thus,
historically, we've used a `BTreeMap` to do the iteration. This is
fine, except our map needs to also provide high performance lookups
in order to make route-finding fast. Sadly, `BTreeMap`s are quite
slow due to the branching penalty.

Here we replace the `BTreeMap`s in the network graph with a dummy wrapper.
In the next commit the internals thereof will be replaced with a
`HashMap`-based implementation.
Our network graph has to be iterable in a deterministic order and
with the ability to iterate over a specific range. Thus,
historically, we've used a `BTreeMap` to do the iteration. This is
fine, except our map needs to also provide high performance lookups
in order to make route-finding fast. Sadly, `BTreeMap`s are quite
slow due to the branching penalty.

Here we replace the implementation of our `IndexedMap` with a
`HashMap` to store the elements themselves and a `BTreeSet` to store
the key set in sorted order for iteration.

As of this commit on the same hardware as the above few commits,
the benchmark results are:

```
test routing::router::benches::generate_mpp_routes_with_probabilistic_scorer ... bench: 109,544,993 ns/iter (+/- 27,553,574)
test routing::router::benches::generate_mpp_routes_with_zero_penalty_scorer  ... bench:  81,164,590 ns/iter (+/- 55,422,930)
test routing::router::benches::generate_routes_with_probabilistic_scorer     ... bench:  34,726,569 ns/iter (+/- 9,646,345)
test routing::router::benches::generate_routes_with_zero_penalty_scorer      ... bench:  22,772,355 ns/iter (+/- 9,574,418)
```
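
The commit above describes `IndexedMap` as a `HashMap` holding the elements plus a `BTreeSet` holding the sorted key set. A minimal sketch of that layout follows. It assumes `K: Clone` so that each key can live in both containers; the real type's bounds and method set may differ, and this is an illustration rather than LDK's code.

```rust
use std::collections::{BTreeSet, HashMap};
use std::hash::Hash;
use std::ops::RangeBounds;

/// Sketch: hash map for fast lookups, plus a sorted key set for deterministic iteration.
pub struct IndexedMap<K: Hash + Ord + Clone, V> {
    map: HashMap<K, V>,
    keys: BTreeSet<K>,
}

impl<K: Hash + Ord + Clone, V> IndexedMap<K, V> {
    pub fn new() -> Self {
        IndexedMap { map: HashMap::new(), keys: BTreeSet::new() }
    }

    /// Expected O(1) lookup, never touching the `BTreeSet` (the route-finding hot path).
    pub fn get(&self, key: &K) -> Option<&V> {
        self.map.get(key)
    }

    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        let old = self.map.insert(key.clone(), value);
        if old.is_none() {
            self.keys.insert(key); // only new keys need to enter the sorted index
        }
        old
    }

    pub fn remove(&mut self, key: &K) -> Option<V> {
        let old = self.map.remove(key);
        if old.is_some() {
            self.keys.remove(key);
        }
        old
    }

    /// Cheapest iteration, for callers that do not depend on order.
    pub fn unordered_iter(&self) -> impl Iterator<Item = (&K, &V)> {
        self.map.iter()
    }

    /// In-order iteration over a key range, e.g. to resume gossip sync from a given key.
    /// Costs one extra hash lookup per yielded key.
    pub fn range<R: RangeBounds<K>>(&self, range: R) -> impl Iterator<Item = (&K, &V)> {
        self.keys.range(range).map(move |k| (k, &self.map[k]))
    }
}
```

The price of this layout is storing every key twice and paying a hash lookup per element during ordered iteration, which matters only for gossip sync rather than for route-finding.
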
Often when we call `compute_fees` we really just want it to
saturate and we deal with `u64::max_value` later. In that case,
we're much better off doing the saturating in the `compute_fees` as
it can use CMOVs rather than branching at each step and then
`unwrap_or`ing at the callsite.
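
A sketch of such a saturating fee computation, using a hypothetical signature that takes the fee fields directly rather than a `RoutingFees` value. Note the order of operations: the overflow must be clamped after the division would have happened, because `saturating_mul` followed by `/ 1_000_000` would turn an overflow into a large but finite fee.

```rust
/// Fee for routing `amount_msat` over a channel, saturating at `u64::MAX` instead of
/// returning `None` on overflow.
#[inline(always)]
fn compute_fees_saturating(amount_msat: u64, base_msat: u32, proportional_millionths: u32) -> u64 {
    amount_msat
        .checked_mul(proportional_millionths as u64)
        .map(|part| part / 1_000_000)
        // Clamp only here: an overflowing product means the fee is unaffordable.
        .unwrap_or(u64::MAX)
        .saturating_add(base_msat as u64)
}
```

Callers that previously wrote `compute_fees(..).unwrap_or(u64::MAX)` can call this directly. Whether the compiler actually emits branch-free CMOV code here depends on the target and optimization level.
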
@TheBlueMatt
Collaborator Author

Fixed the doc comment in an intermediary commit without changing the full diff.

5 participants