Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore replacing BTreeSet in IndexedMap with a sorted Vec #1473

Closed
TheBlueMatt opened this issue May 9, 2022 · 9 comments
Closed

Explore replacing BTreeSet in IndexedMap with a sorted Vec #1473

TheBlueMatt opened this issue May 9, 2022 · 9 comments
Milestone

Comments

@TheBlueMatt
Copy link
Collaborator

We currently use a BTreeMap for storing network graph data, as we need to be able to access it starting from a given key and continue iteration. This makes a sorted KV store like a BTree the obvious choice, but we don't actually need the keys to be sorted, we just need them to not change order as we iterate the graph.

Technically we also need to be able to check if a new key is below some previous iteration that is in-progress.

Supposedly the indexmap crate may offer something like what we need, but ultimately any hashtable that doesn't ever re-hash should suffice, and while creating our own probably won't be as efficient as hashbrown, it'd probably still be faster than a btree, and use way less memory compared to the substantial heap fragmentation that a btree creates.

@TheBlueMatt TheBlueMatt added this to the 0.2 milestone May 9, 2022
@dunxen
Copy link
Contributor

dunxen commented May 11, 2022

I'd be interested to look at this seeing as it is a way away still in terms of milestones. Could I be assigned this to track it better? :)

@TheBlueMatt
Copy link
Collaborator Author

You're welcome to work on it! We use milestones more to track "when do we need to work on this", but we welcome contributions to address any issue, even if it doesn't have a milestone!

That said, this one is gonna be more than a bit tricky, I imagine. Sadly the Rust std/hashbrown HashMap doesn't have any guarantees about when it rehashes, I don't believe, unlike the C++ HashMap where you get them as a function of capacity. Building our own HashMap is probably the first rabbit hole to go down, but that will need to be relatively carefully benchmarked.

@dunxen
Copy link
Contributor

dunxen commented May 11, 2022

Building our own HashMap is probably the first rabbit hole to go down, but that will need to be relatively carefully benchmarked.

Happy to start going down this rabbit hole :)

@TheBlueMatt
Copy link
Collaborator Author

Post-#1553 we'll probably want to also consider replacing the heap-allocated string there with a ([u8; 255], len: u8) pair. 255 bytes isn't cheap, and shoving it inside the address enum would bloat the address enum a lot, so I'll bet it won't be worth it, but it wouldn't be the first time I've been substantially surprised by heap fragmentation effects.

@G8XSU
Copy link
Contributor

G8XSU commented Aug 10, 2022

Requirement clarification :
I understand that on-insert/rehash, we want to complete ongoing iteration in some order just to cover all keys(existing) and not cover duplicate keys.

But how important is it to also iterate through newly added key in ongoing iteration ?
Is it just good to have? Or a necessary requirement to cover the newly added key.

@TheBlueMatt
Copy link
Collaborator Author

We don't need to iterate, but we do need to be able to decide whether a key would have already been sent to a peer or not. Specifically

fn should_forward_channel_announcement(&self, channel_id: u64) -> bool {
if self.their_features.as_ref().unwrap().supports_gossip_queries() &&
!self.sent_gossip_timestamp_filter {
return false;
}
match self.sync_status {
InitSyncTracker::NoSyncRequested => true,
InitSyncTracker::ChannelsSyncing(i) => i < channel_id,
InitSyncTracker::NodesSyncing(_) => true,
}
}
/// Similar to the above, but for node announcements indexed by node_id.
fn should_forward_node_announcement(&self, node_id: PublicKey) -> bool {
if self.their_features.as_ref().unwrap().supports_gossip_queries() &&
!self.sent_gossip_timestamp_filter {
return false;
}
match self.sync_status {
InitSyncTracker::NoSyncRequested => true,
InitSyncTracker::ChannelsSyncing(_) => false,
InitSyncTracker::NodesSyncing(pk) => pk < node_id,
. While we're sending gossip to a peer, if new gossip comes in, we need to be able to figure out if we should already have sent that gossip (and should forward it), or if we'll send that later during the gossip sync.

@wpaulino wpaulino changed the title Drop NetworkGraph BtreeMap in favor of a non-rehashing HashMap Explore replacing BTreeSet in IndexedMap with a sorted Vec Jan 26, 2023
@wpaulino
Copy link
Contributor

Updated the issue title due to the recent merge #1799.

@tnull
Copy link
Contributor

tnull commented Jan 26, 2023

Updated the issue title due to the recent merge #1799.

Hm, maybe we should just close this and open a new issue tagged "good first issue"?

@wpaulino
Copy link
Contributor

Sure, closing in favor of #1992

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants