
neutrino: implement parallel header download via header checkpoints #71

Open
Roasbeef opened this issue Jul 10, 2018 · 19 comments

@Roasbeef
Member

Currently, we sync the chain purely serially from a single peer. However, due to the existence of header checkpoints, it's actually possible for us to partition up the header space between the checkpoints and download them all in parallel from several distinct peers. This has the potential to speed up initial sync a good bit, as we can start to cut off slow peers and re-assign ranges to responsive peers.

Note that we'll likely want to address #66 before this, as that should refactor the code significantly and also switch over to pure usage of getheaders.

@Chinwendu20
Contributor

Hello all. I will be taking on this issue as part of my Summer of Bitcoin project. In this section of my proposal, I explain my current understanding of the sync process. You can go through it here:
https://docs.google.com/document/d/1NyU3ZgmMJxvNUJ-DbziL2OSIY8Xgi3DlinDhE3NMJrc/edit#heading=h.evjy3wpowz
What I hope to do is:

  1. Start each call to the sync process in a new goroutine
  2. Limit each process to a particular checkpoint region
  3. Synchronize the goroutines (a rough sketch follows below).
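As a rough illustration of those three steps (all names here, e.g. checkpointRegion and syncRegion, are hypothetical and not existing neutrino APIs), the shape could be something like:

    package headersync

    import "sync"

    // checkpointRegion is a made-up (start, end] header range bounded by
    // two consecutive checkpoints.
    type checkpointRegion struct {
        startHeight, endHeight int32
    }

    // syncRegion stands in for "fetch and verify the headers in one region
    // from a peer"; the real logic would live in the block manager.
    func syncRegion(r checkpointRegion) error {
        // ... request headers for r from a peer and verify them ...
        return nil
    }

    // syncAll runs one goroutine per checkpoint region and waits for all of
    // them to finish.
    func syncAll(regions []checkpointRegion) {
        var wg sync.WaitGroup
        for _, r := range regions {
            wg.Add(1)
            go func(r checkpointRegion) { // step 1: a new goroutine per call
                defer wg.Done()
                _ = syncRegion(r) // step 2: limited to one checkpoint region
            }(r)
        }
        wg.Wait() // step 3: synchronize the goroutines
    }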

@Chinwendu20
Contributor

We can partition and fetch headers in parallel, but writing headers to the file would still require ordering, so that part would remain a synchronous process.

@Roasbeef
Member Author

Roasbeef commented Jun 5, 2023

We can partition and fetch headers in parallel, but writing headers to the file would still require ordering, so that part would remain a synchronous process.

Correct, an internal message queue would be a natural fit here. I think you should check out some of the recent PRs that start to batch-write the fetched filters to disk.
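A minimal sketch of what such a queue could look like, assuming a hypothetical headerBatch type and a caller-supplied writeBatch function (this is not the existing batching code, just an illustration of the pattern): fetchers stay parallel, while a single writer goroutine drains a channel and writes batches strictly in height order.

    package headersync

    import "github.com/btcsuite/btcd/wire"

    // headerBatch is a hypothetical unit of work: a contiguous range of
    // headers fetched from one peer, tagged with its starting height so the
    // writer can re-order batches that arrive out of order.
    type headerBatch struct {
        startHeight int32
        headers     []*wire.BlockHeader
    }

    // headerWriter drains a channel of fetched batches and writes them to
    // disk strictly in height order, buffering any batch that arrives early.
    // Fetching stays parallel; only this goroutine touches the header store
    // (via the caller-supplied writeBatch).
    func headerWriter(batches <-chan headerBatch, nextHeight int32,
        writeBatch func([]*wire.BlockHeader) error) error {

        pending := make(map[int32]headerBatch)
        for b := range batches {
            pending[b.startHeight] = b

            // Flush every batch that is now contiguous with what is
            // already on disk.
            for {
                next, ok := pending[nextHeight]
                if !ok {
                    break
                }
                if err := writeBatch(next.headers); err != nil {
                    return err
                }
                delete(pending, nextHeight)
                nextHeight += int32(len(next.headers))
            }
        }
        return nil
    }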

@positiveblue

@Chinwendu20 as @Roasbeef points out there are a couple of PRs that can be useful here: #273 and #274.

I would like to know your approach here but we could start with something simple:

  • Change the current logic to download the headers in parallel instead of sequentially from the same peer. For that we need:
    • A list of peers that can be used to fetch the headers from.
    • A way to split the span of blocks from one big (locator, stopHash) into a list of [(locator_i, stopHash_i)].
    • Some way to organize the task distribution among those peers, i.e. a way to assign a peer a task and track its status (a rough sketch of such a task follows this list).
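Purely as an illustration of that last bullet (the type and function names here are made up, and in practice the work-manager/query machinery would likely own this), a task could be represented roughly like this:

    package headersync

    import "github.com/btcsuite/btcd/chaincfg/chainhash"

    // taskStatus tracks where a given sub-range is in its lifecycle.
    type taskStatus int

    const (
        statusUnassigned taskStatus = iota
        statusInFlight
        statusDone
    )

    // headerTask is a hypothetical (locator_i, stopHash_i) work item plus
    // bookkeeping for which peer it was assigned to.
    type headerTask struct {
        startHash chainhash.Hash // locator seed for this sub-range
        stopHash  chainhash.Hash // checkpoint (or interval end) to stop at
        peer      string         // address of the peer working on it, if any
        status    taskStatus
    }

    // splitSpan turns the list of checkpoint hashes covering the full sync
    // span into one task per consecutive pair of checkpoints.
    func splitSpan(checkpoints []chainhash.Hash) []headerTask {
        if len(checkpoints) < 2 {
            return nil
        }
        tasks := make([]headerTask, 0, len(checkpoints)-1)
        for i := 0; i+1 < len(checkpoints); i++ {
            tasks = append(tasks, headerTask{
                startHash: checkpoints[i],
                stopHash:  checkpoints[i+1],
                status:    statusUnassigned,
            })
        }
        return tasks
    }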

The logic for sending/receiving messages from a peer is defined in btcd. Neutrino uses the ServerPeer struct to connect the server with the blockmanager. You can follow the logic starting from OnHeaders.
We will then have to change the logic in handleHeadersMsg.

As you can see, most of the logic is controlled by queues (channels).

We can make the changes for being able to download multiple headers in parallel, with the task manager etc and then decide what to do while the tasks are being completed.

Should we fetch all the headers in parallel, store them in memory, and validate them once they are downloaded?
Should we fetch them "in continuous batches" and check that each batch is valid before asking for the next one?

For sure we can get a nice performance boost by not storing them one by one but "in batches" once they have been validated.

Those look like enough changes to me for a first draft PR, but I am more than happy to hear how you would like to approach it.

@Chinwendu20
Contributor

Chinwendu20 commented Jun 13, 2023

Thanks a lot @positiveblue

I have gone through the work by @aakselrod @Roasbeef @halseth on parallelizing filter header download using the work manager and workers. It is a pattern that I am following to get this done.

However, in that scenario, checkpoints help spread the download workload across different peers because the hashes within a range can be verified: double-hashing the hashes within that range should give the checkpoint. I am wondering how checkpoints would help in parallelizing the header download in this scenario, as I do not think the same applies to block headers.
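For reference, the filter-header property being described can be sketched like this (an illustration of the BIP 157 filter-header chain, not neutrino's actual verification code; chainhash.DoubleHashH is from btcd):

    package headersync

    import "github.com/btcsuite/btcd/chaincfg/chainhash"

    // verifyFilterHashRange illustrates the property described above
    // (a sketch of the BIP 157 filter-header chain, not neutrino's actual
    // code): each filter header is the double-SHA256 of the filter hash
    // concatenated with the previous filter header, so chaining the filter
    // hashes in a range starting from the previous checkpoint must
    // reproduce the next checkpoint.
    func verifyFilterHashRange(prevCheckpoint chainhash.Hash,
        filterHashes []chainhash.Hash, nextCheckpoint chainhash.Hash) bool {

        header := prevCheckpoint
        for _, fh := range filterHashes {
            header = chainhash.DoubleHashH(append(fh[:], header[:]...))
        }
        return header == nextCheckpoint
    }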

@Roasbeef
Member Author

I am wondering how checkpoints would help in parallelizing the header download in this scenario, as I do not think the same applies to block headers.

Block headers also have known checkpoints: https://github.com/btcsuite/btcd/blob/v0.23.4/chaincfg/params.go#L304-L331

@Chinwendu20
Contributor

Thanks @Roasbeef. I know that already; I was referring to the verification procedure. The checkpoints in filter headers help in verification because double-hashing the hashes within that range should give the checkpoint. I was wondering how checkpoints in block headers would also help in the verification process, since, unlike filter headers, there is no causality between headers within a checkpoint range such that hashing the headers within that range would give the hash of the checkpoint.

All this is to say that, for block headers, I do not see how fetching headers within a checkpoint range would help in verification, as it would be no different from fetching within a random range, for the reasons mentioned above.

@Chinwendu20
Contributor

I think, since every header contains the header hash of the previous block header, perhaps we can use that for verification together with the checkpoints. I will work with this and see how it goes.

@Chinwendu20
Contributor

I think, since every header contains the header hash of the previous block header, perhaps we can use that for verification together with the checkpoints. I will work with this and see how it goes.

Never mind about this, I think I now see the role checkpoints would play in parallelizing header download. Since every block header has the block hash of the previous header, checkpoints can be used to verify a range of headers fetched from a peer.

@Roasbeef
Member Author

Since every block header has the block hash of the previous header, checkpoints can be used to verify a range of headers fetched from a peer.

Yep, I think you're on the right track: all headers have the prev header hash, so given all the headers in a range within a checkpoint interval, one can hash them in series to verify that you end up with the checkpoint itself. There's also code in btcd to check the headers against the checkpoints as they're received.
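A small sketch of that range check (illustrative only, not neutrino's actual validation code): walk the batch, require each header to commit to the previous hash, and require the final hash to equal the checkpoint.

    package headersync

    import (
        "github.com/btcsuite/btcd/chaincfg/chainhash"
        "github.com/btcsuite/btcd/wire"
    )

    // verifyHeaderRange sketches the check described above: walk a batch of
    // headers fetched from a peer, require each header to commit to the hash
    // of the one before it, and require the final hash to equal the known
    // checkpoint hash.
    func verifyHeaderRange(prevHash chainhash.Hash,
        headers []*wire.BlockHeader, checkpoint chainhash.Hash) bool {

        last := prevHash
        for _, h := range headers {
            if h.PrevBlock != last {
                return false // broken chain within the batch
            }
            last = h.BlockHash()
        }
        // The batch is only acceptable if it terminates at the checkpoint.
        return last == checkpoint
    }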

@Chinwendu20
Contributor

Since every block header has the block hash of the previous header, checkpoints can be used to verify a range of headers fetched from a peer.

Yep, I think you're on the right track: all headers have the prev header hash, so given all the headers in a range within a checkpoint interval, one can hash them in series to verify that you end up with the checkpoint itself. There's also code in btcd to check the headers against the checkpoints as they're received.

Thank you

@Chinwendu20
Contributor

Hmm, would this involve creating a different method of constructing block locators? As far as I can see, block locators are created from blocks that already exist in the database. There would be no way of doing this if we are sending requests containing these block locators in parallel, as we would be creating block locators for heights that do not yet exist in the DB.

@Chinwendu20
Contributor

Hmm, would this involve creating a different method of constructing block locators? As far as I can see, block locators are created from blocks that already exist in the database. There would be no way of doing this if we are sending requests containing these block locators in parallel, as we would be creating block locators for heights that do not yet exist in the DB.

I guess I was just following this:

neutrino/blockmanager.go

Lines 2328 to 2336 in d501210

    locator := make(blockchain.BlockLocator, 0,
        wire.MaxBlockLocatorsPerMsg)
    locator = append(locator, &lastHash)
    // Add locator from the database as backup.
    knownLocator, err := b.cfg.BlockHeaders.LatestBlockLocator()
    if err == nil {
        locator = append(locator, knownLocator...)
    }
Thank you.

@Chinwendu20
Contributor

Since every block header has the block hash of the previous header, checkpoints can be used to verify a range of headers fetched from a peer.

Yep, I think you're on the right track: all headers have the prev header hash, so given all the headers in a range within a checkpoint interval, one can hash them in series to verify that you end up with the checkpoint itself. There's also code in btcd to check the headers against the checkpoints as they're received.

Could you please refer me to the code in btcd that verifies headers against a checkpoint as they are received? I cannot find it.

While testing this out, I just realized that the difference between one header checkpoint and the next can be more than 2000 headers, which is the maximum number of headers a single message can contain. Initially, I thought that I could just partition the work by sending a pushgetheaders message to fetch the range of headers from one checkpoint to the next, but that would not be possible. I guess I would just partition them into ranges of the maximum headers per message, so the next checkpoint would not be the end hash in some cases.

So I would just fetch a batch of headers and verify them using this function.

This process makes me wonder how checkpoints were intended to help in parallelizing header download in the first place.

From handleHeadersMsg, it is not only the header hash that is used to carry out verification. There are also other components of the header needed by the checksanity function, and these components are not simply present in the Checkpoint struct.

@Chinwendu20
Contributor

Since every block header has the block hash of the previous header, checkpoints can be used to verify a range of headers fetched from a peer.

Yep, I think you're on the right track: all headers have the prev header hash, so given all the headers in a range within a checkpoint interval, one can hash them in series to verify that you end up with the checkpoint itself. There's also code in btcd to check the headers against the checkpoints as they're received.

Could you please refer me to the code in btcd that verifies headers against a checkpoint as they are received? I cannot find it.

While testing this out, I just realized that the difference between one header checkpoint and the next can be more than 2000 headers, which is the maximum number of headers a single message can contain. Initially, I thought that I could just partition the work by sending a pushgetheaders message to fetch the range of headers from one checkpoint to the next, but that would not be possible. I guess I would just partition them into ranges of the maximum headers per message, so the next checkpoint would not be the end hash in some cases.

So I would just fetch a batch of headers and verify them using this function.

This process makes me wonder how checkpoints were intended to help in parallelizing header download in the first place.

From handleHeadersMsg, it is not only the header hash that is used to carry out verification. There are also other components of the header needed by the checksanity function, and these components are not simply present in the Checkpoint struct.

Never mind about this: headers at a checkpoint height do not undergo verification; their hashes alone are sufficient to verify the headers that come after them. I guess that is how checkpoints can help in parallelizing header download, since a header at a checkpoint height does not need to be in the store to obtain its hash, as its hash is already available in the Checkpoint struct.

@Roasbeef
Member Author

Roasbeef commented Jul 4, 2023

Hmm, would this involve creating a different method of constructing block locators?

Block locators just describe the next set of intervals to fetch. Assuming you know the current height and the next checkpoint, you can create your own block locators. Check out the blockLocatorFromHash method.
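As a sketch of that idea (illustrative; locatorForRange is a made-up helper, and blockchain.BlockLocator is btcd's locator type): when the start hash of a sub-range is already known, e.g. a checkpoint hash, a single-entry locator is enough, since the peer does not need to find a fork point.

    package headersync

    import (
        "github.com/btcsuite/btcd/blockchain"
        "github.com/btcsuite/btcd/chaincfg/chainhash"
    )

    // locatorForRange builds a single-entry block locator for a sub-range
    // whose start hash is already known (e.g. a checkpoint hash). Since we
    // are not asking the peer to find a fork point, one entry is enough;
    // this mirrors the manual locator construction quoted from
    // blockmanager.go earlier in the thread, but is only an illustration.
    func locatorForRange(startHash chainhash.Hash) blockchain.BlockLocator {
        return blockchain.BlockLocator{&startHash}
    }

    // A request for one sub-range could then look roughly like
    // (assuming the peer's PushGetHeadersMsg, as used elsewhere in the
    // block manager):
    //
    //   sp.PushGetHeadersMsg(locatorForRange(startHash), &nextCheckpointHash)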

@Chinwendu20
Contributor

Hmm, would this involve creating a different method of constructing block locators?

Block locators just describe the next set of intervals to fetch. Assuming you know the current height and the next checkpoint, you can create your own block locators. Check out the blockLocatorFromHash method.

Thanks a lot

@Chinwendu20
Contributor

Chinwendu20 commented Jul 7, 2023

Since every block header has the block hash of the previous header, checkpoints can be used to verify a range of headers fetched from a peer.

Yep, I think you're on the right track: all headers have the prev header hash, so given all the headers in a range within a checkpoint interval, one can hash them in series to verify that you end up with the checkpoint itself. There's also code in btcd to check the headers against the checkpoints as they're received.

Could you please refer me to the code in btcd that verifies headers against a checkpoint as they are received? I cannot find it.
While testing this out, I just realized that the difference between one header checkpoint and the next can be more than 2000 headers, which is the maximum number of headers a single message can contain. Initially, I thought that I could just partition the work by sending a pushgetheaders message to fetch the range of headers from one checkpoint to the next, but that would not be possible. I guess I would just partition them into ranges of the maximum headers per message, so the next checkpoint would not be the end hash in some cases.
So I would just fetch a batch of headers and verify them using this function.
This process makes me wonder how checkpoints were intended to help in parallelizing header download in the first place.
From handleHeadersMsg, it is not only the header hash that is used to carry out verification. There are also other components of the header needed by the checksanity function, and these components are not simply present in the Checkpoint struct.

Never mind about this: headers at a checkpoint height do not undergo verification; their hashes alone are sufficient to verify the headers that come after them. I guess that is how checkpoints can help in parallelizing header download, since a header at a checkpoint height does not need to be in the store to obtain its hash, as its hash is already available in the Checkpoint struct.

Yeah, I was wrong about this: the checksanity function carries out verification on the header that requires other components of the header, such as timestamps. So we need more data about the header at a particular checkpoint height to parallelize efficiently. I think we can do this in one of two ways:

  1. We could add the header fields required by the checksanity function to the Checkpoint struct. That means opening a PR against btcd to effect that change.

  2. Or, before the initial download, I could send a pushgetheaders message for the headers at the checkpoint heights only, and verify them by comparing against the known hashes before proceeding with the initial header download.

@Roasbeef
Member Author

Roasbeef commented Jul 7, 2023

So we need more data about the header at a particular checkpoint height to parallelize efficiently. I think we can do this in one of two ways:

Not sure I follow. Given the set of checkpoints, one can generate a GetHeaders message that fetches the next 2k headers from that point, and repeat that within an interval. Each of these is effectively a task, which can then be dispatched to the set of connected peers (see the query interface). From there, you can have another sub-system that's feeding in the set of headers (as we need to write them serially, but can batch them). Once that has enough headers, it can perform the sanity checks before writing to disk.
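A rough sketch of that per-interval loop (requestHeaders and deliver are hypothetical stand-ins for the query dispatch and the serializing writer; wire.MaxBlockHeadersPerMsg is btcd's 2000-header cap per headers message):

    package headersync

    import "github.com/btcsuite/btcd/wire"

    // fetchInterval sketches the loop described above for one checkpoint
    // interval: repeatedly ask for the next (up to) 2000 headers until the
    // interval's end height is reached. requestHeaders is a hypothetical
    // stand-in for dispatching a GetHeaders query through the query
    // interface; deliver hands the result to the serializing writer, which
    // batches, sanity-checks and writes.
    func fetchInterval(startHeight, endHeight int32,
        requestHeaders func(from int32, max int) ([]*wire.BlockHeader, error),
        deliver func(from int32, hdrs []*wire.BlockHeader)) error {

        for height := startHeight; height < endHeight; {
            max := wire.MaxBlockHeadersPerMsg // 2000 headers per message
            if remaining := int(endHeight - height); remaining < max {
                max = remaining
            }
            hdrs, err := requestHeaders(height, max)
            if err != nil {
                return err
            }
            if len(hdrs) == 0 {
                break // peer had nothing more for us
            }
            deliver(height, hdrs)
            height += int32(len(hdrs))
        }
        return nil
    }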
