Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2-Phase optimization for first round #95

Closed
ranchalp opened this issue Feb 20, 2024 · 4 comments · May be fixed by #106
Closed

2-Phase optimization for first round #95

ranchalp opened this issue Feb 20, 2024 · 4 comments · May be fixed by #106
Assignees
Labels
design-question gossipbft Relates to core GossipPBFT protocol

Comments

@ranchalp
Copy link
Collaborator

ranchalp commented Feb 20, 2024

I need to think more about this but I think that if we run a first phase just with PREPARE and COMMIT, in the case where a strong quorum of nodes start exactly with the same proposal, we can have a decision in the first round in 2 phases with no QUALITY.

Provided confirmation from the theoretical side, it would be a great optimization to implement this first round before running QUALITY or starting with an exponential backoff.

@ranchalp ranchalp added design-question gossipbft Relates to core GossipPBFT protocol labels Feb 20, 2024
@ranchalp
Copy link
Collaborator Author

ranchalp commented Mar 4, 2024

After careful consideration, I am confident this is a valid optimization:

In the first round, we can skip PREPARE, and have a decision in 2-phases (+ DECIDE for extra long-range attack prevention, so 3 in total). This implies:

  • Justification is the aggregated QUALITY messages only if StrongQuorum(proposal,M), otherwise justification is empty.
  • Attach aggregate of QUALITY when sending COMMIT (skipping PREPARE in first round), or empty aggregate. Change validation rules of COMMIT for first round as well so that a certificate of QUALITY is valid.

@anorth
Copy link
Member

anorth commented Mar 11, 2024

in the case where a strong quorum of nodes start exactly with the same proposal

and from filecoin-project/FIPs#953

This is an optimization that exploits the expectation that most correct nodes will propose the same prefix to finalize during normal operation of F3.

I am not convinced about this assumption. No data has been cited to back it up. However frequently it does not hold, this change will cause the protocol to execute 5 rounds (+ DECIDE) in order to terminate, instead of 3 (in return for 2 instead of 3 when it does hold). Whether that termination represents any progress is up to the die-roll in CONVERGE, and there's still good chance of ending up with base chain.

The great effect of the existing QUALITY phase is to find what agreement can be found among the participants. In times of divergence, it can still quickly agree on what can be agreed, and make progress without causing F3 to take any longer.

How often the assumption of 2/3 nodes proposing exactly the same chain holds will depend on parameters to be determined, such as the delay to allow for the EC chain to settle before proposing it for F3. If we have a long, multi-epoch delay then the assumption will hold often, but of course that would introduce a multi-epoch latency in finalisation. Conversely if we tune the parameter down for fast finality during good times, the assumption will frequently not hold, and the protocol will take 5 rounds to decide basechain.

I think this would better be described as a tradeoff, rather than optimisation. It can gain a reduction in messages and signature validation costs in return for an increase in finalisation latency. I don't think it can give a meaningful reduction in finalisation latency: the potential speed-up from skipping a round is probably more than offset by the starting delay necessary to have the assumption hold true most of the time.

This also makes the protocol more difficult to think about, code, and debug. I'm not claiming it's not a correct change, but it is a tradeoff that hasn't been discussed and I'm not very confident we want to take. It would be great to consider it alongside data about EC settlement times from mainnet, and simulation informed by that data demonstrating the tradeoff.

@ranchalp
Copy link
Collaborator Author

It was my understanding that we expect go-f3 to finalize tipsets roughly within the epoch they were proposed during normal operation. If this is the case, then the QUALITY step can only find agreement between 1 of 2 prefixes (if any): baseChain and baseChain extended by one tipset only. In that case, this is definitely an optimisation.

In full agreement that we could park this until we have more data to back it up. Or perhaps I can think of an even further optimisation that actually proposes the heaviest prefix of the QUALITYs in COMMIT (still skipping prepare), and get the best of both. Will think about this over this/next week.

@Kubuxu
Copy link
Collaborator

Kubuxu commented Apr 22, 2024

I think the consensus was "not right now" I'm closing; feel free to comment and reopen.

@Kubuxu Kubuxu closed this as completed Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design-question gossipbft Relates to core GossipPBFT protocol
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants