Stabilize chunk producer assignments across every epoch #11213
Depends on #11204
## New chunk producer assignments algorithm

With the requirement of sticky chunk producer assignments, the existing code basically doesn't work, so we needed to come up with a new idea. We discussed whether we need to keep

### Requirements, in order of priority

### Algorithm

Step 1. Take the previous chunk producer assignment. If the shard layout changed in the previous epoch, we assume it was empty. This could be smarter and take parent shard ids into account, but we'll care about that when resharding has to be supported.
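A minimal Rust sketch of this step; the types and names here are hypothetical stand-ins, not the actual nearcore representation:

```rust
// Illustrative types; not the actual nearcore representation.
type AccountId = String;
type ShardAssignment = Vec<Vec<AccountId>>; // chunk producers per shard

// Start from the previous epoch's assignment; fall back to an empty one
// when the shard layout changed. A smarter version could follow parent
// shard ids once resharding has to be supported.
fn initial_assignment(
    prev_assignment: &ShardAssignment,
    shard_layout_changed: bool,
    num_shards: usize,
) -> ShardAssignment {
    if shard_layout_changed {
        vec![Vec::new(); num_shards]
    } else {
        prev_assignment.clone()
    }
}
```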
Satisfy requirement 1:
Note that this may not end up with the fairest assignment. If we have 4 shards, 4 validators, and 2 minimum seats, and the current assignment is [[0, 1], [1, 2], [0, 2], [3, x]], then x is going to be 0, 1, or 2, and the assignment is unfair. But I want to ignore that, because it's a pathological case anyway, and with more validators joining, the next step resolves that unfairness.

Satisfy requirement 2:

Each successful iteration decreases the number of repeats by 1.

Satisfy requirement 3:

In the end, we achieve the minimal number of state syncs, because:
Notes:
## Chunk producer assignments algorithm v2

### New considerations

The feedback was that v1 didn't consider the following case:

At the same time, we still keep

### New algorithm idea

First, the easy steps. Take the previous chunk producer assignment. If the shard layouts in the previous and current epochs are different, we assume that the assignment was empty. This could be smarter and take parent shard ids into account, but we'll care about that when resharding has to be supported. Then we remove validators which are not selected as chunk producers for the new epoch (because they went offline or their stake decreased); see the sketch below. After that, we optimise by performing the following two kinds of operations in the right way:
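A hedged sketch of these easy steps, again with illustrative types and names rather than the real nearcore API:

```rust
use std::collections::HashSet;

// Illustrative types; not the actual nearcore representation.
type AccountId = String;
type ShardAssignment = Vec<Vec<AccountId>>;

// Drop validators that are not selected as chunk producers for the new
// epoch (they went offline or their stake decreased).
fn retain_selected_producers(
    assignment: &mut ShardAssignment,
    selected: &HashSet<AccountId>,
) {
    for shard in assignment.iter_mut() {
        shard.retain(|account| selected.contains(account));
    }
}
```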
Note that in the scope of this task, we optimise three parameters which matter for us:

* D, the difference between the maximal and minimal number of chunk producers tracking a shard
* S, the number of state syncs (shard reassignments) performed per epoch
* T, the stake difference: T = (max stake tracking shard - min stake tracking shard)
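A sketch of how these three parameters could be measured, under the illustrative types used above; representing stakes as a plain map is an assumption made for the example:

```rust
use std::collections::HashMap;

// Illustrative types; not the actual nearcore representation.
type AccountId = String;
type ShardAssignment = Vec<Vec<AccountId>>;

// D: max minus min number of chunk producers tracking a shard.
fn metric_d(assignment: &ShardAssignment) -> usize {
    let max = assignment.iter().map(|s| s.len()).max().unwrap_or(0);
    let min = assignment.iter().map(|s| s.len()).min().unwrap_or(0);
    max - min
}

// S: validators that track a shard in the new assignment which they did
// not track in the previous one, i.e. state syncs to perform.
fn metric_s(prev: &ShardAssignment, new: &ShardAssignment) -> usize {
    new.iter()
        .enumerate()
        .map(|(shard_id, producers)| {
            producers
                .iter()
                .filter(|a| !prev.get(shard_id).map_or(false, |p| p.contains(*a)))
                .count()
        })
        .sum()
}

// T: max minus min total stake tracking a shard.
fn metric_t(assignment: &ShardAssignment, stakes: &HashMap<AccountId, u128>) -> u128 {
    let shard_stake = |shard: &Vec<AccountId>| -> u128 {
        shard.iter().map(|a| stakes.get(a).copied().unwrap_or(0)).sum()
    };
    let max = assignment.iter().map(shard_stake).max().unwrap_or(0);
    let min = assignment.iter().map(shard_stake).min().unwrap_or(0);
    max - min
}
```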
For simplicity we don't consider T. First, we have some new validators to assign. They state sync anyway. Sort them by (reversed stake, id) for clarity. Use them to optimise D by iteratively assigning each to the shard tracked by the minimal number of validators (the "minimal shard"). Then, the only way to minimise D (bring it to 0 or 1) is to reassign some random validator from the maximal shard to the minimal shard. Each such operation increases S by 1. So the resulting D is defined by how many state syncs per epoch we allow. Define that limit as S_max. Then condition C for reassigning is: we reassign while (the minimal shard has < minimum_validators_per_shard validators) OR ((D is not minimal) AND (S < S_max)).

### Algorithm
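A sketch of the loop under one reading of condition C (reassign while the minimal shard is under-filled, or while D can still be reduced within the S_max budget); the "random validator" pick is replaced by a deterministic pop, and all names are hypothetical:

```rust
// Illustrative types; not the actual nearcore representation.
type AccountId = String;
type ShardAssignment = Vec<Vec<AccountId>>;

fn rebalance(
    assignment: &mut ShardAssignment,
    new_validators: Vec<AccountId>, // already sorted by (reversed stake, id)
    minimum_validators_per_shard: usize,
    s_max: usize,
) {
    // New validators state sync anyway: put each on the shard currently
    // tracked by the fewest validators ("minimal shard").
    for validator in new_validators {
        let min_shard = (0..assignment.len())
            .min_by_key(|&i| assignment[i].len())
            .expect("at least one shard");
        assignment[min_shard].push(validator);
    }

    // Condition C: keep moving a validator from the maximal to the minimal
    // shard while the minimal shard is under-filled, or while D > 1 and
    // there is state-sync budget left. Each move costs one state sync.
    let mut state_syncs = 0;
    loop {
        let min_shard = (0..assignment.len())
            .min_by_key(|&i| assignment[i].len())
            .expect("at least one shard");
        let max_shard = (0..assignment.len())
            .max_by_key(|&i| assignment[i].len())
            .expect("at least one shard");
        let d = assignment[max_shard].len() - assignment[min_shard].len();
        let under_filled = assignment[min_shard].len() < minimum_validators_per_shard;
        // Moving when D <= 1 cannot improve anything, so stop there.
        if d <= 1 || !(under_filled || state_syncs < s_max) {
            break;
        }
        let moved = assignment[max_shard].pop().expect("non-empty shard");
        assignment[min_shard].push(moved);
        state_syncs += 1;
    }
}
```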
### Notes
Thanks Alex, this looks much simpler. If I understand correctly, the new algorithm will be the default and the previous one will only apply if the number of producers goes below a certain threshold? Could there be a way to modify the new algorithm to support the case of the number of producers going under the limit, instead of running two separate algorithms?

Some questions:

> T = (max stake tracking shard - min stake tracking shard)

> Then condition C for reassigning is: we reassign while (the minimal shard has < minimum_validators_per_shard validators) OR ((D is not minimal) AND (S < S_max)).

I think it would be useful in general to find a way to take data from N epochs and run the algorithm on it; not sure what is the right way to capture such information (eg. a sequence of
I don't see one. The issue is that it would add another parameter to optimise, the number of validator repeats in the assignment, which would make the logic harder.

So we need some smooth transition of the assignment that supports adding repeats and getting rid of them. I doubt the complexity of the resulting algorithm would be lower, so I suggest treating this corner case separately.
Yes (and yes). Changed the wording a bit.
Yes, we can choose it dynamically as well if needed, but I doubt it is needed.
I don't get the question. S_max can be 100 on mainnet, which would mean that we can reassign everyone to achieve minimal D.
Thanks for the answers and the offline discussion. I now have a good understanding of the differences and similarities of the two algorithms, and we can proceed with the implementation.
## Evaluation of new algorithm

We analyse how well the algorithm behaves on the last ~750 epochs, on epoch heights from 1200 to 2507, which is around 2 years of mainnet data. In short, the new algorithm looks good for our needs.

### Results

We look at three factors, following the previous logic: number of state syncs, number of validators per shard, and stake imbalance.

#### State syncs

A quick look at the old algo shows that the number of state syncs is wild, consistently requiring ~50 chunk producers to download a new shard in the next epoch.

#### Validators number

In the old algo, the maximal difference in the number of chunk producers tracking a shard is around 1-4, which makes sense, because the algorithm cares about stakes, not validator numbers.

#### Stake difference

The old algo optimises stake balance, so, as expected, it performs great. The relative (max - min) / max stake difference is below 1%.

### Methodology

Hacky tool which reruns

Command:
Using stable (sticky) chunk producer assignments since the stateless validation protocol version, applying the algorithm from #11213 (comment). New algorithm properties:

* tries to balance the numbers of chunk producers across shards, **ignoring the stakes**. Comparison of stake diffs for the old and new algo: #11213 (comment)
* minimises the number of state syncs to be made, setting the limit on reassignments to 5 in the epoch config. Figuring out the exact number is a TODO as well, but for mainnet it shouldn't matter, because we have way more validator proposals than planned chunk producers (100) at every epoch.

The old assignment algorithm is moved behind `old_validator_selection`. The part for assigning validators with repeats is still used, though. The main function is `assign_chunk_producers_to_shards`, which is comprehensively tested. +737 lines are scary, but the algorithm itself is +300 lines with comments, and the tests are another +250. `get_chunk_producers_assignment` is separated into a function because it became too big.
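A hedged sketch of how the two advertised properties could be checked over a pair of consecutive assignments; `MAX_REASSIGNMENTS` mirrors the limit of 5 mentioned above, and everything here (types, names) is illustrative rather than the PR's actual API or tests:

```rust
use std::collections::HashSet;

// Illustrative types; not the actual nearcore representation.
type AccountId = String;
type ShardAssignment = Vec<Vec<AccountId>>;

// Mirrors the "limit of 5" reassignments from the description (assumed).
const MAX_REASSIGNMENTS: usize = 5;

fn check_properties(prev: &ShardAssignment, new: &ShardAssignment) -> bool {
    // Property 1: chunk producer counts per shard are balanced (D <= 1).
    let max = new.iter().map(|s| s.len()).max().unwrap_or(0);
    let min = new.iter().map(|s| s.len()).min().unwrap_or(0);
    let balanced = max - min <= 1;

    // Property 2: at most MAX_REASSIGNMENTS existing validators moved
    // between shards (new validators state sync anyway, so not counted).
    let all_prev: HashSet<&AccountId> = prev.iter().flatten().collect();
    let reassignments: usize = new
        .iter()
        .enumerate()
        .map(|(i, shard)| {
            shard
                .iter()
                .filter(|a| {
                    all_prev.contains(a)
                        && !prev.get(i).map_or(false, |p| p.contains(*a))
                })
                .count()
        })
        .sum();
    balanced && reassignments <= MAX_REASSIGNMENTS
}
```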
Opening this to track the work based on this thread.
Note that we are disabling the shard-assignment shuffling in #11190. However, even after this change, there will still be chunk-producer assignment changes at epoch boundaries. The relevant analysis is being done in #11204. The goal of this task is to limit shard-assignment changes further by providing a deterministic algorithm that keeps the assignment stable across epochs.