New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Consistent sharding with bounded loads #16564
base: master
Are you sure you want to change the base?
feat: Consistent sharding with bounded loads #16564
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #16564 +/- ##
==========================================
- Coverage 49.73% 49.27% -0.46%
==========================================
Files 274 274
Lines 48948 48211 -737
==========================================
- Hits 24343 23755 -588
+ Misses 22230 22107 -123
+ Partials 2375 2349 -26 ☔ View full report in Codecov by Sentry. |
629cf2a
to
a2714e0
Compare
Enhancement proposal is here #16570 and it has been exposed briefly during several calls, but, we will discuss more deeply |
a2714e0
to
5350c9c
Compare
9d86b5c
to
08e7298
Compare
5cf46ad
to
dfc7af1
Compare
e2d5bf7
to
d143e9f
Compare
9b2ad1f
to
20581cf
Compare
20581cf
to
ca5fadf
Compare
ca5fadf
to
d20f4f3
Compare
@akram Suggested some changes above. I tested with the various changes with 50 clusters 10k apps randomly distributed across the clusters. Results were consistent with the blog post: Cluster/Shard reassignments when switching from 10 to 9 replicas: Max/Min Avg CPU usage across the shards when syncing: Let me know if you have any questions on the changes. |
d20f4f3
to
104d07d
Compare
10934e9
to
e747864
Compare
@ishitasequeira can you PTAL ? |
766419e
to
04d265f
Compare
1cfef01
to
2258479
Compare
@akram Tested with your latest commits and everything is consistent with the previous testing/blog post: Cluster/Shard reassignments when switching from 10 to 9 replicas: Max/Min Avg CPU usage across the shards when syncing: Let me know if you have any questions on the testing. |
e8a5098
to
8e61149
Compare
Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
- The assignment or running of the algorithm has to be consistent across all the clusters. Changed the function to return a map where the consistent hash will be used to build the map - Modifications to the createConsistentHashsingWithBoundLoads function. This will create the map for cluster to shard. Note that the list must be consistent across all shards so that is why the cluster list must be sorted before going through the consistent hash algorithm Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
8e61149
to
ddc0a8e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall PR looks good to me!! Left a few nits and a question.
if avgLoadPerNode == 0 { | ||
avgLoadPerNode = 1 | ||
} | ||
avgLoadPerNode = math.Ceil(avgLoadPerNode * 1.25) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 1.25
always going to be constant or can this be configurable? Never the less, should we move this value to a constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I would move this to a constant and not make it configurable. 1.25 is the load factor described by the original paper that the algorithm is based on and was found to be the ideal balance between keeping the shards uniform while also keeping consistency when changing shard numbers.
if float64(bserver.Load)+1 <= avgLoadPerNode { | ||
return true | ||
} | ||
|
||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
if float64(bserver.Load)+1 <= avgLoadPerNode { | |
return true | |
} | |
return false | |
return float64(bserver.Load)+1 <= avgLoadPerNode |
// ConsistentHashingWithBoundedLoadsAlgorithm uses an algorithm that tries to use an equal distribution accross | ||
// all shards but is optimised to handled sharding and/or cluster addings or removal. In case of sharding or | ||
// cluster changes, this algorithm minimise the changes between shard and clusters assignments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
// ConsistentHashingWithBoundedLoadsAlgorithm uses an algorithm that tries to use an equal distribution accross | |
// all shards but is optimised to handled sharding and/or cluster addings or removal. In case of sharding or | |
// cluster changes, this algorithm minimise the changes between shard and clusters assignments. | |
// ConsistentHashingWithBoundedLoadsAlgorithm uses an algorithm that tries to use an equal distribution across | |
// all shards but is optimised to handle sharding and/or cluster addition or removal. In case of sharding or | |
// cluster changes, this algorithm minimises the changes between shard and clusters assignments. |
Checklist: