feat: Consistent sharding with bounded loads #16564

Open

akram wants to merge 2 commits into master from consistent-sharding-with-bounded-loads-akram

Conversation

@akram (Contributor) commented Dec 7, 2023

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issue number(s) (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

@akram requested review from a team as code owners on December 7, 2023 00:09
@akram changed the title from "Consistent sharding with bounded loads" to "feat: Consistent sharding with bounded loads" on Dec 7, 2023

codecov bot commented Dec 7, 2023

Codecov Report

Attention: Patch coverage is 75.60976%, with 10 lines in your changes missing coverage. Please review.

Project coverage is 49.27%. Comparing base (f87897c) to head (9b2ad1f).
Report is 113 commits behind head on master.

❗ Current head 9b2ad1f differs from the pull request's most recent head ddc0a8e. Consider uploading reports for the commit ddc0a8e to get more accurate results.

Files                              Patch %   Lines
controller/sharding/sharding.go    75.60%    8 Missing and 2 Partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #16564      +/-   ##
==========================================
- Coverage   49.73%   49.27%   -0.46%     
==========================================
  Files         274      274              
  Lines       48948    48211     -737     
==========================================
- Hits        24343    23755     -588     
+ Misses      22230    22107     -123     
+ Partials     2375     2349      -26     


@akram force-pushed the consistent-sharding-with-bounded-loads-akram branch from 629cf2a to a2714e0 on December 7, 2023 00:59
@akram (Contributor, Author) commented Dec 7, 2023

The enhancement proposal is here: #16570. It has been presented briefly during several calls, but we will discuss it in more depth.

@akram marked this pull request as draft on December 7, 2023 07:50
@akram force-pushed the consistent-sharding-with-bounded-loads-akram branch from a2714e0 to 5350c9c on December 11, 2023 13:46
@akram marked this pull request as ready for review on December 11, 2023 13:48
@akram force-pushed the consistent-sharding-with-bounded-loads-akram branch 8 times, most recently from 9d86b5c to 08e7298 on January 17, 2024 14:17
@akram force-pushed the consistent-sharding-with-bounded-loads-akram branch 4 times, most recently from 5cf46ad to dfc7af1 on January 25, 2024 15:50
@akram force-pushed the consistent-sharding-with-bounded-loads-akram branch 2 times, most recently from e2d5bf7 to d143e9f on January 29, 2024 16:00
@akram force-pushed the consistent-sharding-with-bounded-loads-akram branch 2 times, most recently from 9b2ad1f to 20581cf on February 7, 2024 04:41
@akram force-pushed the consistent-sharding-with-bounded-loads-akram branch from 20581cf to ca5fadf on February 26, 2024 06:18
@akram force-pushed the consistent-sharding-with-bounded-loads-akram branch from ca5fadf to d20f4f3 on March 4, 2024 16:24
@Enclavet (Contributor) commented Mar 11, 2024

@akram Suggested some changes above. I tested the changes with 50 clusters and 10k apps randomly distributed across the clusters. Results were consistent with the blog post:

Cluster/Shard reassignments when switching from 10 to 9 replicas:
11 changes - consistent hash
37 changes - round-robin

Max/Min Avg CPU usage across the shards when syncing:
0.77 to 0.44 - consistent hash
1.11 to 0.03 - round-robin (this is a big discrepancy that can occur when one shard is assigned a cluster with very few apps while another shard gets a cluster with lots of apps)

Let me know if you have any questions on the changes.
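For context, here is a minimal sketch of how such a reassignment count can be measured; countReassignments and its assignFn parameter are hypothetical names for illustration, not code from this PR:

// countReassignments runs a distribution function (e.g. consistent hash or
// round-robin) at two replica counts and counts how many clusters land on a
// different shard afterwards. assignFn is a hypothetical stand-in for the
// algorithm under test: it maps a cluster list and a replica count to
// cluster->shard assignments.
func countReassignments(clusters []string, assignFn func([]string, int) map[string]int, before, after int) int {
	prev := assignFn(clusters, before)
	next := assignFn(clusters, after)
	moved := 0
	for _, c := range clusters {
		if prev[c] != next[c] {
			moved++
		}
	}
	return moved
}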

@akram force-pushed the consistent-sharding-with-bounded-loads-akram branch from d20f4f3 to 104d07d on March 26, 2024 05:20
@akram force-pushed the consistent-sharding-with-bounded-loads-akram branch 3 times, most recently from 10934e9 to e747864 on April 15, 2024 19:09
@akram (Contributor, Author) commented Apr 15, 2024

@ishitasequeira can you PTAL?

@akram force-pushed the consistent-sharding-with-bounded-loads-akram branch 5 times, most recently from 766419e to 04d265f on April 22, 2024 12:12
@akram force-pushed the consistent-sharding-with-bounded-loads-akram branch 2 times, most recently from 1cfef01 to 2258479 on April 25, 2024 15:09
@Enclavet (Contributor) commented:
@akram Tested with your latest commits and everything is consistent with the previous testing/blog post:

Cluster/Shard reassignments when switching from 10 to 9 replicas:
10 changes - consistent hash
37 changes - round-robin

Max/Min Avg CPU usage across the shards when syncing:
0.73 to 0.34 - consistent hash
1.01 to 0.30 - round-robin

Let me know if you have any questions on the testing.

@akram force-pushed the consistent-sharding-with-bounded-loads-akram branch 3 times, most recently from e8a5098 to 8e61149 on April 29, 2024 12:47
Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
- The assignment produced by running the algorithm has to be consistent across all the clusters. Changed the function to return a map built using the consistent hash.

- Modifications to the createConsistentHashsingWithBoundLoads function. This creates the cluster-to-shard map. Note that the list must be consistent across all shards, which is why the cluster list must be sorted before going through the consistent hash algorithm (see the sketch below).

Signed-off-by: Akram Ben Aissi <akram.benaissi@gmail.com>
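The sketch referenced in the commit message above, showing why the cluster list is sorted first; buildClusterShardMap and its assign parameter are illustrative names, not the PR's actual functions. The bounded-loads step is stateful (each placement consumes capacity), so every replica must walk the clusters in the same order to derive the identical map:

import "sort"

// buildClusterShardMap produces a cluster->shard map deterministically.
// assign is assumed to be stateful (bounded loads consume capacity as
// clusters are placed), so iteration order affects the result; sorting
// guarantees every replica computes the same assignments.
func buildClusterShardMap(clusters []string, assign func(string) int) map[string]int {
	sorted := append([]string(nil), clusters...)
	sort.Strings(sorted)
	shardByCluster := make(map[string]int, len(sorted))
	for _, c := range sorted {
		shardByCluster[c] = assign(c)
	}
	return shardByCluster
}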
@akram force-pushed the consistent-sharding-with-bounded-loads-akram branch from 8e61149 to ddc0a8e on April 30, 2024 09:39
@ishitasequeira (Member) left a comment
Overall PR looks good to me!! Left a few nits and a question.

if avgLoadPerNode == 0 {
	avgLoadPerNode = 1
}
avgLoadPerNode = math.Ceil(avgLoadPerNode * 1.25)
Member:
Is 1.25 always going to be constant, or can this be configurable? Nevertheless, should we move this value to a constant?

Reply:
For now I would move this to a constant and not make it configurable. 1.25 is the load factor described in the original paper that the algorithm is based on, and it was found to be the ideal balance between keeping the shards uniform and keeping assignments consistent when changing shard counts. (A small illustration follows.)
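To illustrate the bound, here is a hypothetical helper mirroring the snippet quoted above: with load factor 1.25, no shard may take more than the ceiling of 1.25 times the average load.

import "math"

// maxLoadPerShard mirrors the avgLoadPerNode snippet above: the cap is
// ceil(1.25 * average load), with a zero average floored to 1 first.
// Example: 50 clusters across 10 shards gives ceil(5 * 1.25) = 7.
func maxLoadPerShard(totalClusters, replicas int) float64 {
	avg := float64(totalClusters) / float64(replicas)
	if avg == 0 {
		avg = 1
	}
	return math.Ceil(avg * 1.25)
}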

Comment on lines +259 to +263
if float64(bserver.Load)+1 <= avgLoadPerNode {
	return true
}

return false
Member:
nit:

Suggested change:
- if float64(bserver.Load)+1 <= avgLoadPerNode {
-     return true
- }
- return false
+ return float64(bserver.Load)+1 <= avgLoadPerNode

Comment on lines +120 to +122
// ConsistentHashingWithBoundedLoadsAlgorithm uses an algorithm that tries to use an equal distribution accross
// all shards but is optimised to handled sharding and/or cluster addings or removal. In case of sharding or
// cluster changes, this algorithm minimise the changes between shard and clusters assignments.
Member:
nit:

Suggested change:
- // ConsistentHashingWithBoundedLoadsAlgorithm uses an algorithm that tries to use an equal distribution accross
- // all shards but is optimised to handled sharding and/or cluster addings or removal. In case of sharding or
- // cluster changes, this algorithm minimise the changes between shard and clusters assignments.
+ // ConsistentHashingWithBoundedLoadsAlgorithm uses an algorithm that tries to use an equal distribution across
+ // all shards but is optimised to handle sharding and/or cluster addition or removal. In case of sharding or
+ // cluster changes, this algorithm minimises the changes between shard and clusters assignments.
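For readers unfamiliar with the technique, below is a self-contained sketch of consistent hashing with bounded loads. Everything in it is an assumption chosen for illustration (FNV-1a hashing, 100 virtual nodes per shard, load factor 1.25); the PR's actual implementation in controller/sharding/sharding.go may differ in detail.

package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"sort"
)

type ring struct {
	hashes []uint32       // sorted virtual-node positions on the ring
	owner  map[uint32]int // virtual-node position -> shard number
	load   map[int]int    // shard number -> clusters assigned so far
	shards int
}

func hashOf(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func newRing(shards int) *ring {
	r := &ring{owner: map[uint32]int{}, load: map[int]int{}, shards: shards}
	for s := 0; s < shards; s++ {
		for v := 0; v < 100; v++ { // virtual nodes smooth the distribution
			pos := hashOf(fmt.Sprintf("shard-%d-vnode-%d", s, v))
			r.hashes = append(r.hashes, pos)
			r.owner[pos] = s
		}
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// assign walks clockwise from the cluster's hash and returns the first shard
// whose load stays within ceil(avg * 1.25) -- the bounded-loads condition.
// Removing a shard only moves the clusters it owned (plus a few displaced by
// the bound), which is why reassignments stay low compared to round-robin.
func (r *ring) assign(cluster string, totalClusters int) int {
	limit := math.Ceil(float64(totalClusters) / float64(r.shards) * 1.25)
	start := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= hashOf(cluster) })
	for n := 0; n < len(r.hashes); n++ {
		shard := r.owner[r.hashes[(start+n)%len(r.hashes)]]
		if float64(r.load[shard])+1 <= limit {
			r.load[shard]++
			return shard
		}
	}
	return -1 // unreachable while shards > 0
}

func main() {
	r := newRing(10)
	fmt.Println("cluster-42 ->", r.assign("cluster-42", 50))
}

The load check in assign mirrors the float64(bserver.Load)+1 <= avgLoadPerNode comparison quoted in the review thread above: a cluster overflows to the next shard on the ring only when its nearest shard is already at capacity, which is what keeps both the distribution even and the reassignments small when the replica count changes.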
