
Coalesce token ranges sent in repair jobs #3789

Open
Tracked by #3791
Michal-Leszczynski opened this issue Apr 15, 2024 · 0 comments · May be fixed by #3866
Labels
enhancement New feature or request repair


@Michal-Leszczynski
Collaborator

As mentioned in https://github.com/scylladb/scylla-enterprise/issues/4006, it might be beneficial to always send more token ranges per job than max_repair_ranges_in_parallel (e.g. the greater of 10 * max_repair_ranges_in_parallel and 5%-10% of all ranges owned by the repaired replica set). This allows for better concurrency: the whole job won't be blocked from returning while only a single token range is still being repaired. Note that this should be safe to do with non-max intensity as well. It would require SM to use the ranges_parallelism repair param, which limits the parallelism used even when SM sends fewer ranges than max_repair_ranges_in_parallel.
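
A minimal Go sketch of the proposed batching, for illustration only. All names here (TokenRange, rangesPerJob, batchRanges, rangesParallelism) are hypothetical and are not SM's actual API. It computes the job size as the greater of multiplier * max_repair_ranges_in_parallel and a fraction of the ranges owned by the replica set, splits the ranges into jobs of that size, and picks a ranges_parallelism value capped at max_repair_ranges_in_parallel. It assumes that a desired parallelism of 0 means "max intensity".

```go
package main

import (
	"fmt"
	"math"
)

// TokenRange is a hypothetical token range (StartToken, EndToken].
type TokenRange struct {
	StartToken, EndToken int64
}

// rangesPerJob returns how many token ranges to put into a single repair job:
// the greater of multiplier*maxRangesInParallel and fraction*ownedRanges,
// clamped to at most ownedRanges and at least 1.
func rangesPerJob(maxRangesInParallel, ownedRanges, multiplier int, fraction float64) int {
	n := multiplier * maxRangesInParallel
	if share := int(math.Ceil(fraction * float64(ownedRanges))); share > n {
		n = share
	}
	if n > ownedRanges {
		n = ownedRanges
	}
	if n < 1 {
		n = 1
	}
	return n
}

// batchRanges splits ranges into consecutive jobs of at most size ranges each.
func batchRanges(ranges []TokenRange, size int) [][]TokenRange {
	if size < 1 {
		size = 1 // guard against an infinite loop
	}
	var jobs [][]TokenRange
	for len(ranges) > 0 {
		n := size
		if n > len(ranges) {
			n = len(ranges)
		}
		jobs = append(jobs, ranges[:n:n])
		ranges = ranges[n:]
	}
	return jobs
}

// rangesParallelism picks the value to pass as the ranges_parallelism repair
// param, so that parallelism stays bounded regardless of job size.
// Assumption: desired <= 0 means "max intensity".
func rangesParallelism(desired, maxRangesInParallel int) int {
	if desired <= 0 || desired > maxRangesInParallel {
		return maxRangesInParallel
	}
	return desired
}

func main() {
	// Example values: 256 owned ranges, max_repair_ranges_in_parallel = 4.
	ranges := make([]TokenRange, 256)
	for i := range ranges {
		ranges[i] = TokenRange{StartToken: int64(i) * 100, EndToken: int64(i+1) * 100}
	}
	size := rangesPerJob(4, len(ranges), 10, 0.05)
	jobs := batchRanges(ranges, size)
	fmt.Println(size, len(jobs), rangesParallelism(1, 4)) // prints "40 7 1"
}
```

In this sketch job size and parallelism are independent knobs: the job size only controls how many ranges a node can pick from before the job returns, while ranges_parallelism bounds how many of them it repairs at once.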

This behavior could be controlled by an additional flag or a repair config option in scylla-manager.yaml.
A side benefit would be less clutter in SM logs.
In terms of testing, it would be good to see a performance improvement on a cluster such as: 2 DCs with 5 nodes each, a keyspace with RF 3 in each DC, and a setup in which the repair actually has to do some work (missing rows on some nodes).

@Michal-Leszczynski Michal-Leszczynski added enhancement New feature or request repair labels Apr 15, 2024
@Michal-Leszczynski Michal-Leszczynski changed the title Coalesce token ranges sent repair jobs Coalesce token ranges sent in repair jobs Apr 15, 2024
@Michal-Leszczynski Michal-Leszczynski self-assigned this May 24, 2024
Michal-Leszczynski added a commit that referenced this issue May 24, 2024
This should improve both vnode and tablet table repair performance, as described in issues below.
Fixes #3789
Fixes #3792
@Michal-Leszczynski Michal-Leszczynski linked a pull request May 24, 2024 that will close this issue