Coalesce token ranges sent in repair jobs #3789

Michal-Leszczynski · 2024-04-15T08:36:27Z

As mentioned in https://github.com/scylladb/scylla-enterprise/issues/4006, it might be beneficial to always send more token ranges per job than max_repair_ranges_in_parallel (e.g. the greater of 10 * max_repair_ranges_in_parallel and 5%-10% of all ranges owned by the repaired replica set). This allows for better concurrency: the whole job is no longer blocked from returning while only a single token range is still being repaired. Note that this should be safe to do with non-max intensity as well. It would require SM to use the ranges_parallelism repair param, which limits the parallelism actually used even when SM sends fewer ranges than max_repair_ranges_in_parallel.
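A minimal Go sketch of the proposed batching, assuming the token ranges have already been grouped per replica set and using a 5% share. The types `TokenRange` and `Job` and the functions `batchSize` and `coalesce` are hypothetical illustrations, not scylla-manager code; only `max_repair_ranges_in_parallel` and `ranges_parallelism` come from the issue. Treating a non-positive intensity as max intensity is also an assumption of this sketch.

```go
package main

import "fmt"

// TokenRange is a hypothetical stand-in for one token range owned by the
// repaired replica set (not an actual scylla-manager type).
type TokenRange struct {
	Start, End int64
}

// Job is a hypothetical repair job: the ranges sent in a single repair
// request together with the ranges_parallelism value sent alongside them.
type Job struct {
	Ranges            []TokenRange
	RangesParallelism int
}

// batchSize returns the greater of 10 * maxRangesInParallel and
// percent% of all ranges owned by the replica set.
func batchSize(totalRanges, maxRangesInParallel, percent int) int {
	size := 10 * maxRangesInParallel
	if share := totalRanges * percent / 100; share > size {
		size = share
	}
	if size < 1 {
		size = 1 // guard against empty batches and an endless loop below
	}
	return size
}

// coalesce splits ranges into jobs of batchSize ranges each. The requested
// intensity (capped at maxRangesInParallel; non-positive means max in this
// sketch) is carried as ranges_parallelism, so sending more ranges per job
// does not raise the actual repair concurrency.
func coalesce(ranges []TokenRange, maxRangesInParallel, intensity, percent int) []Job {
	if intensity <= 0 || intensity > maxRangesInParallel {
		intensity = maxRangesInParallel
	}
	size := batchSize(len(ranges), maxRangesInParallel, percent)
	var jobs []Job
	for start := 0; start < len(ranges); start += size {
		end := start + size
		if end > len(ranges) {
			end = len(ranges)
		}
		jobs = append(jobs, Job{Ranges: ranges[start:end], RangesParallelism: intensity})
	}
	return jobs
}

func main() {
	// 1000 ranges, max_repair_ranges_in_parallel = 4, intensity = 2, 5% share.
	jobs := coalesce(make([]TokenRange, 1000), 4, 2, 5)
	fmt.Println(len(jobs), len(jobs[0].Ranges), jobs[0].RangesParallelism)
}
```

Here the 5% share (50 ranges) beats 10 * 4 = 40, so the program prints `20 50 2`: 20 jobs of 50 ranges each, with at most 2 ranges repaired concurrently per job.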

This behavior could be controlled by an additional flag or a repair config option in scylla-manager.yaml.
A side benefit would be less clutter in SM logs.
In terms of testing, it would be good to see the performance improvement on a cluster such as: 2 DCs with 5 nodes each, a keyspace with RF 3 in each DC, and a setup in which the repair actually has to do some work (missing rows on some nodes).


Michal-Leszczynski added the enhancement (New feature or request) and repair labels Apr 15, 2024

Michal-Leszczynski mentioned this issue Apr 15, 2024

Experiment with speeding up the repair #3791 (Open)

Michal-Leszczynski changed the title from "Coalesce token ranges sent repair jobs" to "Coalesce token ranges sent in repair jobs" Apr 15, 2024

Michal-Leszczynski mentioned this issue Apr 22, 2024

Add separate parallel/intensity control for tablets #3792 (Open)

Michal-Leszczynski self-assigned this May 24, 2024

Michal-Leszczynski added a commit that referenced this issue May 24, 2024

feat(repair): use ranges_parallelism

9961d19

This should improve both vnode and tablet table repair performance, as described in issues below. Fixes #3789 Fixes #3792

Michal-Leszczynski linked a pull request May 24, 2024 that will close this issue

Repair: use ranges parallelism #3866 (Open)
