storage controller: fix handling of tenants with no timelines during scheduling optimization (#7673)

## Problem

Storage controller was using a zero layer count in SecondaryProgress as
a proxy for "not initialized". However, in tenants with zero timelines
(a legitimate state), the layer count remains zero forever.

This caused #7583 to
destabilize the storage controller scale test, which creates lots of
tenants, some of which don't get any timelines.

## Summary of changes

- Use a None mtime instead of zero layer count to determine if a
SecondaryProgress should be ignored.
- Adjust the test to use a shorter heatmap upload period to let it
proceed faster while waiting for scheduling optimizations to complete.
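The difference between the two "not initialized" checks can be sketched as follows. This is a minimal illustration, not the controller's actual code: the struct is a simplified, hypothetical stand-in that reuses the field names visible in the diff (`heatmap_mtime`, `bytes_downloaded`, `bytes_total`), the type of `heatmap_mtime` is assumed to be `Option<SystemTime>`, and both helper functions are invented for this example.

```rust
use std::time::SystemTime;

/// Hypothetical, simplified stand-in for the progress record a secondary
/// location reports. Field names follow the diff; the types are assumptions.
#[derive(Debug, Default, Clone)]
pub struct SecondaryProgress {
    /// `None` until the secondary has seen a heatmap at least once (assumed).
    pub heatmap_mtime: Option<SystemTime>,
    pub bytes_downloaded: u64,
    pub bytes_total: u64,
}

/// Old-style check (hypothetical helper): treats a zero total as "not
/// initialized". A tenant with no timelines has nothing to download, so it
/// would look uninitialized forever.
pub fn is_initialized_by_size(p: &SecondaryProgress) -> bool {
    p.bytes_total != 0
}

/// New-style check (hypothetical helper): a progress record counts as
/// initialized once a heatmap mtime is present, regardless of its size.
pub fn is_initialized_by_mtime(p: &SecondaryProgress) -> bool {
    p.heatmap_mtime.is_some()
}
```

With this sketch, a record such as `SecondaryProgress { heatmap_mtime: Some(SystemTime::UNIX_EPOCH), bytes_downloaded: 0, bytes_total: 0 }` is rejected by the size-based check but accepted by the mtime-based one, which is the situation the commit message describes for tenants without timelines.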
jcsp committed May 9, 2024
1 parent 39c712f commit 107f535
Showing 2 changed files with 4 additions and 1 deletion.
2 changes: 1 addition & 1 deletion storage_controller/src/service.rs
@@ -4745,7 +4745,7 @@ impl Service {
 // them in an optimization
 const DOWNLOAD_FRESHNESS_THRESHOLD: u64 = 10 * 1024 * 1024 * 1024;
 
-if progress.bytes_total == 0
+if progress.heatmap_mtime.is_none()
 || progress.bytes_total < DOWNLOAD_FRESHNESS_THRESHOLD
 && progress.bytes_downloaded != progress.bytes_total
 || progress.bytes_total - progress.bytes_downloaded
3 changes: 3 additions & 0 deletions test_runner/performance/test_storage_controller_scale.py
@@ -102,6 +102,9 @@ def check_memory():
 tenant_id,
 shard_count,
 stripe_size,
+# Upload heatmaps fast, so that secondary downloads happen promptly, enabling
+# the controller's optimization migrations to proceed promptly.
+tenant_config={"heatmap_period": "10s"},
 placement_policy={"Attached": 1},
 )
 futs.append(f)

1 comment on commit 107f535

@github-actions

3105 tests run: 2959 passed, 0 failed, 146 skipped (full report)


Flaky tests (1)

Postgres 15

  • test_gc_aggressive: debug

Code coverage* (full report)

  • functions: 31.4% (6314 of 20126 functions)
  • lines: 47.3% (47593 of 100686 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
107f535 at 2024-05-09T12:57:00.120Z :recycle:
