scrubber: add scan-metadata and hook into integration tests #5176
Commits on Sep 1, 2023
- scrubber: add scan-metadata command

  This provides the same analysis done at the end of `tidy`, but as a standalone command. It uses Stream-based listing helpers with parallel execution to produce results faster when one is interested in the contents of a bucket but does not want to check the control plane state to learn which items in the bucket correspond to active (non-deleted) tenants/timelines.
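The parallelism idea can be sketched as follows. This is a minimal stand-in, not the scrubber's actual code: the real command uses async Stream-based listing over S3, while here plain threads and a hypothetical `list_prefix` callback play those roles.

```rust
use std::thread;

// Hypothetical sketch: list several bucket prefixes in parallel and merge
// the per-prefix object counts. `list_prefix` stands in for an S3 list call;
// the real scrubber uses async Streams instead of threads.
fn scan_prefixes(prefixes: Vec<&'static str>, list_prefix: fn(&str) -> Vec<String>) -> usize {
    let handles: Vec<_> = prefixes
        .into_iter()
        .map(|p| thread::spawn(move || list_prefix(p).len()))
        .collect();
    // Each prefix is listed concurrently; counts are summed as threads finish.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

// Stand-in for listing one prefix of the bucket.
fn fake_list(prefix: &str) -> Vec<String> {
    (0..3).map(|i| format!("{prefix}/object-{i}")).collect()
}
```

Because each prefix is scanned independently, no control-plane lookup is needed to partition the work.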
- Update s3_scrubber/src/scan_metadata.rs

  Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Commits on Sep 4, 2023
- Wait for custom extensions build before deploy (#5170)

  ## Problem

  Currently, the `deploy` job doesn't wait for the custom extensions build job (in another repo) and can start even if the extensions build failed. This PR adds another job that polls the status of the extensions build job and fails if that build fails.

  ## Summary of changes

  - Add a `wait-for-extensions-build` job, which waits for the custom extensions build in another repo.
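The polling logic can be sketched like this. Everything here is illustrative: `BuildStatus` and the `poll` callback stand in for whatever API reports the other repo's CI status.

```rust
use std::time::Duration;

// Hypothetical build states reported by the other repo's CI.
enum BuildStatus {
    Pending,
    Success,
    Failure,
}

// Sketch of the wait-for-extensions-build idea: poll the other job's status
// until it finishes, and fail this job if the build failed or the poll
// budget runs out. `poll` stands in for a CI status API call.
fn wait_for_build(
    mut poll: impl FnMut() -> BuildStatus,
    max_attempts: u32,
    _interval: Duration, // a real implementation would sleep between polls
) -> Result<(), String> {
    for _ in 0..max_attempts {
        match poll() {
            BuildStatus::Success => return Ok(()),
            BuildStatus::Failure => return Err("extensions build failed".into()),
            BuildStatus::Pending => {} // keep waiting
        }
    }
    Err("timed out waiting for extensions build".into())
}
```

Failing with an `Err` is what lets the `deploy` job, which depends on this one, be skipped when the extensions build breaks.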
- remote_timeline_client: tests: run upload ops on the tokio::test runtime (#5177)

  The `remote_timeline_client` tests use `#[tokio::test]` and rely on the fact that the test runtime set up by this macro is single-threaded. In PR #5164, we observed interesting flakiness with the `upload_scheduling` test case: it would observe the upload of the third layer (`layer_file_name_3`) before we did `wait_completion`. Under the single-threaded-runtime assumption, that wouldn't be possible, because the test code doesn't await in between scheduling the upload and calling `wait_completion`.

  However, `RemoteTimelineClient` was actually using `BACKGROUND_RUNTIME`. That means there was parallelism where the tests didn't expect it, leading to flakiness such as execution of an `UploadOp` task before the test calls `wait_completion`. The most confusing scenario is code like this:

  ```
  schedule_upload(A);
  wait_completion.await; // B
  schedule_upload(C);
  wait_completion.await; // D
  ```

  On a single-threaded executor, it is guaranteed that the upload of C doesn't run before D, because we (the test) don't relinquish control to the executor before D's `await` point. However, `RemoteTimelineClient` actually scheduled onto `BACKGROUND_RUNTIME`, so A could start running before B and C could start running before D. This would cause flaky tests when making assertions about the state manipulated by the operations. The concrete issue that led to the discovery of this bug was an assertion about `remote_fs_dir` state in #5164.
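The property the tests relied on can be demonstrated without tokio. The sketch below (illustrative names, not pageserver code) models a single-threaded executor as a task queue that only runs tasks when explicitly driven, which corresponds to reaching an `.await` point; on a multi-threaded runtime like `BACKGROUND_RUNTIME`, a scheduled task could instead start at any moment.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

// Sketch of the single-threaded-executor assumption: scheduled tasks sit in
// a queue and cannot run until control is handed back to the "executor".
struct SingleThreadedQueue {
    tasks: RefCell<VecDeque<Box<dyn FnOnce()>>>,
}

impl SingleThreadedQueue {
    fn new() -> Self {
        Self { tasks: RefCell::new(VecDeque::new()) }
    }

    // Like schedule_upload: enqueue work, but do not run it yet.
    fn schedule(&self, task: impl FnOnce() + 'static) {
        self.tasks.borrow_mut().push_back(Box::new(task));
    }

    // Like reaching an `.await` point (e.g. wait_completion): only now do
    // queued tasks execute, in FIFO order.
    fn run_until_idle(&self) {
        loop {
            let task = self.tasks.borrow_mut().pop_front();
            match task {
                Some(t) => t(),
                None => break,
            }
        }
    }
}
```

Between `schedule` and `run_until_idle`, the task provably has not run; that is exactly the guarantee the tests lost when uploads ran on a separate multi-threaded runtime.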
- rfc: Crash-Consistent Layer Map Updates By Leveraging index_part.json (#5086)

  This RFC describes a simple scheme to make layer map updates crash-consistent by leveraging the index_part.json in remote storage. Without such a mechanism, crashes can induce certain edge cases in which broadly held assumptions about system invariants don't hold.
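One way to read the scheme, sketched with illustrative names rather than the actual pageserver API: treat the layer set recorded in `index_part.json` as authoritative, so that on startup any local layer file the remote index does not know about can be identified as a possible crash leftover.

```rust
use std::collections::HashSet;

// Hypothetical sketch: given the locally present layer file names and the
// layer names recorded in index_part.json, return the local layers that the
// index does not know about (candidates left behind by a crash).
fn layers_unknown_to_index(local: &[&str], index_part: &[&str]) -> Vec<String> {
    let indexed: HashSet<&str> = index_part.iter().copied().collect();
    local
        .iter()
        .filter(|layer| !indexed.contains(**layer))
        .map(|layer| layer.to_string())
        .collect()
}
```

The point of anchoring on `index_part.json` is that it is updated with well-defined remote-storage semantics, so it survives a crash consistently while the local directory may not.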
- pageserver: run all Rust tests with remote storage enabled (#5164)

  For #5086 we will require remote storage to be configured in the pageserver. This PR enables `localfs`-based remote storage for all Rust unit tests. Changes:
  - In `TenantHarness`, set up localfs remote storage for the tenant.
  - `create_test_timeline` should mimic what real timeline creation does. Real timeline creation waits for the timeline to reach remote storage; with this PR, `create_test_timeline` now does that as well.
  - All the places that create the harness tenant twice need to shut down the tenant before re-creating it through a second call to `try_load` or `load`. Without shutting down, upload tasks initiated by or through the first incarnation of the harness tenant might still be ongoing when the second incarnation is `try_load`/`load`ed. That doesn't make sense in the tests that do this; they generally try to set up a scenario similar to a pageserver stop & start.
  - One test recreates a timeline rather than the tenant. For that case, a new `Timeline::shutdown` method was needed; it is a refactoring of the existing `Tenant::shutdown` method.
  - The remote_timeline_client tests previously set up their own `GenericRemoteStorage` and `RemoteTimelineClient`. Now they reuse the ones pre-created by the `TenantHarness`. Some adjustments to the assertions were needed, because they now need to account for the initial image layer created by `create_test_timeline` being present.
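Conceptually, `localfs` remote storage reduces to file copies into a directory that plays the role of the bucket. The sketch below is illustrative (not the real `GenericRemoteStorage` API) and shows why it is cheap enough to enable for every unit test.

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Hypothetical sketch of a localfs "upload": copy a local file under a
// directory acting as the remote bucket, creating parent directories for
// the key's prefix. Unit tests can then exercise upload/download paths
// without talking to S3.
fn localfs_upload(local: &Path, remote_root: &Path, key: &str) -> std::io::Result<PathBuf> {
    let dst = remote_root.join(key);
    if let Some(parent) = dst.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::copy(local, &dst)?;
    Ok(dst)
}
```

Because "remote" state is just a directory, a test harness can create it per test and inspect it directly in assertions.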
- ## Problem #5162 (comment)
- tests: get test name automatically for remote storage (#5184)

  ## Problem

  Tests using remote storage have manually entered `test_name` parameters, which:
  - are easy to accidentally duplicate when copying code to make a new test;
  - omit parameters, so they don't actually create unique S3 buckets when running many tests concurrently.

  ## Summary of changes

  - Use the `request` fixture in the `neon_env_builder` fixture to get the test name, then munge it into an S3-compatible bucket name.
  - Remove the explicit `test_name` parameters to `enable_remote_storage`.
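The munging step can be sketched as below. The rules are illustrative (not the exact fixture code), based on S3's documented constraints: bucket names are lowercase letters, digits, and hyphens, at most 63 characters, and must start and end with a letter or digit.

```rust
// Hypothetical sketch: turn a pytest test name (which may contain `_`, `[`,
// `]`, etc.) into an S3-compatible bucket name by lowercasing, replacing
// disallowed characters with hyphens, truncating to 63 characters, and
// trimming leading/trailing hyphens.
fn test_name_to_bucket(test_name: &str) -> String {
    let mut name: String = test_name
        .chars()
        .map(|c| {
            let c = c.to_ascii_lowercase();
            if c.is_ascii_alphanumeric() { c } else { '-' }
        })
        .collect();
    name.truncate(63); // S3 bucket names are limited to 63 characters
    name.trim_matches('-').to_string()
}
```

Deriving the name from the test itself is what guarantees uniqueness across concurrently running tests, which the hand-written `test_name` parameters failed to do.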