scrubber: add scan-metadata and hook into integration tests #5176

Merged
merged 25 commits into from Sep 6, 2023

Commits on Sep 1, 2023

  1. 8951c57
  2. scrubber: add scan-metadata command

    This provides the same analysis done at the end of
    `tidy`, but in a standalone command that uses Stream-based
    listing helpers with parallel execution to provide a
    faster result when one is interested in the contents
    of a bucket, but does not want to check the control plane
    state to learn which items in the bucket correspond
    to active/non-deleted tenants/timelines.
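
The idea of listing a bucket's contents in parallel, without consulting the control plane, can be sketched roughly as follows. This is an illustrative Python analogue using `asyncio`, not the scrubber's Rust code: the in-memory `bucket` dictionary, `list_objects`, and `scan_metadata` are hypothetical names invented for the example, and a semaphore stands in for whatever concurrency limit the real command applies.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def list_objects(bucket: Dict[str, List[str]], prefix: str) -> AsyncIterator[str]:
    """Hypothetical streaming listing: yields the keys under one prefix."""
    for key in bucket.get(prefix, []):
        await asyncio.sleep(0)  # stand-in for a network round trip
        yield key


async def scan_metadata(bucket: Dict[str, List[str]], concurrency: int = 4) -> Dict[str, int]:
    """Count objects per prefix, scanning at most `concurrency` prefixes at once."""
    sem = asyncio.Semaphore(concurrency)

    async def count(prefix: str):
        async with sem:
            n = 0
            async for _ in list_objects(bucket, prefix):
                n += 1
            return prefix, n

    results = await asyncio.gather(*(count(p) for p in bucket))
    return dict(results)


if __name__ == "__main__":
    demo = {"tenant-a/": ["index_part.json", "layer-1"], "tenant-b/": []}
    print(asyncio.run(scan_metadata(demo)))
```

The sketch only inspects what is in the bucket; deciding which prefixes belong to active tenants or timelines would need the control plane and is deliberately left out.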
    jcsp committed Sep 1, 2023
    87421aa
  3. f544cdc
  4. 0f0cd9c
  5. 5cdecb5
  6. f9836ad
  7. Update s3_scrubber/src/scan_metadata.rs

    Co-authored-by: Joonas Koivunen <joonas@neon.tech>
    jcsp and koivunej committed Sep 1, 2023
    bc68568

Commits on Sep 4, 2023

  1. Wait for custom extensions build before deploy (#5170)

    ## Problem
    
    Currently, the `deploy` job doesn't wait for the custom extension job
    (in another repo) and can be started even with failed extensions build.
    This PR adds another job that polls the status of the extension build job
    and fails if the extension build fails.
    
    ## Summary of changes
    - Add `wait-for-extensions-build` job, which waits for a custom
    extension build in another repo.
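
A polling job of this kind can be sketched as below. This is a minimal Python illustration, not the actual CI job definition: `wait_for_build`, the injected `get_status` callable, and the status strings `"success"` and `"failure"` are all assumptions made for the example.

```python
import time
from typing import Callable


def wait_for_build(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll a build status until it succeeds; raise if it fails or times out.

    `get_status` is a hypothetical callable that returns "success",
    "failure", or anything else while the build is still in progress.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "success":
            return
        if status == "failure":
            raise RuntimeError("extension build failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("extension build did not finish in time")
        sleep(interval_s)
```

Raising on failure is what makes the dependent `deploy` step stop, since a job that exits with an error blocks the jobs that depend on it.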
    bayandin authored and jcsp committed Sep 4, 2023
    f51d888
  2. remote_timeline_client: tests: run upload ops on the tokio::test runtime (#5177)
    
    The `remote_timeline_client` tests use `#[tokio::test]` and rely on the
    fact that the test runtime that is set up by this macro is
    single-threaded.
    
    In PR #5164, we observed
    interesting flakiness with the `upload_scheduling` test case:
    it would observe the upload of the third layer (`layer_file_name_3`)
    before we did `wait_completion`.
    
    Under the single-threaded-runtime assumption, that wouldn't be possible,
    because the test code doesn't await in between scheduling the upload
    and calling `wait_completion`.
    
    However, RemoteTimelineClient was actually using `BACKGROUND_RUNTIME`.
    That means there was parallelism where the tests didn't expect it,
    leading to flakiness such as execution of an UploadOp task before
    the test calls `wait_completion`.
    
    The most confusing scenario is code like this:
    
    ```
    schedule_upload(A);
    wait_completion.await; // B
    schedule_upload(C);
    wait_completion.await; // D
    ```
    
    On a single-threaded executor, it is guaranteed that the upload of C
    doesn't run before D, because we (the test) don't relinquish control
    to the executor before D's `await` point.
    
    However, RemoteTimelineClient actually scheduled onto the
    BACKGROUND_RUNTIME, so, `A` could start running before `B` and
    `C` could start running before `D`.
    
    This would cause flaky tests when making assertions about the state
    manipulated by the operations. The concrete issue that led to the discovery
    of this bug was an assertion about `remote_fs_dir` state in #5164.
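
The scheduling argument above can be reproduced in miniature. The sketch below is a Python `asyncio` analogue rather than tokio code: the event loop plays the role of the single-threaded test runtime, and a plain background thread stands in for `BACKGROUND_RUNTIME`. The function names are hypothetical, and `done.wait()` deliberately forces the unlucky interleaving that, in the real tests, happened only some of the time.

```python
import asyncio
import threading
from typing import List


def run_single_threaded() -> List[str]:
    """Task scheduled on the caller's own loop cannot run before the caller awaits."""
    events: List[str] = []

    async def upload() -> None:
        events.append("upload ran")

    async def main() -> None:
        task = asyncio.create_task(upload())  # scheduled, not yet running
        events.append("before wait")          # no await in between
        await task                            # first point where upload() can run

    asyncio.run(main())
    return events


def run_on_background_thread() -> List[str]:
    """Work handed to another thread may finish before the caller ever awaits."""
    events: List[str] = []
    done = threading.Event()

    def upload() -> None:
        events.append("upload ran")
        done.set()

    async def main() -> None:
        threading.Thread(target=upload).start()
        done.wait(timeout=5)          # synchronous wait: forces the racy ordering
        events.append("before wait")

    asyncio.run(main())
    return events


if __name__ == "__main__":
    print(run_single_threaded())       # ['before wait', 'upload ran']
    print(run_on_background_thread())  # ['upload ran', 'before wait']
```

An assertion placed where `"before wait"` is recorded holds in the first variant and can break in the second, which is the shape of the flakiness described here.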
    problame authored and jcsp committed Sep 4, 2023
    06053dd
  3. rfc: Crash-Consistent Layer Map Updates By Leveraging index_part.json (#5086)
    
    This RFC describes a simple scheme to make layer map updates crash
    consistent by leveraging the index_part.json in remote storage. Without
    such a mechanism, crashes can induce certain edge cases in which broadly
    held assumptions about system invariants don't hold.
    problame authored and jcsp committed Sep 4, 2023
    94ad504
  4. FileBlockReader<File> is never used (#5181)

    part of #4743
    
    preliminary to #5180
    problame authored and jcsp committed Sep 4, 2023
    6455f0d
  5. pageserver: run all Rust tests with remote storage enabled (#5164)

    For #5086 (comment),
    we will require remote storage to be configured in pageserver.
    
    This PR enables `localfs`-based storage for all Rust unit tests.
    
    Changes:
    
    - In `TenantHarness`, set up localfs remote storage for the tenant.
    - `create_test_timeline` should mimic what real timeline creation does,
    and real timeline creation waits for the timeline to reach remote
    storage. With this PR, `create_test_timeline` now does that as well.
    - All the places that create the harness tenant twice need to shut down
    the tenant before the re-create through a second call to `try_load` or
    `load`.
    - Without shutting down, upload tasks initiated by/through the first
    incarnation of the harness tenant might still be ongoing when the second
    incarnation of the harness tenant is `try_load`/`load`ed. That doesn't
    make sense in the tests that do that; they generally try to set up a
    scenario similar to pageserver stop & start.
    - There was one test that recreates a timeline, not the tenant. For that
    case, I needed to create a `Timeline::shutdown` method. It's a
    refactoring of the existing `Tenant::shutdown` method.
    - The remote_timeline_client tests previously set up their own
    `GenericRemoteStorage` and `RemoteTimelineClient`. Now they re-use the
    one that's pre-created by the TenantHarness. Some adjustments to the
    assertions were needed because the assertions now need to account for
    the initial image layer that's created by `create_test_timeline` to be
    present.
    problame authored and jcsp committed Sep 4, 2023
    9b91c07
  6. proxy: error typo (#5187)

    ## Problem
    
    #5162 (comment)
    conradludgate authored and jcsp committed Sep 4, 2023
    3112aa9
  7. tests: get test name automatically for remote storage (#5184)

    ## Problem
    
    Tests using remote storage have manually entered `test_name` parameters,
    which:
    - Are easy to accidentally duplicate when copying code to make a new
    test
    - Omit parameters, so don't actually create unique S3 buckets when
    running many tests concurrently.
    
    ## Summary of changes
    
    - Use the `request` fixture in neon_env_builder fixture to get the test
    name, then munge that into an S3 compatible bucket name.
    - Remove the explicit `test_name` parameters to enable_remote_storage
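
Turning a test name into an S3-compatible bucket name might look roughly like this. The helper `bucket_name_from_test_name` is a hypothetical illustration, not the function used in the test suite. It assumes the input is a pytest node name (for example from `request.node.name`, which includes the parameter suffix) and applies the general S3 naming rules: 3 to 63 characters, lowercase letters, digits and hyphens, starting and ending with a letter or digit.

```python
import re


def bucket_name_from_test_name(test_name: str, max_len: int = 63) -> str:
    """Hypothetical helper: derive an S3-compatible bucket name from a test name."""
    name = test_name.lower()
    # Replace every run of characters that S3 does not allow with one hyphen.
    name = re.sub(r"[^a-z0-9-]+", "-", name)
    # Collapse repeated hyphens left over from adjacent separators.
    name = re.sub(r"-{2,}", "-", name)
    # Enforce the length limit, then make sure both ends are alphanumeric.
    name = name[:max_len].strip("-")
    # Bucket names need at least 3 characters; pad very short names.
    return name.ljust(3, "0")


if __name__ == "__main__":
    print(bucket_name_from_test_name("test_remote_storage[real_s3-1]"))
    # test-remote-storage-real-s3-1
```

Because the parameter suffix is part of the input, differently parametrized runs of one test map to different names. Truncation at 63 characters can still cause collisions for very long names, which this sketch does not address.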
    jcsp committed Sep 4, 2023
    b7f3ca6
  8. 80c942f
  9. 826aafb
  10. da440bf
  11. 40a809f
  12. 25ceb87
  13. 177645c
  14. f8622e5
  15. clippy

    jcsp committed Sep 4, 2023
    21c489d
  16. 36c57b5

Commits on Sep 6, 2023

  1. typos

    jcsp committed Sep 6, 2023
    a6fef13
  2. 8ac049b