backup: implement store based backup to solve the long tail issue #53081

Merged: 23 commits, May 16, 2024

Conversation

3pointer
Contributor

@3pointer 3pointer commented May 7, 2024

What problem does this PR solve?

Issue Number: ref #52534

Problem Summary:

The current (range-based) backup mechanism has two problems:

  1. When a store shuts down, the retry on that store is delayed too long, which results in a long-tail issue.
  2. A backup request with too many ranges, or with one big range, is too slow and sometimes exacerbates the first issue.

What changed and how does it work?

image

This PR focuses on implementing store-based backup; refer to the storeBackup goroutine and the main goroutine.
This PR also removes some code specific to range-based backup, especially the code related to handleFineGrained.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

ti-chi-bot bot commented May 7, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 7, 2024
@3pointer
Contributor Author

3pointer commented May 8, 2024

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the ok-to-test label May 8, 2024

codecov bot commented May 8, 2024

Codecov Report

Attention: Patch coverage is 78.70813%, with 89 lines in your changes missing coverage. Please review.

Project coverage is 75.4312%. Comparing base (81da4f8) to head (5870354).
Report is 85 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #53081        +/-   ##
================================================
+ Coverage   72.0214%   75.4312%   +3.4098%     
================================================
  Files          1508       1518        +10     
  Lines        434823     451436     +16613     
================================================
+ Hits         313166     340524     +27358     
+ Misses       102084      89934     -12150     
- Partials      19573      20978      +1405     
Flag Coverage Δ
integration 51.2828% <73.2057%> (?)
unit 71.3103% <10.0478%> (+0.0827%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 53.9957% <ø> (ø)
parser ∅ <ø> (∅)
br 57.2166% <78.7081%> (+18.7394%) ⬆️

@3pointer
Contributor Author

3pointer commented May 8, 2024

/retest

@3pointer
Contributor Author

3pointer commented May 9, 2024

/ok-to-test

@3pointer
Contributor Author

3pointer commented May 9, 2024

/ok-to-test

@3pointer
Contributor Author

3pointer commented May 9, 2024

/ok-to-test

@3pointer 3pointer requested review from YuJuncen and Leavrth May 9, 2024 09:42
@3pointer
Contributor Author

/test pull-br-integration-test

tiprow bot commented May 10, 2024

@3pointer: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test fast_test_tiprow
  • /test tidb_parser_test

Use /test all to run all jobs.

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@3pointer
Contributor Author

/test pull-br-integration-test

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 13, 2024
br/tests/br_rawkv/run.sh (resolved)
br/pkg/utils/misc.go (resolved)
br/pkg/backup/client.go (resolved)
br/pkg/backup/client.go (outdated, resolved)
br/pkg/backup/client.go (resolved)
br/pkg/backup/client.go (resolved)
br/pkg/backup/client.go (outdated, resolved)
br/pkg/backup/client.go (resolved)
br/pkg/backup/client.go (resolved)
Contributor

@Leavrth Leavrth left a comment

rest LGTM

}()
cases := make([]reflect.SelectCase, 0)
for _, ch := range storeBackupChs {
	cases = append(cases, reflect.SelectCase{Dir: reflect.SelectRecv, Chan: reflect.ValueOf(ch)})
Contributor

maybe need a ctx.Done?

Contributor

@YuJuncen YuJuncen left a comment

rest lgtm.

bCli, err := client.Backup(ctx, &req)
failpoint.Inject("reset-retryable-error", func(val failpoint.Value) {
	switch val.(string) {
	case "Unavaiable":
Contributor

Should be "Unavailable"?

Contributor Author

It seems this failpoint never worked before.

ti-chi-bot bot commented May 15, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Leavrth, YuJuncen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot bot commented May 15, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-05-14 06:32:53.020598525 +0000 UTC m=+1548526.777734095: ☑️ agreed by Leavrth.
  • 2024-05-15 07:45:47.300081645 +0000 UTC m=+1639301.057217219: ☑️ agreed by YuJuncen.

@3pointer
Contributor Author

/test unit-test

ti-chi-bot bot commented May 16, 2024

@3pointer: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test build
  • /test check-dev
  • /test check-dev2
  • /test mysql-test
  • /test pull-br-integration-test
  • /test pull-integration-ddl-test
  • /test pull-lightning-integration-test
  • /test pull-mysql-client-test
  • /test unit-test

The following commands are available to trigger optional jobs:

  • /test canary-notify-when-compatibility-sections-changed
  • /test pingcap/tidb/canary_ghpr_unit_test
  • /test pull-common-test
  • /test pull-e2e-test
  • /test pull-integration-common-test
  • /test pull-integration-copr-test
  • /test pull-integration-jdbc-test
  • /test pull-integration-mysql-test
  • /test pull-integration-nodejs-test
  • /test pull-sqllogic-test
  • /test pull-tiflash-test

Use /test all to run the following jobs that were automatically triggered:

  • pingcap/tidb/ghpr_build
  • pingcap/tidb/ghpr_check
  • pingcap/tidb/ghpr_check2
  • pingcap/tidb/ghpr_mysql_test
  • pingcap/tidb/ghpr_unit_test
  • pingcap/tidb/pull_br_integration_test
  • pingcap/tidb/pull_integration_ddl_test
  • pingcap/tidb/pull_mysql_client_test

In response to this:

/test unit-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tiprow bot commented May 16, 2024

@3pointer: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test fast_test_tiprow
  • /test tidb_parser_test

Use /test all to run all jobs.

In response to this:

/test unit-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@3pointer
Contributor Author

/test unit-test

@3pointer
Contributor Author

/test unit-test

@ti-chi-bot ti-chi-bot bot merged commit 835ea04 into pingcap:master May 16, 2024
31 of 32 checks passed
Labels
approved lgtm ok-to-test release-note-none size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
3 participants