Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raw KV Backup is failing due to "Resolve lock timeout" #1470

Open
fschoell opened this issue May 26, 2022 · 0 comments
Open

Raw KV Backup is failing due to "Resolve lock timeout" #1470

fschoell opened this issue May 26, 2022 · 0 comments
Labels
type/bug Something isn't working

Comments

@fschoell
Copy link

  1. What did you do?

We are trying to do a full backup of our raw kv store into our ceph instance using this command:

br backup raw --cf default --pd "http://127.0.0.1:2379" --s3.provider "ceph" --s3.endpoint "http://ceph-internal:7480" -s "s3://tikv-backups/<date>" --send-credentials-to-tikv=true --s3.region "none" --ratelimit 3 --log-file /var/log/tikv-backup.log --start 00 --end ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

Our TikV database holds around 70 TB of data and a few rather large key value pairs (around 40 MB).

  1. What did you expect to see?

We expected the backup to run without errors.

  1. What did you see instead?

The backup fails with error="[tikv:9004]Resolve lock timeout"]

  1. What version of BR and TiDB/TiKV/PD are you using?
$ br -V
Release Version: v4.0.15
Git Commit Hash: 2ca9fa25f15aec061e281ad84f77dacf622ac9c3
Git Branch: heads/refs/tags/v4.0.15
Go Version: go1.13
UTC Build Time: 2021-09-23 09:48:49
Race Enabled: false
$ pd-server -V
Release Version: v4.0.15
Edition: Community
Git Commit Hash: 605dc238b9a9e54491a5e6b0843f30de33421402
Git Branch: heads/refs/tags/v4.0.15
UTC Build Time:  2021-09-23 09:48:07
$ tikv-server -V
TiKV
Release Version:   4.0.15
Edition:           Community
Git Commit Hash:   0a0482dee0a0cf135f56b9f90b412ea641ec3ae2
Git Commit Branch: heads/refs/tags/v4.0.15
UTC Build Time:    2021-09-13 03:10:53
Rust Version:      rustc 1.42.0-nightly (0de96d37f 2019-12-19)
Enable Features:   jemalloc mem-profiling portable sse protobuf-codec
Profile:           dist_release
  1. Operation logs

relevant br.log entries:

[2022/05/04 17:38:19.479 +00:00] [INFO] [client.go:896] ["try backup"] [startKey=74727864623B00FFFF3A14D6BEBABB6DFBC0128B656504E14837E614744D15CA7981586163F77B0939A37AA73064E1F9836986F096618B026ADCA0F1C7EB0CA9C2A075A4418DA9] [endKey=74727864623B01F16A2706A2140D46779368D753B72975EA2D98D1A9067185FF6DE94931CC446C] [storeID=649575725] ["retry time"=0]
[2022/05/04 17:38:24.135 +00:00] [WARN] [client.go:859] ["reset the connection in handleFineGrained"] [storeID=649575725]
[2022/05/04 17:38:24.135 +00:00] [INFO] [conn.go:277] ["Reset backup client"] [storeID=649575725]
[2022/05/04 17:38:54.136 +00:00] [WARN] [conn.go:291] ["failed to reset grpc connection, retry it"] ["retry time"=0] [error="[BR:Common:ErrFailedToConnect]failed to make connection to store 649575725"]
[2022/05/04 17:39:09.362 +00:00] [INFO] [client.go:896] ["try backup"] [startKey=74727864623B00FFFF3A14D6BEBABB6DFBC0128B656504E14837E614744D15CA7981586163F77B0939A37AA73064E1F9836986F096618B026ADCA0F1C7EB0CA9C2A075A4418DA9] [endKey=74727864623B01F16A2706A2140D46779368D753B72975EA2D98D1A9067185FF6DE94931CC446C] [storeID=649575725] ["retry time"=1]
[2022/05/04 17:39:09.401 +00:00] [INFO] [client.go:951] ["backup streaming finish"] [StoreID=649575725] ["retry time"=1]
[2022/05/04 17:39:09.415 +00:00] [INFO] [client.go:713] ["handle fine grained"] [backoffMs=10000]
[2022/05/04 17:39:09.438 +00:00] [WARN] [backoff.go:329] ["txnLock backoffer.maxSleep 80000ms is exceeded, errors:\n[BR:Common:ErrUnknown]internal error at 2022-05-04T17:37:18.643773397Z\n[BR:Common:ErrUnknown]internal error at 2022-05-04T17:38:16.265197102Z\n[BR:Common:ErrUnknown]internal error at 2022-05-04T17:39:09.415374193Z"]
[2022/05/04 17:39:09.448 +00:00] [INFO] [client.go:511] ["backup range finished"] [take=51h16m53.518837989s]
[2022/05/04 17:39:09.670 +00:00] [INFO] [collector.go:195] ["Raw backup failed summary"] [total-ranges=1] [ranges-succeed=0] [ranges-failed=1] [backup-total-regions=1121414] [unit-name="range start:00 end:ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"] [error="[tikv:9004]Resolve lock timeout"] [errorVerbose="[tikv:9004]Resolve lock timeout\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15\ngithub.com/pingcap/br/pkg/backup.(*Client).fineGrainedBackup\n\tgithub.com/pingcap/br@/pkg/backup/client.go:718\ngithub.com/pingcap/br/pkg/backup.(*Client).BackupRange\n\tgithub.com/pingcap/br@/pkg/backup/client.go:544\ngithub.com/pingcap/br/pkg/task.RunBackupRaw\n\tgithub.com/pingcap/br@/pkg/task/backup_raw.go:202\nmain.runBackupRawCommand\n\tgithub.com/pingcap/br@/cmd/br/backup.go:44\nmain.newRawBackupCommand.func1\n\tgithub.com/pingcap/br@/cmd/br/backup.go:136\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\nmain.main\n\tgithub.com/pingcap/br@/cmd/br/main.go:56\nruntime.main\n\truntime/proc.go:203\nruntime.goexit\n\truntime/asm_amd64.s:1357"]
[2022/05/04 17:39:09.714 +00:00] [ERROR] [backup.go:45] ["failed to backup raw kv"] [error="[tikv:9004]Resolve lock timeout"] [errorVerbose="[tikv:9004]Resolve lock timeout\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15\ngithub.com/pingcap/br/pkg/backup.(*Client).fineGrainedBackup\n\tgithub.com/pingcap/br@/pkg/backup/client.go:718\ngithub.com/pingcap/br/pkg/backup.(*Client).BackupRange\n\tgithub.com/pingcap/br@/pkg/backup/client.go:544\ngithub.com/pingcap/br/pkg/task.RunBackupRaw\n\tgithub.com/pingcap/br@/pkg/task/backup_raw.go:202\nmain.runBackupRawCommand\n\tgithub.com/pingcap/br@/cmd/br/backup.go:44\nmain.newRawBackupCommand.func1\n\tgithub.com/pingcap/br@/cmd/br/backup.go:136\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\nmain.main\n\tgithub.com/pingcap/br@/cmd/br/main.go:56\nruntime.main\n\truntime/proc.go:203\nruntime.goexit\n\truntime/asm_amd64.s:1357"] [stack="main.runBackupRawCommand\n\tgithub.com/pingcap/br@/cmd/br/backup.go:45\nmain.newRawBackupCommand.func1\n\tgithub.com/pingcap/br@/cmd/br/backup.go:136\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\nmain.main\n\tgithub.com/pingcap/br@/cmd/br/main.go:56\nruntime.main\n\truntime/proc.go:203"]
[2022/05/04 17:39:09.730 +00:00] [ERROR] [main.go:58] ["br failed"] [error="[tikv:9004]Resolve lock timeout"] [errorVerbose="[tikv:9004]Resolve lock timeout\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15\ngithub.com/pingcap/br/pkg/backup.(*Client).fineGrainedBackup\n\tgithub.com/pingcap/br@/pkg/backup/client.go:718\ngithub.com/pingcap/br/pkg/backup.(*Client).BackupRange\n\tgithub.com/pingcap/br@/pkg/backup/client.go:544\ngithub.com/pingcap/br/pkg/task.RunBackupRaw\n\tgithub.com/pingcap/br@/pkg/task/backup_raw.go:202\nmain.runBackupRawCommand\n\tgithub.com/pingcap/br@/cmd/br/backup.go:44\nmain.newRawBackupCommand.func1\n\tgithub.com/pingcap/br@/cmd/br/backup.go:136\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\nmain.main\n\tgithub.com/pingcap/br@/cmd/br/main.go:56\nruntime.main\n\truntime/proc.go:203\nruntime.goexit\n\truntime/asm_amd64.s:1357"] [stack="main.main\n\tgithub.com/pingcap/br@/cmd/br/main.go:58\nruntime.main\n\truntime/proc.go:203"]
  1. Configuration of the cluster and the task

  2. Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus if possible

@fschoell fschoell added the type/bug Something isn't working label May 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant