We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
We are trying to do a full backup of our raw kv store into our ceph instance using this command:
br backup raw --cf default --pd "http://127.0.0.1:2379" --s3.provider "ceph" --s3.endpoint "http://ceph-internal:7480" -s "s3://tikv-backups/<date>" --send-credentials-to-tikv=true --s3.region "none" --ratelimit 3 --log-file /var/log/tikv-backup.log --start 00 --end ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Our TikV database holds around 70 TB of data and a few rather large key value pairs (around 40 MB).
We expected the backup to run without errors.
The backup fails with error="[tikv:9004]Resolve lock timeout"]
error="[tikv:9004]Resolve lock timeout"]
$ br -V Release Version: v4.0.15 Git Commit Hash: 2ca9fa25f15aec061e281ad84f77dacf622ac9c3 Git Branch: heads/refs/tags/v4.0.15 Go Version: go1.13 UTC Build Time: 2021-09-23 09:48:49 Race Enabled: false
$ pd-server -V Release Version: v4.0.15 Edition: Community Git Commit Hash: 605dc238b9a9e54491a5e6b0843f30de33421402 Git Branch: heads/refs/tags/v4.0.15 UTC Build Time: 2021-09-23 09:48:07
$ tikv-server -V TiKV Release Version: 4.0.15 Edition: Community Git Commit Hash: 0a0482dee0a0cf135f56b9f90b412ea641ec3ae2 Git Commit Branch: heads/refs/tags/v4.0.15 UTC Build Time: 2021-09-13 03:10:53 Rust Version: rustc 1.42.0-nightly (0de96d37f 2019-12-19) Enable Features: jemalloc mem-profiling portable sse protobuf-codec Profile: dist_release
relevant br.log entries:
[2022/05/04 17:38:19.479 +00:00] [INFO] [client.go:896] ["try backup"] [startKey=74727864623B00FFFF3A14D6BEBABB6DFBC0128B656504E14837E614744D15CA7981586163F77B0939A37AA73064E1F9836986F096618B026ADCA0F1C7EB0CA9C2A075A4418DA9] [endKey=74727864623B01F16A2706A2140D46779368D753B72975EA2D98D1A9067185FF6DE94931CC446C] [storeID=649575725] ["retry time"=0] [2022/05/04 17:38:24.135 +00:00] [WARN] [client.go:859] ["reset the connection in handleFineGrained"] [storeID=649575725] [2022/05/04 17:38:24.135 +00:00] [INFO] [conn.go:277] ["Reset backup client"] [storeID=649575725] [2022/05/04 17:38:54.136 +00:00] [WARN] [conn.go:291] ["failed to reset grpc connection, retry it"] ["retry time"=0] [error="[BR:Common:ErrFailedToConnect]failed to make connection to store 649575725"] [2022/05/04 17:39:09.362 +00:00] [INFO] [client.go:896] ["try backup"] [startKey=74727864623B00FFFF3A14D6BEBABB6DFBC0128B656504E14837E614744D15CA7981586163F77B0939A37AA73064E1F9836986F096618B026ADCA0F1C7EB0CA9C2A075A4418DA9] [endKey=74727864623B01F16A2706A2140D46779368D753B72975EA2D98D1A9067185FF6DE94931CC446C] [storeID=649575725] ["retry time"=1] [2022/05/04 17:39:09.401 +00:00] [INFO] [client.go:951] ["backup streaming finish"] [StoreID=649575725] ["retry time"=1] [2022/05/04 17:39:09.415 +00:00] [INFO] [client.go:713] ["handle fine grained"] [backoffMs=10000] [2022/05/04 17:39:09.438 +00:00] [WARN] [backoff.go:329] ["txnLock backoffer.maxSleep 80000ms is exceeded, errors:\n[BR:Common:ErrUnknown]internal error at 2022-05-04T17:37:18.643773397Z\n[BR:Common:ErrUnknown]internal error at 2022-05-04T17:38:16.265197102Z\n[BR:Common:ErrUnknown]internal error at 2022-05-04T17:39:09.415374193Z"] [2022/05/04 17:39:09.448 +00:00] [INFO] [client.go:511] ["backup range finished"] [take=51h16m53.518837989s] [2022/05/04 17:39:09.670 +00:00] [INFO] [collector.go:195] ["Raw backup failed summary"] [total-ranges=1] [ranges-succeed=0] [ranges-failed=1] [backup-total-regions=1121414] [unit-name="range start:00 end:ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"] [error="[tikv:9004]Resolve lock timeout"] [errorVerbose="[tikv:9004]Resolve lock timeout\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15\ngithub.com/pingcap/br/pkg/backup.(*Client).fineGrainedBackup\n\tgithub.com/pingcap/br@/pkg/backup/client.go:718\ngithub.com/pingcap/br/pkg/backup.(*Client).BackupRange\n\tgithub.com/pingcap/br@/pkg/backup/client.go:544\ngithub.com/pingcap/br/pkg/task.RunBackupRaw\n\tgithub.com/pingcap/br@/pkg/task/backup_raw.go:202\nmain.runBackupRawCommand\n\tgithub.com/pingcap/br@/cmd/br/backup.go:44\nmain.newRawBackupCommand.func1\n\tgithub.com/pingcap/br@/cmd/br/backup.go:136\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\nmain.main\n\tgithub.com/pingcap/br@/cmd/br/main.go:56\nruntime.main\n\truntime/proc.go:203\nruntime.goexit\n\truntime/asm_amd64.s:1357"] [2022/05/04 17:39:09.714 +00:00] [ERROR] [backup.go:45] ["failed to backup raw kv"] [error="[tikv:9004]Resolve lock timeout"] [errorVerbose="[tikv:9004]Resolve lock timeout\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15\ngithub.com/pingcap/br/pkg/backup.(*Client).fineGrainedBackup\n\tgithub.com/pingcap/br@/pkg/backup/client.go:718\ngithub.com/pingcap/br/pkg/backup.(*Client).BackupRange\n\tgithub.com/pingcap/br@/pkg/backup/client.go:544\ngithub.com/pingcap/br/pkg/task.RunBackupRaw\n\tgithub.com/pingcap/br@/pkg/task/backup_raw.go:202\nmain.runBackupRawCommand\n\tgithub.com/pingcap/br@/cmd/br/backup.go:44\nmain.newRawBackupCommand.func1\n\tgithub.com/pingcap/br@/cmd/br/backup.go:136\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\nmain.main\n\tgithub.com/pingcap/br@/cmd/br/main.go:56\nruntime.main\n\truntime/proc.go:203\nruntime.goexit\n\truntime/asm_amd64.s:1357"] [stack="main.runBackupRawCommand\n\tgithub.com/pingcap/br@/cmd/br/backup.go:45\nmain.newRawBackupCommand.func1\n\tgithub.com/pingcap/br@/cmd/br/backup.go:136\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\nmain.main\n\tgithub.com/pingcap/br@/cmd/br/main.go:56\nruntime.main\n\truntime/proc.go:203"] [2022/05/04 17:39:09.730 +00:00] [ERROR] [main.go:58] ["br failed"] [error="[tikv:9004]Resolve lock timeout"] [errorVerbose="[tikv:9004]Resolve lock timeout\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15\ngithub.com/pingcap/br/pkg/backup.(*Client).fineGrainedBackup\n\tgithub.com/pingcap/br@/pkg/backup/client.go:718\ngithub.com/pingcap/br/pkg/backup.(*Client).BackupRange\n\tgithub.com/pingcap/br@/pkg/backup/client.go:544\ngithub.com/pingcap/br/pkg/task.RunBackupRaw\n\tgithub.com/pingcap/br@/pkg/task/backup_raw.go:202\nmain.runBackupRawCommand\n\tgithub.com/pingcap/br@/cmd/br/backup.go:44\nmain.newRawBackupCommand.func1\n\tgithub.com/pingcap/br@/cmd/br/backup.go:136\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\nmain.main\n\tgithub.com/pingcap/br@/cmd/br/main.go:56\nruntime.main\n\truntime/proc.go:203\nruntime.goexit\n\truntime/asm_amd64.s:1357"] [stack="main.main\n\tgithub.com/pingcap/br@/cmd/br/main.go:58\nruntime.main\n\truntime/proc.go:203"]
Configuration of the cluster and the task
Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus if possible
The text was updated successfully, but these errors were encountered:
No branches or pull requests
We are trying to do a full backup of our raw kv store into our ceph instance using this command:
Our TikV database holds around 70 TB of data and a few rather large key value pairs (around 40 MB).
We expected the backup to run without errors.
The backup fails with
error="[tikv:9004]Resolve lock timeout"]
relevant br.log entries:
Configuration of the cluster and the task
Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus if possible
The text was updated successfully, but these errors were encountered: