br restore fails when PD is unavailable for only 3-5s during split range, which is not expected #1305

Open
Tammyxia opened this issue Jul 1, 2021 · 0 comments

Tammyxia commented Jul 1, 2021

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.
  • Run br restore full from S3.
  • Run tiup cluster restart xxx -R pd. The TiDB cluster has only one PD, so PD is unavailable for only 3-5s.
  • br restore fails.
  2. What did you expect to see?
    br restore should tolerate PD being unavailable for 1-3 minutes during split range.

  3. What did you see instead?
    br log:
    [2021/07/01 14:11:26.245 +08:00] [INFO] [base_client.go:296] ["[pd] cannot update member from this address"] [address=http://172.16.6.6:12379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.16.6.6:12379: connect: connection refused" target:172.16.6.6:12379 status:TRANSIENT_FAILURE"]
    [2021/07/01 14:11:26.245 +08:00] [ERROR] [base_client.go:166] ["[pd] failed updateMember"] [error="[PD:client:ErrClientGetLeader]get leader from [http://172.16.6.6:12379] error"] [stack="github.com/tikv/pd/client.(*baseClient).memberLoop\n\tgithub.com/tikv/pd@v1.1.0-beta.0.20210323121136-78679e5e209d/client/base_client.go:166"]

...
[2021/07/01 14:11:26.855 +08:00] [ERROR] [base_client.go:166] ["[pd] failed updateMember"] [error="[PD:client:ErrClientGetLeader]get leader from [http://172.16.6.6:12379] error"] [stack="github.com/tikv/pd/client.(*baseClient).memberLoop\n\tgithub.com/tikv/pd@v1.1.0-beta.0.20210323121136-78679e5e209d/client/base_client.go:166"]
[2021/07/01 14:11:26.855 +08:00] [ERROR] [pipeline_items.go:236] ["failed on split range"] [ranges="{total=178,ranges="[\"[7480000000000014855F69800000000000000300, 7480000000000014855F698000000000000003FB)\",\"(skip 176)\",\"[74800000000000F6075F720000000000000000, 74800000000000F6075F72FFFFFFFFFFFFFFFF00)\"]",totalFiles=205,totalKVs=5309510,totalBytes=737344350,totalSize=737344350}"] [error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.16.6.6:12379: connect: connection refused""] [errorVerbose="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.16.6.6:12379: connect: connection refused"\ngithub.com/tikv/pd/client.(*client).ScanRegions\n\tgithub.com/tikv/pd@v1.1.0-beta.0.20210323121136-78679e5e209d/client/client.go:1100\ngithub.com/pingcap/br/pkg/restore.(*pdClient).ScanRegions\n\tgithub.com/pingcap/br/pkg/restore/split_client.go:385\ngithub.com/pingcap/br/pkg/restore.PaginateScanRegion\n\tgithub.com/pingcap/br/pkg/restore/split.go:298\ngithub.com/pingcap/br/pkg/restore.(*RegionSplitter).Split\n\tgithub.com/pingcap/br/pkg/restore/split.go:113\ngithub.com/pingcap/br/pkg/restore.SplitRanges\n\tgithub.com/pingcap/br/pkg/restore/util.go:390\ngithub.com/pingcap/br/pkg/restore.(*tikvSender).splitWorker\n\tgithub.com/pingcap/br/pkg/restore/pipeline_items.go:235\nruntime.goexit\n\truntime/asm_amd64.s:1371"] [stack="github.com/pingcap/br/pkg/restore.(*tikvSender).splitWorker\n\tgithub.com/pingcap/br/pkg/restore/pipeline_items.go:236"]
...

[2021/07/01 14:11:29.487 +08:00] [ERROR] [restore.go:35] ["failed to restore"] [error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.16.6.6:12379: connect: connection refused""] [errorVerbose="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.16.6.6:12379: connect: connection refused\
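
The stack trace shows the failure path: splitWorker calls SplitRanges, which calls RegionSplitter.Split, then PaginateScanRegion, then pdClient.ScanRegions. A single refused ScanRegions call aborted the whole split and then the restore. One possible mitigation is to retry at page granularity. The Go sketch below is hypothetical: the Region type, the regionScanner interface, paginateScan, and the flakyScanner fake are illustrative names, not BR's real PaginateScanRegion or PD client API, and backoff between attempts is omitted for brevity.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// Region is a hypothetical, simplified region descriptor: keys in [Start, End).
// An empty End means the region extends to the end of the keyspace.
type Region struct{ Start, End []byte }

// regionScanner is a hypothetical interface shaped like a ScanRegions call:
// return up to limit regions overlapping [start, end), in key order.
type regionScanner interface {
	ScanRegions(start, end []byte, limit int) ([]Region, error)
}

// paginateScan collects every region in [start, end) one page at a time.
// Each page is attempted up to maxRetries+1 times, so a refused connection
// re-issues only the failed page instead of aborting the whole split.
func paginateScan(s regionScanner, start, end []byte, limit, maxRetries int) ([]Region, error) {
	var out []Region
	key := start
	for {
		var page []Region
		var err error
		for i := 0; i <= maxRetries; i++ {
			if page, err = s.ScanRegions(key, end, limit); err == nil {
				break
			}
		}
		if err != nil {
			return nil, err
		}
		if len(page) == 0 {
			return out, nil
		}
		out = append(out, page...)
		next := page[len(page)-1].End
		// Stop at the end of the keyspace or once the requested end is reached.
		if len(next) == 0 || (len(end) > 0 && bytes.Compare(next, end) >= 0) {
			return out, nil
		}
		key = next
	}
}

// flakyScanner is a hypothetical in-memory fake whose failOn-th call returns
// an error, imitating PD refusing connections while it restarts.
type flakyScanner struct {
	regions []Region
	calls   int
	failOn  int
}

func (f *flakyScanner) ScanRegions(start, end []byte, limit int) ([]Region, error) {
	f.calls++
	if f.calls == f.failOn {
		return nil, errors.New("connect: connection refused")
	}
	var page []Region
	for _, r := range f.regions {
		if len(end) > 0 && bytes.Compare(r.Start, end) >= 0 {
			break // region starts at or past the requested end
		}
		if len(r.End) == 0 || bytes.Compare(r.End, start) > 0 {
			page = append(page, r)
			if len(page) == limit {
				break
			}
		}
	}
	return page, nil
}

func main() {
	s := &flakyScanner{
		regions: []Region{{[]byte("a"), []byte("b")}, {[]byte("b"), []byte("c")},
			{[]byte("c"), []byte("d")}, {[]byte("d"), nil}},
		failOn: 2, // the second page request fails once
	}
	regions, err := paginateScan(s, []byte("a"), []byte("z"), 2, 3)
	fmt.Println(len(regions), err, s.calls) // 4 <nil> 3
}
```

Retrying per page keeps already-fetched pages; a real fix would also pause between attempts and classify which errors are safe to retry.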

  4. What version of BR and TiDB/TiKV/PD are you using?
  5. Operation logs

    • Please upload br.log for BR if possible
    • Please upload tidb-lightning.log for TiDB-Lightning if possible
    • Please upload tikv-importer.log from TiKV-Importer if possible
    • Other interesting logs
  6. Configuration of the cluster and the task

    • tidb-lightning.toml for TiDB-Lightning if possible
    • tikv-importer.toml for TiKV-Importer if possible
    • topology.yml if deployed by TiUP
  7. Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus if possible