Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tikv: Fix the issue that KV requests might be processed slower because the connections to some KV servers are slow to establish. #12733

Merged
merged 5 commits into from Oct 18, 2019

Conversation

lysu
Copy link
Collaborator

@lysu lysu commented Oct 15, 2019

What problem does this PR solve?

fix the potential problem that slow establishes the connection make other kv requests be slow

  • 5 min dial timeout doesn't work in superbatch
  • createConnArray will block wait conn establish on create tikvClient.BatchCommands
  • long time hold lock in createConnArray will block getConnArray

What is changed and how it works?

  • make establish grpc client in other goroutine which is blocking process
  • check estDone before send request, send directly if done
  • wait estReady channel if it's not done(only need select channel for first time)
  • set right dialTimeout for batch client creation

more words:

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • impl

Side effects

  • n/a

Related changes

  • Need to cherry-pick to the release branch

Release note

  • Fix the issue that KV requests might be processed slower because the connections to some KV servers are slow to establish.

This change is Reviewable

@lysu lysu added status/WIP type/bug-fix This PR fixes a bug. priority/release-blocker This PR blocks a release. Please review it ASAP. component/tikv labels Oct 15, 2019
@lysu
Copy link
Collaborator Author

lysu commented Oct 15, 2019

/bench

@sre-bot
Copy link
Contributor

sre-bot commented Oct 15, 2019

@@                               Benchmark Diff                               @@
================================================================================
--- tidb: e80bab6d46869a77d56bdaef29d950bcbd4fdc40
+++ tidb: 7927cecc6c25a6843638795699c95d121c839165
tikv: 6b42cb67548e29d45e6f5e15da7ee9ba7099e993
pd: 381d36ba81d7616d8cd7e05c7672aa80a726f290
================================================================================
test-1: < oltp_point_select >
    * QPS : 139370.11 ± 0.7083% (std=626.58) delta: -0.19% (p=0.569)
    * AvgMs : 1.83 ± 0.8724% (std=0.01) delta: 0.49%
    * PercentileMs99 : 6.21 ± 0.0000% (std=0.00) delta: 0.00%
            
test-2: < oltp_read_write >
    * QPS : 41350.50 ± 1.7176% (std=415.34) delta: -0.58% (p=0.649)
    * AvgMs : 123.37 ± 0.1783% (std=0.14) delta: -0.16%
    * PercentileMs99 : 248.83 ± 0.0000% (std=0.00) delta: 0.00%
            
test-3: < oltp_insert >
    * QPS : 23847.62 ± 0.1959% (std=27.98) delta: -0.08% (p=0.782)
    * AvgMs : 10.73 ± 0.1864% (std=0.01) delta: 0.11%
    * PercentileMs99 : 22.05 ± 1.0614% (std=0.19) delta: 0.71%
            
test-4: < oltp_update_index >
    * QPS : 19240.50 ± 0.0552% (std=8.21) delta: 0.03% (p=0.551)
    * AvgMs : 13.26 ± 0.3771% (std=0.03) delta: -0.35%
    * PercentileMs99 : 30.81 ± 0.0000% (std=0.00) delta: 0.00%
            
test-5: < oltp_update_non_index >
    * QPS : 32242.45 ± 0.0556% (std=13.39) delta: -0.38% (p=0.091)
    * AvgMs : 7.94 ± 0.0840% (std=0.00) delta: 0.42%
    * PercentileMs99 : 20.74 ± 0.0000% (std=0.00) delta: 1.08%
            

https://perf.pingcap.com

@codecov
Copy link

codecov bot commented Oct 16, 2019

Codecov Report

Merging #12733 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #12733   +/-   ##
===========================================
  Coverage   80.0582%   80.0582%           
===========================================
  Files           465        465           
  Lines        106535     106535           
===========================================
  Hits          85290      85290           
  Misses        14897      14897           
  Partials       6348       6348

@lysu
Copy link
Collaborator Author

lysu commented Oct 16, 2019

/run-all-tests

@lysu lysu removed the status/WIP label Oct 16, 2019
@lysu lysu marked this pull request as ready for review October 16, 2019 08:52
@lysu
Copy link
Collaborator Author

lysu commented Oct 16, 2019

/bench

@lysu lysu changed the title tikv: non-blocking establish superbatch connection tikv: non-blocking establish superbatch connection with timeout Oct 16, 2019
@lysu lysu requested a review from coocood October 16, 2019 09:20
@sre-bot
Copy link
Contributor

sre-bot commented Oct 16, 2019

@@                               Benchmark Diff                               @@
================================================================================
--- tidb: 6b403e5767e117e599489070a495f7c3a7da787c
+++ tidb: b275614654d1a23e54ea9670f00957c5e9beede2
tikv: f7f9a5944feea92df87368874ca0575fb87cf9b1
pd: 543309d2613cbe3bc6592fa77e0c7e87692750c0
================================================================================
test-1: < oltp_point_select >
    * QPS : 138924.20 ± 0.1087% (std=96.15) delta: -1.06% (p=0.006)
    * AvgMs : 1.84 ± 0.0000% (std=0.00) delta: 0.99%
    * PercentileMs99 : 6.21 ± 0.0000% (std=0.00) delta: 1.97%
            
test-2: < oltp_read_write >
    * QPS : 41579.01 ± 0.2375% (std=61.29) delta: 0.18% (p=0.926)
    * AvgMs : 123.67 ± 0.2070% (std=0.18) delta: -0.01%
    * PercentileMs99 : 248.83 ± 0.0000% (std=0.00) delta: 1.08%
            
test-3: < oltp_insert >
    * QPS : 23718.83 ± 0.0252% (std=4.51) delta: -0.23% (p=0.377)
    * AvgMs : 10.79 ± 0.0618% (std=0.00) delta: 0.23%
    * PercentileMs99 : 22.28 ± 0.0000% (std=0.00) delta: 1.06%
            
test-4: < oltp_update_index >
    * QPS : 19155.67 ± 1.5763% (std=208.50) delta: -0.81% (p=0.482)
    * AvgMs : 13.24 ± 0.0755% (std=0.01) delta: -0.06%
    * PercentileMs99 : 30.81 ± 0.0000% (std=0.00) delta: 0.00%
            
test-5: < oltp_update_non_index >
    * QPS : 32284.11 ± 0.3777% (std=103.40) delta: 0.03% (p=0.865)
    * AvgMs : 7.92 ± 0.3281% (std=0.02) delta: -0.08%
    * PercentileMs99 : 20.52 ± 1.0820% (std=0.18) delta: 0.00%
            

https://perf.pingcap.com

@lysu
Copy link
Collaborator Author

lysu commented Oct 17, 2019

PTAL @coocood @hicqu @tiancaiamao we need this in next release

@lysu lysu requested a review from cfzjywxk October 17, 2019 07:14
store/tikv/client_batch.go Outdated Show resolved Hide resolved
@lysu
Copy link
Collaborator Author

lysu commented Oct 17, 2019

/run-all-tests

1 similar comment
@zyxbest
Copy link
Contributor

zyxbest commented Oct 17, 2019

/run-all-tests

@zyxbest
Copy link
Contributor

zyxbest commented Oct 17, 2019

/run-unit-tests

@coocood
Copy link
Member

coocood commented Oct 17, 2019

LGTM

@lysu lysu added the status/LGT1 Indicates that a PR has LGTM 1. label Oct 17, 2019
@lysu
Copy link
Collaborator Author

lysu commented Oct 17, 2019

/run-all-tests

@lysu
Copy link
Collaborator Author

lysu commented Oct 17, 2019

/rebuild

@tiancaiamao
Copy link
Contributor

LGTM

@tiancaiamao tiancaiamao added the status/LGT2 Indicates that a PR has LGTM 2. label Oct 18, 2019
@lysu
Copy link
Collaborator Author

lysu commented Oct 18, 2019

/run-all-tests

@lysu
Copy link
Collaborator Author

lysu commented Oct 18, 2019

/rebuild

@zyxbest
Copy link
Contributor

zyxbest commented Oct 18, 2019

/run-all-tests

@lysu
Copy link
Collaborator Author

lysu commented Oct 18, 2019

/bench

@lysu lysu removed the status/LGT1 Indicates that a PR has LGTM 1. label Oct 18, 2019
@coocood coocood added the status/can-merge Indicates a PR has been approved by a committer. label Oct 18, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Oct 18, 2019

/run-all-tests

@sre-bot sre-bot merged commit 4c989b0 into pingcap:master Oct 18, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Oct 18, 2019

cherry pick to release-3.1 failed

@sre-bot
Copy link
Contributor

sre-bot commented Oct 18, 2019

cherry pick to release-3.0 failed

@lysu lysu changed the title tikv: non-blocking establish superbatch connection with timeout tikv: Fix the issue that KV requests might be processed slower because the connections to some KV servers are slow to establish. Oct 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/tikv priority/release-blocker This PR blocks a release. Please review it ASAP. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/bug-fix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants