Etcd unexpected request time consumption with HTTPs v3 API #14077

Closed
Tracked by #14138
leslie-tsang opened this issue May 27, 2022 · 7 comments
@leslie-tsang

leslie-tsang commented May 27, 2022

What happened?

We use the API gateway APISIX to watch resources in an etcd cluster (auth and TLS enabled) over HTTPS. The APISIX cluster opens about 260 connections to the etcd cluster.

In this scenario, etcd operations over HTTPS take too much time (more than 20s), while operations over gRPC do not.

We use etcdctl and etcdkeeper as gRPC clients.

Simple conclusion

Etcd operations via RESTful requests with TLS take too much time (more than 10s).
Etcd operations via gRPC requests perform normally (within 0.5s).

Etcd uses grpc-gateway to convert HTTP requests to gRPC requests. Once auth is enabled, the performance of the etcd cluster drops, and when TLS is also enabled, it becomes even worse.

We also tried an etcd cluster without TLS, and it performed normally again.

What did you expect to happen?

With TLS enabled, etcd operations should complete within 1s.
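
One way to check a request against that 1s expectation is to let curl report its own total time. The following is only a sketch: it assumes the endpoint, CA path, and `ETCD_AUTH_TOKEN` variable from the reproduction commands in this report, and the function names `time_range_request` and `within_budget` are hypothetical helpers, not part of etcd or APISIX.

```shell
# Print the total request time in seconds, as reported by curl's
# --write-out variable time_total. Endpoint, CA path and token variable
# are assumptions taken from the reproduction commands in this report.
time_range_request() {
    curl -s -o /dev/null -w '%{time_total}' \
        --cacert ./cert/ca.pem \
        -L https://127.0.0.1:2379/v3/kv/range \
        -H "Authorization:${ETCD_AUTH_TOKEN}" \
        -X POST -d '{"key": "L3B1Yi9hYWFh"}'
}

# Succeed (exit status 0) when the elapsed seconds ($1) are at most the
# budget in seconds ($2). awk is used because the values are fractional.
within_budget() {
    awk -v t="$1" -v max="$2" 'BEGIN { exit !(t + 0 <= max + 0) }'
}

# Example usage against a running cluster:
#   elapsed="$(time_range_request)"
#   if within_budget "$elapsed" 1; then echo "ok: ${elapsed}s"; else echo "slow: ${elapsed}s"; fi
```

Running the check once before and once after opening the 260 watch connections should make the difference visible without reading the verbose curl output.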

How can we reproduce it (as minimally and precisely as possible)?

  1. An auth- and TLS-enabled etcd cluster environment can be launched with apisix-etcd-mtls
  2. Follow the README to initialize the test data for the etcd cluster
  3. Reproduce the scenario with the commands below
# fetch the v3 api token
curl -v --cacert ./cert/ca.pem \
    -L https://127.0.0.1:2379/v3/auth/authenticate \
    -X POST -d '{"name": "etcd.client", "password": "123456"}'

# export the token
export ETCD_AUTH_TOKEN='<the token fetched from v3 auth api>'

# create 260 watch connections; each curl command times out after 300s
for i in {1..260}; do \
curl -v --cacert ./cert/ca.pem \
    --max-time 300 \
    -L https://127.0.0.1:2379/v3/watch \
    -H "Authorization:${ETCD_AUTH_TOKEN}" \
    -X POST -d '{"create_request": {"key":"L3B1Yi9hYWFh"} }' & \
done

# try to read the key from etcd cluster
curl -v --cacert ./cert/ca.pem \
    -L https://127.0.0.1:2379/v3/kv/range \
    -H "Authorization:${ETCD_AUTH_TOKEN}" \
    -X POST -d '{"key": "L3B1Yi9hYWFh"}'
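
The key in the request bodies above, `L3B1Yi9hYWFh`, is the base64 encoding of `/pub/aaaa`, since the JSON gateway takes keys in base64 form. A minimal sketch for converting between the two, assuming `base64` from GNU coreutils is available; the function names `encode_key` and `decode_key` are hypothetical:

```shell
# Encode a plain-text key for use in a v3 JSON request body.
# printf avoids the trailing newline that echo would add to the input;
# tr strips the newline that base64 appends to its output.
encode_key() {
    printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a base64 key taken from a request or response body.
decode_key() {
    printf '%s' "$1" | base64 -d
}

# Example usage:
#   encode_key /pub/aaaa       # prints L3B1Yi9hYWFh
#   decode_key L3B1Yi9hYWFh    # prints /pub/aaaa
```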

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
etcd Version: 3.5.4
Git SHA: 08407ff76
Go Version: go1.16.15
Go OS/Arch: linux/amd64

$ etcdctl version
etcdctl version: 3.5.4
API version: 3.5

Etcd configuration (command line flags or environment variables)

ETCD_CLIENT_KEY_FILE=/opt/etcd/ssl/client-key.pem
HOSTNAME=bc18fac70256
ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379
PWD=/opt/bitnami/etcd
OS_FLAVOUR=debian-10
ETCD_ENABLE_V2=true
ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380
ETCD_CLIENT_CERT_FILE=/opt/etcd/ssl/client.pem
HOME=/
ETCD_PEER_TRUSTED_CA_FILE=/opt/etcd/ssl/ca.pem
ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster
ETCD_ENABLE_GRPC_GATEWAY=true
ETCD_NAME=etcd1
ETCD_PEER_CLIENT_CERT_AUTH=true
ETCD_ROOT_PASSWORD=123456
TERM=xterm
ETCD_KEY_FILE=/opt/etcd/ssl/server-key.pem
ETCD_PEER_KEY_FILE=/opt/etcd/ssl/server-key.pem
SHLVL=1
BITNAMI_APP_NAME=etcd
APP_VERSION=3.5.4
ETCD_PEER_CERT_FILE=/opt/etcd/ssl/server.pem
ETCD_INITIAL_CLUSTER_STATE=new
ETCD_INITIAL_CLUSTER=etcd1=https://etcd1:2380,etcd2=https://etcd2:2380,etcd3=https://etcd3:2380
OS_NAME=linux
PATH=/opt/bitnami/common/bin:/opt/bitnami/etcd/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
ETCD_ADVERTISE_CLIENT_URLS=https://0.0.0.0:2379
ETCD_CERT_FILE=/opt/etcd/ssl/server.pem
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://etcd1:2380
OS_ARCH=amd64
_=/usr/bin/env

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table

> etcdctl --cacert="${ETCD_TRUSTED_CA_FILE}" member list -w table
+------------------+---------+-------+--------------------+----------------------+------------+
|        ID        | STATUS  | NAME  |     PEER ADDRS     |     CLIENT ADDRS     | IS LEARNER |
+------------------+---------+-------+--------------------+----------------------+------------+
| 1f6fd35e3327767a | started | etcd1 | https://etcd1:2380 | https://0.0.0.0:2379 |      false |
| 2a6277f8728ef760 | started | etcd3 | https://etcd3:2380 | https://0.0.0.0:2379 |      false |
| 4acd0a1e9189cd7a | started | etcd2 | https://etcd2:2380 | https://0.0.0.0:2379 |      false |
+------------------+---------+-------+--------------------+----------------------+------------+

$ etcdctl --endpoints=<member list> endpoint status -w table

> etcdctl --cacert="${ETCD_TRUSTED_CA_FILE}" --endpoints=etcd1:2379,etcd2:2379,etcd3:2379 endpoint status -w table
+------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|  ENDPOINT  |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| etcd1:2379 | 1f6fd35e3327767a |   3.5.4 |   20 kB |     false |      false |         5 |         15 |                 15 |        |
| etcd2:2379 | 4acd0a1e9189cd7a |   3.5.4 |   20 kB |      true |      false |         5 |         15 |                 15 |        |
| etcd3:2379 | 2a6277f8728ef760 |   3.5.4 |   20 kB |     false |      false |         5 |         15 |                 15 |        |
+------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

/debug/requests

Most of the elapsed time is spent in the send phase.

Relevant log output

No response

@moonming

same as #14065

@ahrtr
Member

ahrtr commented May 27, 2022

This issue should be fixed by pull/13985. The ConcurrentMaxStream is 250, so creating 260 connections exceeds the limit.
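
To illustrate the arithmetic behind that limit, here is a small sketch. It assumes, as the comment implies, that all gateway requests count against one shared limit of 250 concurrent streams; the function name `queued_streams` is hypothetical.

```shell
# Print how many requests would have to wait when the number of requested
# concurrent streams ($1) exceeds the stream limit ($2).
# Assumption: every request counts against the same shared limit.
queued_streams() {
    requested="$1"
    limit="$2"
    if [ "$requested" -gt "$limit" ]; then
        echo $((requested - limit))
    else
        echo 0
    fi
}

# Example usage: 260 watches plus the one range request from the report.
#   queued_streams 261 250    # prints 11
```

Under that assumption the range request would sit behind long-lived watches, which would be consistent with the multi-second delays reported here.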

@membphis

This issue should can be fixed by pull/13985. The ConcurrentMaxStream is 250, so when you create 260 connections, obviously it exceeds the limitation.

Thanks a lot for the tip. We will reply after we test your PR. ^_^

@leslie-tsang
Author

@ahrtr @membphis After testing, PR 13985 proved to work in this scenario. :)

@serathius
Member

serathius commented Jun 21, 2022

Is this issue fixed? Can we close it?

@ahrtr
Member

ahrtr commented Jun 21, 2022

The related PR isn't merged yet.

It seems that the contributor is still struggling to work on the PR. I may spend some time to get it resolved sometime this or next week.

@ahrtr
Member

ahrtr commented Jul 13, 2022

Resolved by #14169 and #14219

ahrtr closed this as completed Jul 13, 2022