
Lot of sockets left in CLOSE_WAIT state #5872

Closed
csahli opened this issue Nov 29, 2018 · 7 comments

Labels: bug (Used to indicate a potential bug), storage/dynamodb

csahli commented Nov 29, 2018

Describe the bug
Observed that the Vault process left a lot of sockets in CLOSE_WAIT state. Sockets in CLOSE_WAIT state are supposed to be released by the owning process by issuing a close() system call. For some reason, Vault is not releasing these sockets and eventually reaches the max open file limit, leading to a service failure.

To Reproduce
Steps to reproduce the behavior:

  1. Set the Vault address parameter in the listener "tcp" stanza using an FQDN instead of an IP address.
  2. Start Vault: systemctl start vault.
  3. Every Vault API request creates a socket that stays in CLOSE_WAIT state.
  4. Once the number of requests to the Vault API reaches the MAX OPEN FILE limit, the Vault process becomes unresponsive (a minimal load sketch is shown below).
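
For illustration, here is a minimal load sketch for steps 3 and 4. It is only an illustrative helper, not part of our setup: the hostname is the placeholder FQDN from the configuration below, InsecureSkipVerify is used only because this is a throwaway reproduction client, and /v1/sys/health is just a convenient unauthenticated endpoint; any repeated API call shows the same buildup.

// close_wait_load.go - illustrative load generator only; the hostname and TLS
// handling are placeholders for this reproduction.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			// Placeholder: skip certificate verification only for this test client.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
			// Force a fresh connection per request, like many short-lived clients.
			DisableKeepAlives: true,
		},
	}

	for i := 0; i < 10000; i++ {
		// sys/health is unauthenticated, so no Vault token is required.
		resp, err := client.Get("https://foo.bar.foobar.io:8200/v1/sys/health")
		if err != nil {
			fmt.Println("request error:", err)
		} else {
			resp.Body.Close()
			fmt.Println("request", i, "status", resp.StatusCode)
		}
		time.Sleep(100 * time.Millisecond)
	}
}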

Expected behavior
The Vault process should release any sockets left in CLOSE_WAIT state.

Environment:

  • Vault Server Version (retrieve with vault status): Version 0.11.3
  • Vault CLI Version (retrieve with vault version):
    Vault v0.11.3 ('fb601237bfbe4bc16ff679f642248ee8a86e627b')
  • Server Operating System/Architecture:
    Red Hat Enterprise Linux Server release 7.6 (Maipo)
    Linux foo.bar.foobar.io 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Infrastructure: AWS EC2
  • Storage Backend: AWS DynamoDB
  • Vault deployment: Cluster of 3 vault instances accessed via an AWS Classic LB.

Vault server configuration file(s):

cluster_name = "dc1"
max_lease_ttl = "768h"
default_lease_ttl = "768h"

listener "tcp" {
    address = "foo.bar.foobar.io:8200"
    tls_cert_file = "/etc/vault.d/tls/vault.crt"
    tls_key_file  = "/etc/vault.d/tls/vault.key"
    tls_min_version  = "tls12"
    tls_cipher_suites = "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA"
    tls_prefer_server_cipher_suites = "false"
  tls_disable = "false"
}

api_addr = "http://foo.bar.foobar.io:8200"

storage "dynamodb" {
  ha_enabled        = "true"
  region            = "eu-west-1"
  endpoint          = ""
  table             = "vault-data"
  session_token     = "Hashicorp Vault"
  max_parallel      = "128"
  read_capacity     = "5"
  write_capacity    = "5"
}

Additional context
[root@foo ~]# systemctl status vault -l

● vault.service - Vault secret management tool
   Loaded: loaded (/usr/lib/systemd/system/vault.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-11-20 15:37:34 UTC; 1 weeks 1 days ago
 Main PID: 25708 (vault)
   CGroup: /system.slice/vault.service
           └─25708 /usr/local/bin/vault server -config=/etc/vault.d/vault_main.hcl -log-level=info

Nov 29 15:15:26 foo.bar.foobar.io vault[25708]: 2018-11-29T15:15:26.175Z [INFO ] http: Accept error: accept tcp 172.18.32.152:8200: accept4: too many open files; retrying in 1s
Nov 29 15:15:27 foo.bar.foobar.io vault[25708]: 2018-11-29T15:15:27.175Z [INFO ] http: Accept error: accept tcp 172.18.32.152:8200: accept4: too many open files; retrying in 1s
Nov 29 15:15:28 foo.bar.foobar.io vault[25708]: 2018-11-29T15:15:28.175Z [INFO ] http: Accept error: accept tcp 172.18.32.152:8200: accept4: too many open files; retrying in 1s
Nov 29 15:15:29 foo.bar.foobar.io vault[25708]: 2018-11-29T15:15:29.176Z [INFO ] http: Accept error: accept tcp 172.18.32.152:8200: accept4: too many open files; retrying in 1s
Nov 29 15:15:30 foo.bar.foobar.io vault[25708]: 2018-11-29T15:15:30.176Z [INFO ] http: Accept error: accept tcp 172.18.32.152:8200: accept4: too many open files; retrying in 1s
Nov 29 15:15:31 foo.bar.foobar.io vault[25708]: 2018-11-29T15:15:31.176Z [INFO ] http: Accept error: accept tcp 172.18.32.152:8200: accept4: too many open files; retrying in 1s
Nov 29 15:15:32 foo.bar.foobar.io vault[25708]: 2018-11-29T15:15:32.177Z [INFO ] http: Accept error: accept tcp 172.18.32.152:8200: accept4: too many open files; retrying in 1s
Nov 29 15:15:33 foo.bar.foobar.io vault[25708]: 2018-11-29T15:15:33.177Z [INFO ] http: Accept error: accept tcp 172.18.32.152:8200: accept4: too many open files; retrying in 1s
Nov 29 15:15:34 foo.bar.foobar.io vault[25708]: 2018-11-29T15:15:34.177Z [INFO ] http: Accept error: accept tcp 172.18.32.152:8200: accept4: too many open files; retrying in 1s
Nov 29 15:15:35 foo.bar.foobar.io vault[25708]: 2018-11-29T15:15:35.177Z [INFO ] http: Accept error: accept tcp 172.18.32.152:8200: accept4: too many open files; retrying in 1s

[root@foo ~]# netstat -antplu | grep 8200

tcp      129      0 172.18.32.152:8200      0.0.0.0:*               LISTEN      25708/vault         
tcp        0      0 172.18.32.152:8200      172.18.32.161:58322     SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.32.161:58334     SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.32.89:14386      SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.32.161:58344     SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.32.152:52634     SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.33.40:38230      SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.33.40:38218      SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.32.161:58324     SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.32.89:14366      SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.32.89:14384      SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.32.152:52638     SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.33.40:38238      SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.32.161:58356     SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.33.40:38208      SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.32.89:14398      SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.33.40:38254      SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.32.89:14408      SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.32.161:58372     SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.33.40:38244      SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.32.89:14368      SYN_RECV    -                   
tcp        0      0 172.18.32.152:8200      172.18.32.161:58352     SYN_RECV    -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37580      CLOSE_WAIT  -                   
tcp        0    246 172.18.32.152:52638     172.18.32.152:8200      FIN_WAIT1   -                   
tcp      149      0 172.18.32.152:8200      172.18.32.161:57878     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37416      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.89:13932      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.161:57708     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37534      CLOSE_WAIT  -                   
tcp      148      0 172.18.32.152:8200      172.18.32.161:15181     ESTABLISHED -                   
tcp      149      0 172.18.32.152:8200      172.18.32.89:13994      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37512      CLOSE_WAIT  -                   
tcp      148      0 172.18.32.152:8200      172.18.33.40:29859      ESTABLISHED -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37440      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37336      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.89:13838      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.89:13984      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37542      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37346      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37392      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37504      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37348      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.89:14004      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.161:57844     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37434      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37386      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37530      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.161:57750     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37540      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37478      CLOSE_WAIT  -                   
tcp      301      0 172.18.32.152:8200      172.18.33.40:37428      CLOSE_WAIT  -                   
tcp      148      0 172.18.32.152:8200      172.18.32.89:39293      ESTABLISHED -                   
tcp      149      0 172.18.32.152:8200      172.18.32.161:57934     CLOSE_WAIT  -                   
tcp      246      0 172.18.32.152:8200      172.18.32.152:44684     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37522      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.89:13964      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.89:13850      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37394      CLOSE_WAIT  -                   
tcp        0    246 172.18.32.152:52634     172.18.32.152:8200      FIN_WAIT1   -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37458      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37374      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37474      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.161:57716     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37588      CLOSE_WAIT  -                   
tcp      246      0 172.18.32.152:8200      172.18.32.152:44688     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37486      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.161:57868     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37418      CLOSE_WAIT  -                   
tcp      246      0 172.18.32.152:8200      172.18.32.152:44690     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37480      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37578      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.89:13802      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.89:14022      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.89:13960      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.89:13818      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37372      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37500      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.161:57906     CLOSE_WAIT  -                   
tcp      246      0 172.18.32.152:8200      172.18.32.152:44668     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37490      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37568      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37370      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37576      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37412      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.89:13922      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37524      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.161:57886     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37538      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37586      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.89:13828      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37574      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37596      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.161:57728     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37496      CLOSE_WAIT  -                   
tcp      246      0 172.18.32.152:8200      172.18.32.152:44694     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37464      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37264      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37368      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37508      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37278      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37572      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37528      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.161:57860     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37430      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37454      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37532      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37514      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37248      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.161:57922     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37268      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37450      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37436      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37390      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37362      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37502      CLOSE_WAIT  -                   
tcp        0    246 172.18.32.152:52632     172.18.32.152:8200      FIN_WAIT1   -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37582      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37598      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37484      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.89:13902      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37452      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.161:57830     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37462      CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.32.161:57810     CLOSE_WAIT  -                   
tcp      149      0 172.18.32.152:8200      172.18.33.40:37408      CLOSE_WAIT  -                   
@jefferai (Member) commented:

set vault address parameter in listen tcp stanza using a fqdn instead of ip address

Does this make a difference?

@jefferai (Member) commented:

Also, can you comment on the IP addresses on the other end? Are they random clients, do they correspond to other Vault servers, etc.?

@jefferai jefferai added this to the 1.0 milestone Nov 29, 2018
@jefferai (Member) commented:

Can you take a look at Vault server logs/client logs and see if you're seeing errors? e.g. are you seeing any panics?

csahli (Author) commented Nov 30, 2018

Sorry, I misled you, my mistake. The problem is not related to a Vault parameter in the configuration file; I just described what I did when I ran into the error.
Digging deeper, the problem looks to be related to the DynamoDB storage backend.

As you can see in the Vault configuration, we are using the DynamoDB storage backend with the following parameters in the dynamodb stanza:

   read_capacity  = "5"
   write_capacity = "5"

On the AWS DynamoDB table configuration side, read/write capacity was set as follows:

Provisioned read capacity units: 5 (Auto Scaling Disabled)
Provisioned write capacity units: 5 (Auto Scaling Disabled)

Basically, Vault's read/write capacity was set equal to the DynamoDB provisioned read/write capacity. But since we have 3 nodes, each configured with a read/write capacity of 5, we actually exceed the DynamoDB table capacity.

And here is what was visible in the /var/log/messages system log file, since Vault logs there as a systemd unit:

2018-11-26T16:54:55.871050+00:00 foo.bar.foobar.io vault: 2018-11-26T16:54:55.870Z [INFO ] core: security barrier not initialized
2018-11-28T17:05:41.567852+00:00 foo.bar.foobar.io vault: 2018-11-28T17:05:41.564Z [INFO ] core: vault is unsealed
2018-11-28T17:05:41.568139+00:00 foo.bar.foobar.io vault: 2018-11-28T17:05:41.566Z [INFO ] core: entering standby mode
2018-11-28T17:25:57.139296+00:00 foo.bar.foobar.io vault: 2018-11-28T17:25:57.136Z [ERROR] core: barrier init check failed: error="failed to check for initialization: ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API.
2018-11-28T17:25:57.139578+00:00 foo.bar.foobar.io vault: status code: 400, request id: FO1EVFHQQ9TO2NHBVS083F8G03VV4KQNSO5AEMVJF66Q9ASUAAJG"
2018-11-28T17:25:57.139794+00:00 foo.bar.foobar.io vault: 2018-11-28T17:25:57.137Z [ERROR] core: error checking health: error="failed to check for initialization: ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API.
2018-11-28T17:25:57.140005+00:00 foo.bar.foobar.io vault: status code: 400, request id: FO1EVFHQQ9TO2NHBVS083F8G03VV4KQNSO5AEMVJF66Q9ASUAAJG"
2018-11-28T17:26:06.524964+00:00 foo.bar.foobar.io vault: 2018-11-28T17:26:06.524Z [ERROR] core: barrier init check failed: error="failed to check for initialization: ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API.
2018-11-28T17:26:06.525299+00:00 foo.bar.foobar.io vault: status code: 400, request id: TP3TO5DJA3DAUTCOGIGICP4C3RVV4KQNSO5AEMVJF66Q9ASUAAJG"
2018-11-28T17:26:06.525530+00:00 foo.bar.foobar.io vault: 2018-11-28T17:26:06.524Z [ERROR] core: error checking health: error="failed to check for initialization: ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API.
2018-11-28T17:26:06.525756+00:00 foo.bar.foobar.io vault: status code: 400, request id: TP3TO5DJA3DAUTCOGIGICP4C3RVV4KQNSO5AEMVJF66Q9ASUAAJG"
2018-11-28T17:26:16.526527+00:00 foo.bar.foobar.io vault: 2018-11-28T17:26:16.526Z [ERROR] core: barrier init check failed: error="failed to check for initialization: ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API.
2018-11-28T17:26:16.526795+00:00 foo.bar.foobar.io vault: status code: 400, request id: 9SRQ1V6P0U2MA4O796AH9Q4FOFVV4KQNSO5AEMVJF66Q9ASUAAJG"

So to solve the issue I simply tripled the DynamoDB table's provisioned read/write capacity to make it consistent with the number of Vault nodes:

Provisioned read capacity units : 15 (Auto Scaling Disabled)
Provisioned write capacity units: 15 (Auto Scaling Disabled)

Basically, the DynamoDB read/write capacity must satisfy the following formulas:

AWS DynamoDB table provisioned read capacity  = (Vault read capacity  * number of Vault nodes)
AWS DynamoDB table provisioned write capacity = (Vault write capacity * number of Vault nodes)
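
I made the change through the AWS console, but the same adjustment can be scripted against the UpdateTable API that the error message itself points to. A minimal sketch, assuming the aws-sdk-go v1 client and reusing the table name and region from my configuration above:

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

func main() {
	// Per-node capacity from the Vault dynamodb stanza, multiplied by the node count.
	const nodes, perNodeRead, perNodeWrite = 3, 5, 5

	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("eu-west-1")}))
	svc := dynamodb.New(sess)

	_, err := svc.UpdateTable(&dynamodb.UpdateTableInput{
		TableName: aws.String("vault-data"),
		ProvisionedThroughput: &dynamodb.ProvisionedThroughput{
			ReadCapacityUnits:  aws.Int64(nodes * perNodeRead),
			WriteCapacityUnits: aws.Int64(nodes * perNodeWrite),
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("table capacity updated to", nodes*perNodeRead, "read /", nodes*perNodeWrite, "write units")
}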

This workaround solves the issue I faced, but the underlying problem remains: the Vault DynamoDB storage backend should handle this situation differently and correctly release resources when an exception occurs. It looks like Vault does not release the sockets because it is not handling this throttling error correctly.

So I would propose to:

  • Enhance the DynamoDB storage backend documentation to take the number of nodes into account when setting the read/write capacity.
  • Update the DynamoDB storage backend code to better handle reaching the capacity limit, so that if a Vault node still exceeds the read/write capacity after retries, it releases the resources correctly (a rough sketch of what I mean follows below).
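
To make the second point concrete, here is a rough sketch of the kind of handling I have in mind. This is not the actual Vault DynamoDB backend code: the package and helper names are made up, and the retry parameters are arbitrary. It only shows the idea of detecting the throttling error reported by the AWS SDK, backing off a bounded number of times, and then returning the error so the request can finish and the connection can be closed instead of lingering in CLOSE_WAIT.

// Hypothetical sketch, not Vault's backend implementation.
package dynamostore

import (
	"time"

	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

// getWithBackoff retries a throttled GetItem a bounded number of times and
// then surfaces the error so the caller can fail the request cleanly.
func getWithBackoff(svc *dynamodb.DynamoDB, in *dynamodb.GetItemInput) (*dynamodb.GetItemOutput, error) {
	const maxRetries = 5
	backoff := 50 * time.Millisecond

	for attempt := 0; ; attempt++ {
		out, err := svc.GetItem(in)
		if err == nil {
			return out, nil
		}
		aerr, ok := err.(awserr.Error)
		throttled := ok && aerr.Code() == dynamodb.ErrCodeProvisionedThroughputExceededException
		if !throttled || attempt == maxRetries {
			// Give up: the HTTP handler can then respond (e.g. with a 5xx)
			// and release the connection instead of leaving it half-closed.
			return nil, err
		}
		time.Sleep(backoff)
		backoff *= 2
	}
}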

@csahli csahli closed this as completed Nov 30, 2018
@csahli csahli reopened this Nov 30, 2018
jefferai (Member) commented Dec 3, 2018

The Dynamo code uses the AWS SDK, so this is probably a bug at that layer, though I'm unsure.

@jefferai jefferai modified the milestones: 1.0, not-scheduled Dec 3, 2018
@michelvocks michelvocks added bug Used to indicate a potential bug storage/dynamodb labels Nov 12, 2019
@pbernal pbernal modified the milestones: not-scheduled, triaged May 28, 2020
aphorise (Contributor) commented Aug 13, 2020

I was not able to reproduce this, and I believe that in any event it may be resolved with OS-specific tuning (like below), if it is not addressed in the DynamoDB storage portion of Vault. Other suspects here that differed from my own testing, which was outside AWS, are the LB and the OS (I did not use RHEL).

# /etc/sysctl.conf
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

sysctl -p ;

Is this issue still applicable?

@vishalnayak (Member) commented:

Issues that are not reproducible and/or have not had any interaction for a long time are stale issues. Sometimes even valid issues remain stale, lacking traction from either the maintainers or the community. In order to provide faster responses and better engagement with the community, we strive to keep the issue tracker clean and the issue count low. In this regard, our current policy is to close stale issues after 30 days. If a feature request is being closed, it means that it is not on the product roadmap. Closed issues will still be indexed and available for future viewers. If users feel that an issue is still relevant but was wrongly closed, we encourage reopening it.

Please refer to our contributing guidelines for details on issue lifecycle.
