Performance Standby still causing health check to fail #397

Closed
Ardun21 opened this issue Mar 31, 2020 · 3 comments


Ardun21 commented Mar 31, 2020

After updating to spring-cloud-starter-vault-config version 2.2.2.RELEASE, I'm still seeing the health check report a "DOWN" status for a Vault Enterprise node running in performance standby mode. Looking through the history of this project, it appears that the commit to address this was added in 2.2.0, but I've tried 2.2.0, 2.2.1, and 2.2.2, and each time I get the same result when I hit the actuator/health endpoint:

org.springframework.web.client.UnknownHttpStatusCodeException: 473 status code 473: [{"initialized":true,"sealed":false,"standby":true,"performance_standby":true,"replication_performance_mode":"disabled","replication_dr_mode":"disabled","server_time_utc":1585688437,"version":"1.2.3+prem","cluster_name":"vault-cluster-f21cff50","cluster_id":"a5137abb-6a0c-058f-2679-00dceb119c1a"}

Member

mp911de commented Apr 1, 2020

How can we reproduce a performance-standby node?

Author

Ardun21 commented Apr 1, 2020

We tested this against a Vault Enterprise v1.2.3 HA cluster (using Consul as the back-end). Our cluster consists of 3 Vault Enterprise nodes and 5 Consul OSS nodes, but as long as you have at least two Vault Enterprise nodes running in some sort of HA cluster, I believe you should see this issue.

I just pointed my test app directly at one of the HA standby nodes (which by default run in "performance standby" mode on Vault Enterprise) and I was able to produce the health check failure.

You can verify that a given Vault Enterprise node is in performance standby mode by checking the response body of the sys/health API:

curl -k $VAULT_ADDR/v1/sys/health | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   297  100   297    0     0    461      0 --:--:-- --:--:-- --:--:--   461
{
  "initialized": true,
  "sealed": false,
  "standby": true,
  "performance_standby": true,
  "replication_performance_mode": "disabled",
  "replication_dr_mode": "disabled",
  "server_time_utc": 1585758353,
  "version": "1.2.3+prem",
  "cluster_name": "vault-cluster-cd10e9e9",
  "cluster_id": "ae4fa9b2-d2ca-33d5-2d80-53255cdbdd55"
}

If you check the headers, you'll also see that performance standby nodes return a unique HTTP status code, 473, which is referenced both in the above error and in the sys/health docs.
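
To make the status codes concrete, here is a minimal sketch of how the sys/health status codes could be interpreted on the client side. The code values follow the default responses described in the Vault sys/health documentation; the class name `VaultHealthCodes` and both method names are hypothetical, and which codes a health check should treat as "not failing" is an assumption and a policy choice, not what Spring Cloud Vault does.

```java
// Hypothetical helper, not part of Spring Cloud Vault or Spring Vault.
class VaultHealthCodes {

    // Maps the default sys/health HTTP status codes to a short description.
    static String describe(int status) {
        switch (status) {
            case 200: return "initialized, unsealed, active";
            case 429: return "unsealed, standby";
            case 472: return "disaster recovery secondary";
            case 473: return "performance standby";
            case 501: return "not initialized";
            case 503: return "sealed";
            default:  return "unknown";
        }
    }

    // Assumption: active, standby, and performance standby nodes are all
    // unsealed and initialized, so a health check may choose not to fail on them.
    static boolean isUnsealed(int status) {
        return status == 200 || status == 429 || status == 473;
    }

    public static void main(String[] args) {
        int status = 473; // the code reported in the error above
        System.out.println(status + ": " + describe(status)
                + ", unsealed=" + isUnsealed(status));
    }
}
```

The point of the sketch is only that 473 is a well-defined, non-error state from Vault's side, even though generic HTTP clients see it as an unknown status code.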

mp911de added a commit that referenced this issue Apr 2, 2020
Add test for the reactive health indicator.

See gh-397
Member

mp911de commented Apr 2, 2020

Thanks. The issue occurs only when using the synchronous API; the reactive API is not affected. The underlying cause is that, prior to Spring Framework 5.2.5, Spring Framework's RestTemplate (DefaultResponseErrorHandler) consumes the error response body twice: once to assemble the error message and once for the responseBody in UnknownHttpStatusCodeException.

The issue is already addressed in Spring Framework 5.2.5 (see spring-projects/spring-framework#24595), so it only requires an upgrade on your side.
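
The double read described above can be illustrated with plain `java.io` streams. This is a simplified sketch, not Spring Framework's actual code: the class `BodyReadTwiceDemo` and its methods are hypothetical, and a `ByteArrayInputStream` stands in for the HTTP response body. It shows that draining the same stream a second time yields nothing, while draining once and reusing the bytes keeps the body available for both uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only mimics the pattern, not Spring's implementation.
class BodyReadTwiceDemo {

    // Reads the stream to its end and returns everything that was left in it.
    static byte[] drain(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Problematic pattern: the second drain sees an already exhausted stream.
    static int[] lengthsWhenDrainedTwice(byte[] payload) {
        InputStream body = new ByteArrayInputStream(payload);
        byte[] forMessage = drain(body);   // first use: error message
        byte[] forException = drain(body); // second use: exception body, now empty
        return new int[] {forMessage.length, forException.length};
    }

    // Safer pattern: drain once, then hand the same bytes to both consumers.
    static int[] lengthsWhenBufferedOnce(byte[] payload) {
        byte[] bytes = drain(new ByteArrayInputStream(payload));
        return new int[] {bytes.length, bytes.length};
    }

    public static void main(String[] args) {
        byte[] payload = "{\"performance_standby\":true}".getBytes(StandardCharsets.UTF_8);
        int[] twice = lengthsWhenDrainedTwice(payload);
        int[] once = lengthsWhenBufferedOnce(payload);
        System.out.println("drained twice: " + twice[0] + " bytes, then " + twice[1]);
        System.out.println("buffered once: " + once[0] + " bytes, then " + once[1]);
    }
}
```

An empty body on the second read would be consistent with a health check that cannot parse the 473 response from the exception, although the exact failure path inside the health indicator is not shown in this thread.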

@mp911de mp911de closed this as completed Apr 2, 2020
spencergibb pushed a commit that referenced this issue Sep 14, 2023
Add test for the reactive health indicator.

See gh-397