
Vault-driven Consul TTL checks #1349

Merged
merged 21 commits into from Apr 26, 2016

Conversation

@sean- (Contributor) commented Apr 24, 2016

Vault will now register itself with Consul. The active node can be found using active.vault.service.consul. All standby vaults are available via standby.vault.service.consul. All unsealed vaults are considered healthy and available via vault.service.consul. Changes in status and registration are event-driven and should happen at the speed of a write to Consul (~network RTT + fsync(2) on the Consul servers).
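For reference, Consul also serves these names over its DNS interface (default port 8600), so the active node can be resolved directly. A quick sketch against a local agent with the default `consul` domain:

```
# Ask the local Consul agent's DNS interface for the current active node.
dig @127.0.0.1 -p 8600 active.vault.service.consul +short

# An SRV lookup also returns the advertised port (8200 in the examples below).
dig @127.0.0.1 -p 8600 active.vault.service.consul SRV +short
```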

Healthy/active:

```
curl -X GET 'http://127.0.0.1:8500/v1/health/service/vault?pretty' && echo;
[
    {
        "Node": {
            "Node": "vm1",
            "Address": "127.0.0.1",
            "TaggedAddresses": {
                "wan": "127.0.0.1"
            },
            "CreateIndex": 3,
            "ModifyIndex": 20
        },
        "Service": {
            "ID": "vault:127.0.0.1:8200",
            "Service": "vault",
            "Tags": [
                "active"
            ],
            "Address": "127.0.0.1",
            "Port": 8200,
            "EnableTagOverride": false,
            "CreateIndex": 17,
            "ModifyIndex": 20
        },
        "Checks": [
            {
                "Node": "vm1",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": "",
                "CreateIndex": 3,
                "ModifyIndex": 3
            },
            {
                "Node": "vm1",
                "CheckID": "vault-sealed-check",
                "Name": "Vault Sealed Status",
                "Status": "passing",
                "Notes": "Vault service is healthy when Vault is in an unsealed status and can become an active Vault server",
                "Output": "",
                "ServiceID": "vault:127.0.0.1:8200",
                "ServiceName": "vault",
                "CreateIndex": 19,
                "ModifyIndex": 19
            }
        ]
    }
]
```

Healthy/standby:

```
[snip]
        "Service": {
            "ID": "vault:127.0.0.2:8200",
            "Service": "vault",
            "Tags": [
                "standby"
            ],
            "Address": "127.0.0.2",
            "Port": 8200,
            "EnableTagOverride": false,
            "CreateIndex": 17,
            "ModifyIndex": 20
        },
        "Checks": [
            {
                "Node": "vm2",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": "",
                "CreateIndex": 3,
                "ModifyIndex": 3
            },
            {
                "Node": "vm2",
                "CheckID": "vault-sealed-check",
                "Name": "Vault Sealed Status",
                "Status": "passing",
                "Notes": "Vault service is healthy when Vault is in an unsealed status and can become an active Vault server",
                "Output": "",
                "ServiceID": "vault:127.0.0.2:8200",
                "ServiceName": "vault",
                "CreateIndex": 19,
                "ModifyIndex": 19
            }
        ]
    }
]
```

Sealed:

        "Checks": [
            {
                "Node": "vm2",
                "CheckID": "serfHealth",
                "Name": "Serf Health Status",
                "Status": "passing",
                "Notes": "",
                "Output": "Agent alive and reachable",
                "ServiceID": "",
                "ServiceName": "",
                "CreateIndex": 3,
                "ModifyIndex": 3
            },
            {
                "Node": "vm2",
                "CheckID": "vault-sealed-check",
                "Name": "Vault Sealed Status",
                "Status": "critical",
                "Notes": "Vault service is healthy when Vault is in an unsealed status and can become an active Vault server",
                "Output": "Vault Sealed",
                "ServiceID": "vault:127.0.0.2:8200",
                "ServiceName": "vault",
                "CreateIndex": 19,
                "ModifyIndex": 38
            }
        ]
```
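Because the sealed node's `vault-sealed-check` is `critical`, it drops out of `vault.service.consul`. The same filtering is available from the HTTP health API via the `passing` query parameter; for example, against the same local agent:

```
# Only instances whose checks are all passing, i.e. unsealed vaults.
curl -X GET 'http://127.0.0.1:8500/v1/health/service/vault?passing&pretty'
```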

@sean- sean- self-assigned this Apr 24, 2016
@jefferai jefferai changed the title F vault service Vault-driven Consul TTL checks Apr 25, 2016
* `disable_registration` (optional) - If true, then Vault will not register
itself with Consul. If the Consul agent for Vault and the Consul servers
are older than `0.6.4`, this must be set to "true" due to API
incompatibilities. Defaults to "false".
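For illustration, a minimal `consul` backend stanza with registration disabled might look like the following; the `address` and `path` values are assumed examples, not taken from this PR:

```
backend "consul" {
  address = "127.0.0.1:8500"
  path    = "vault"

  # Opt out of Vault's service registration, e.g. when the local Consul
  # agent or the Consul servers are older than 0.6.4.
  disable_registration = "true"
}
```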
Member

Any way we can auto-detect this rather than force all of our Consul users -- most of whom won't be on 0.6.4 -- to have to change their configuration files?

Member

Also, I think we need to make registration opt-in. Aside from potentially causing problems for people upgrading, we don't want to cause issues for people who have already configured Consul health checks advertising vault.

Contributor Author

Re: the first point: there isn't any built-in way to do this, no. We could fingerprint the Consul server and then change our behavior based on what the Consul servers and agent support, but that functionality doesn't exist out of the box anywhere at the moment.

Re: point #2, the worst that would happen is a double registration.

Let's touch base about this today to hash out next steps on these points (I wrestled with them before hitting enter).

Contributor Author

Re: #1, revised the API call because we're never passing more than 1K of output. There is no longer a dependency on Consul 0.6.4.
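For context, Consul's agent API has two ways to report TTL check status: the older `pass`/`warn`/`fail` endpoints, which only carry a short note in the query string, and the `check/update` endpoint added in 0.6.4, which accepts larger output in a JSON body. A rough sketch of both, reusing the check ID from the examples above:

```
# Older style: status encoded in the path, short note in the query string.
curl -X PUT 'http://127.0.0.1:8500/v1/agent/check/pass/vault-sealed-check?note=Vault+Unsealed'

# 0.6.4+ style: status and arbitrary output in the request body.
curl -X PUT 'http://127.0.0.1:8500/v1/agent/check/update/vault-sealed-check' \
     -d '{"Status": "passing", "Output": "Vault Unsealed"}'
```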

@vishalnayak (Member)

@sean- Nicely done! LGTM!
Over to @jefferai for his take..

sean- added 12 commits April 25, 2016 18:00
Hook asynchronous notifications into Core to change the status of Vault based on its active/standby and sealed/unsealed status.
Vault will now register itself with Consul.  The active node can be found using `active.vault.service.consul`.  All standby vaults are available via `standby.vault.service.consul`.  All unsealed vaults are considered healthy and available via `vault.service.consul`.  Change in status and registration is event driven and should happen at the speed of a write to Consul (~network RTT + ~1x fsync(2)).
Useful if the HOME envvar is not set because `vault` was launched in a clean environment (e.g. `env -i vault ...`).
Brought to you by: Dept of 2nd thoughts before pushing enter on `git push`
The rest of the tests here use spaces, not tabs
Hide all Consul checks behind the `CONSUL_HTTP_ADDR` env var instead of `CONSUL_ADDR`, which is non-standard.
Consul service registration for Vault requires Consul 0.6.4.
If the local Consul agent is not available while attempting to step down from active or up to active, retry once per second.  Allow for concurrent changes to the state with a single registration updater.  Fix standby initialization.
Consul is never going to pass in more than 1K of output.  This mitigates the pre-0.6.4 concern.