Vault with Consul #55

Closed
gvenka008c opened this issue Sep 14, 2016 · 39 comments

@gvenka008c

Hi,

We have Vault integrated with Consul. We normally write all our key/value pairs under the secret path, as shown below.

vault write secret/elements/test value=mes0sd0cker

How can this be used in consul-replicate? I see the data is stored under /vault/logical in Consul. (see attached)

screen shot 2016-09-14 at 3 16 52 pm

Here is the sample config file for consul-replicate
./consul-replicate -config=/srv/consul/consul_replicate.hcl

# more consul_replicate.hcl
consul = "127.0.0.1:8500"
retry = "10s"
log_level = "debug"
max_stale = "10m"

syslog {
   enabled = true
}

prefix {
    source = "vault/logical/@ndc_ho_b"
}
@gvenka008c
Author

In addition, I have the following configured on Datacenter A:

consul = "127.0.0.1:8500"
retry = "10s"
log_level = "debug"

syslog {
   enabled = true
}

prefix {
    source = "vault/logical/@ndc_ho_b"
}

I have the below configured on Datacenter B.

more /srv/consul/consul_replicate.hcl
consul = "127.0.0.1:8500"
retry = "10s"
log_level = "debug"

syslog {
   enabled = true
   facility = "LOCAL1"
}

prefix {
    source = "vault/logical/@ndc_as_b"
}

When I add a key from B, I can see the key being replicated in A, but I am not able to read the value.

# vault write secret/elements/PROD/test1 value=fromhob
Success! Data written to: secret/elements/PROD/test1

# vault read secret/elements/PROD/test1
Key                 Value
---                 -----
refresh_interval    2592000
value               from hob

From Datacenter B

# vault read secret/elements/PROD/test1
No value found at secret/elements/PROD/test1

@gvenka008c
Author


Sep 14 20:19:22  consul-replicate: 2016/09/14 20:19:22 [INFO] (runner) quiescence minTimer fired
Sep 14 20:19:22  consul-replicate[22435]: (runner) quiescence minTimer fired
Sep 14 20:19:22  consul-replicate[22435]: (runner) running
Sep 14 20:19:22  consul-replicate: 2016/09/14 20:19:22 [INFO] (runner) running

Sep 14 20:19:22 archemas-asb-01s consul-replicate: 2016/09/14 20:19:22 [DEBUG] (runner) updated key "vault/logical/e4dd4f8c-8edc-b6e0-873e-3a7160ee07ca/elements/PROD/test1"
Sep 14 20:19:22 archemas-asb-01s consul-replicate[22435]: (runner) updated key "vault/logical/e4dd4f8c-8edc-b6e0-873e-3a7160ee07ca/elements/PROD/test1"

Sep 14 20:19:22  consul-replicate: 2016/09/14 20:19:22 [DEBUG] (runner) updated 
Sep 14 20:19:22  consul-replicate[22435]: (runner) replicated 4 updates, 0 deletes
Sep 14 20:19:22  consul-replicate: 2016/09/14 20:19:22 [INFO] (runner) replicated 4 updates, 0 deletes
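
For reference, the debug line above suggests how a secret path ends up in Consul: the `secret/` mount prefix appears to be replaced by `vault/logical/<mount UUID>/`. Here is a minimal Python sketch of that mapping. The function name is hypothetical and the layout is inferred only from the log line above, not from Vault's documentation. Note also that Vault encrypts values before writing them to its storage backend, so the replicated entries cannot be read from Consul directly.

```python
def consul_key_for_secret(secret_path, mount_uuid, mount="secret"):
    """Map a Vault path such as secret/elements/PROD/test1 to the Consul key
    seen in the consul-replicate log. Layout is inferred from that log only."""
    prefix = mount.strip("/") + "/"
    if not secret_path.startswith(prefix):
        raise ValueError(f"{secret_path!r} is not under mount {mount!r}")
    relative = secret_path[len(prefix):]
    return f"vault/logical/{mount_uuid}/{relative}"


if __name__ == "__main__":
    # UUID copied from the log line in this comment.
    print(consul_key_for_secret(
        "secret/elements/PROD/test1",
        "e4dd4f8c-8edc-b6e0-873e-3a7160ee07ca",
    ))
```

Under that assumption, `logical` belongs to the storage key and not to the Vault path, so the secret would still be read as `secret/elements/PROD/test1`.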

@kak-tus

kak-tus commented Sep 14, 2016

The setup from the last link doesn't work with the current version of consul-replicate (latest binary, 0.2). See #54.

@gvenka008c
Author

I am running consul-replicate in both datacenters. Now I am seeing an error in both DCs when reading the data:

# vault read secret/logical/elements/PROD/mesos
No value found at secret/logical/elements/PROD/mesos

See attached snapshot.

screen shot 2016-09-14 at 4 44 25 pm

@ntnn

ntnn commented Sep 14, 2016

@kak-tus I'd be surprised if it didn't work, since we're rolling that version, though we use a patched version to exclude keys.
@gvenka008c Are you replicating in both directions - as in, are both agents active? A master-master setup isn't possible with consul and consul-replicate.

@gvenka008c
Author

gvenka008c commented Sep 14, 2016

@ntnn Yes, I am replicating in both directions.
I was running consul-replicate on the non-master nodes in each datacenter.

@ntnn

ntnn commented Sep 14, 2016

@gvenka008c You have to deactivate consul-replicate in the leader DC and let the hot-standby DCs replicate only from the leader DC.

@gvenka008c
Author

@ntnn I just looked at consul info. One was master and the other was standby. So is it good practice to run consul-replicate outside the VMs where Consul and Vault are set up? Thoughts?

@gvenka008c
Author

Does consul-replicate have an official Docker image that can be used to run it on a different VM for replication purposes?

@ntnn

ntnn commented Sep 14, 2016

Possible, but that'd be wasted resources imho. What I did was let a process continuously check for the leader DC and start/stop the replicate agent accordingly. Locking is handled by Consul, as advised in the documentation.
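
The actual implementation was not shared in this thread, so what follows is only a rough Python sketch of such a watchdog. It assumes consul-replicate runs as a systemd unit named `consul-replicate` (a hypothetical name) and that the caller supplies a `get_leader_dc` function. How the leader DC is determined is not described here, and the Consul-based locking is not shown.

```python
import subprocess
import time

UNIT = "consul-replicate"  # hypothetical systemd unit name


def desired_action(local_dc, leader_dc, running):
    """Return "start", "stop" or None.

    consul-replicate should run only in standby DCs, i.e. when the leader DC
    is known and is not the local one."""
    should_run = leader_dc is not None and local_dc != leader_dc
    if should_run and not running:
        return "start"
    if not should_run and running:
        return "stop"
    return None


def is_running(unit=UNIT):
    # "systemctl is-active --quiet" exits with 0 when the unit is active.
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


def watch(local_dc, get_leader_dc, interval=10.0):
    """Poll forever; get_leader_dc is supplied by the caller (assumption)."""
    while True:
        action = desired_action(local_dc, get_leader_dc(), is_running())
        if action:
            subprocess.run(["systemctl", action, UNIT], check=False)
        time.sleep(interval)
```

When the leader DC is unknown (`None`), the sketch stops the agent rather than risk replicating in the wrong direction.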

@gvenka008c
Author

Is this managed through a systemd service? I am running consul-replicate as a service.

@ntnn

ntnn commented Sep 14, 2016

Yes.

@ntnn

ntnn commented Sep 14, 2016

It works, but I'd rather use a backend with a proper transaction log.

@gvenka008c
Author

Can you share the snippet if possible?

@ntnn

ntnn commented Sep 14, 2016

I'll check in with my superior tomorrow and get back to you.

@gvenka008c
Author

@ntnn Also, can't we have an active/active setup? We want both datacenters to accept traffic: any data written in DC A should be replicated to DC B and vice versa. Is that possible with consul-replicate?

After running consul-replicate in both DCs, I am now unable to read any data in either DC.

# vault read secret/logical/elements/PROD/mesos
No value found at secret/logical/elements/PROD/mesos

@gvenka008c
Author

I also stopped consul-replicate on datacenter A and wrote the key/value below using Vault.

# vault write secret/Elements/PROD/mesos value=mes0sd0cker
Success! Data written to: secret/Elements/PROD/mesos

# vault read secret/Elements/PROD/mesos
Key                 Value
---                 -----
refresh_interval    2592000
value               test


I ran consul-replicate on datacenter B (which is not the Consul leader) and can see the data being replicated in the Consul GUI.

screen shot 2016-09-14 at 7 08 58 pm

When I run the vault read command on datacenter B, I get the error below:

# vault read secret/Elements/PROD/mesos
No value found at secret/Elements/PROD/mesos

Thoughts?

@jefferai
Member

This is unsupported. It is not within the design parameters of Vault and is potentially dangerous.

@gvenka008c
Author

@jefferai Sorry, what is unsupported?

@gvenka008c
Author

@jefferai @ntnn

I stopped consul-replicate on datacenter A and wrote the key/value below using Vault.

# vault write secret/Elements/PROD/mesos value=mes0sd0cker
Success! Data written to: secret/Elements/PROD/mesos

# vault read secret/Elements/PROD/mesos
Key                 Value
---                 -----
refresh_interval    2592000
value               test

I ran consul-replicate on datacenter B (which is not the Consul leader) and can see the data being replicated in the Consul GUI.

screen shot 2016-09-14 at 7 08 58 pm

When I run the vault read command on datacenter B, I get the error below:

# vault read secret/Elements/PROD/mesos
No value found at secret/Elements/PROD/mesos

Thoughts?

@slackpad
Contributor

@gvenka008c consul-replicate does not support master-master replication. It's only set up to pull in one direction. There are some limited cases where people use consul-replicate with Vault (see hashicorp/vault#633), but it definitely does not support the use case you are going for here with a bidirectional sync. You could lose data or otherwise corrupt Vault's store.
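
To make the "one direction only" rule easy to check, here is a small Python sketch that flags datacenters pulling from each other. It is illustration only: the function names are hypothetical, and it assumes each `source` uses the `prefix@datacenter` form seen in the configs earlier in this thread.

```python
def source_dc(source):
    """Return the datacenter part of a consul-replicate source such as
    "vault/logical/@ndc_ho_b", or None when there is no "@dc" suffix."""
    _prefix, sep, dc = source.rpartition("@")
    return dc if sep else None


def mutual_pulls(sources_by_dc):
    """sources_by_dc maps a DC name to the source strings configured there.

    Returns the sorted pairs of DCs that replicate from each other."""
    pulls = {
        dc: {source_dc(s) for s in sources} - {None}
        for dc, sources in sources_by_dc.items()
    }
    pairs = set()
    for dc, upstreams in pulls.items():
        for upstream in upstreams:
            if dc != upstream and dc in pulls.get(upstream, set()):
                pairs.add(tuple(sorted((dc, upstream))))
    return sorted(pairs)


if __name__ == "__main__":
    # Assumption: Datacenter A is "ndc_as_b" and Datacenter B is "ndc_ho_b".
    configs = {
        "ndc_as_b": ["vault/logical/@ndc_ho_b"],
        "ndc_ho_b": ["vault/logical/@ndc_as_b"],
    }
    print(mutual_pulls(configs))
```

An empty result means no two DCs pull from each other. It says nothing about whether writes happen only in the source DC.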

@gvenka008c
Author

@slackpad If you look at my previous experiment, I stopped the master-master replication. It is set up in only one datacenter. I didn't do a bidirectional sync.

@gvenka008c
Author

@ntnn Any update on managing it through services?

@slackpad: When we run vault init on datacenter A, it will have its own set of unseal keys. When we run vault init on datacenter B, it will have another set of keys. So I am concerned about how the replication can work effectively. Thoughts?

@ntnn

ntnn commented Sep 19, 2016

@gvenka008c I haven't gotten permission to share details, but I'm allowed to write an article detailing the problem at hand. I'll write it up and send you a link after I've published it.

The problem and its resolution are reasonably simple, so if you're under pressure I'd say don't wait for it. The approval of the article might take a week.

@gvenka008c
Author

@ntnn Did you have a chance to publish the article? Thanks.

@gvenka008c
Author

How does Vault on Datacenter B know that Vault on Datacenter A is already initialized? I followed the document linked below:

http://sysadminsjourney.com/blog/2015/10/30/replicating-hashicorp-vault-in-a-multi-datacenter-setup/

But when I run the command below on Datacenter B, I get the message "server is not yet initialized". Is there any other data that needs to be copied to ensure that Vault on DC B is in sync with DC A?

# vault status
Error checking seal status: Error making API request.

URL: GET http://xx.xxx.xxx.xx:8200/v1/sys/seal-status
Code: 400. Errors:

* server is not yet initialized

Thoughts? Thanks.

@ntnn

ntnn commented Oct 21, 2016

Ah yes, sorry, I forgot about that. I haven't heard back about the article, but I'll take care of that today. If I get the approval, it'll be up later today.

@ntnn

ntnn commented Oct 22, 2016

@gvenka008c Got the approval, though I had to make some adjustments. Here's the post:
https://ntnn.de/blog/vault_georedundancy/

Primarily note the last two points.

@kak-tus

kak-tus commented Oct 22, 2016

@ntnn In the described setup you have only one Vault leader across all DCs. I don't understand how Vault elects a leader in a cross-DC setup in your case.

In my setup I have a Vault leader in every DC. I manually selected one of the DCs as the master, so I do writes to the Vaults in this DC. The Vaults in the other DCs are read-only and are replicated from the master DC.

@ntnn

ntnn commented Oct 23, 2016

@kak-tus It doesn't. The first Vault node to be elected and acquire the lock wins. In my tests this worked pretty well; the LAN clusters usually had a big enough gap before reaching consensus, and afterwards the leader is set.

The problem with the read-only slave-masters in other DCs is that you might get stale secrets, and you can't actually prevent writes to those slaves.
If you restrict access through iptables/VLANs, the problem of direct access is solved. The same holds if you're using Vault for a small number of secrets and as WORM storage. With a large number of secrets, the replication takes a while to fetch the secrets and apply them, which isn't feasible for no-knowledge secrets shared between machines.

@kak-tus

kak-tus commented Oct 25, 2016

@ntnn After several tries I can't get it to work. Is Vault in your setup configured to use the local Consul on each node (for example consul.service.consul)?

My actions:

  1. Set up a new Consul in the new DC.
  2. Clone all vault subkeys of the Consul KV (maybe this is my mistake?)
  3. Start the Vault container.
  4. vault status shows that it is sealed.
  5. Unseal it.
  6. After unsealing, it becomes the leader (in the current DC).

Consul 0.7
Vault 0.6.2

Vault config (same in both DC's)

disable_cache = true

listener "tcp" {
  address = "0.0.0.0:8200"
  tls_disable = 1
}

backend "consul" {
  address = "consul.service.consul:8500"
  token = "some token"
  redirect_addr = "http://some ip:8200"
  cluster_addr = "http://some ip:8200"
  service = "vault-internal"
}

Consul config

{
  "bootstrap_expect": 1,
  "node_name": "some node name",
  "server": true,
  "recursors": ["172.17.0.1","8.8.8.8"],
  "datacenter": "dc-test",
  "client_addr": "0.0.0.0",
  "ui": true,
  "encrypt": "some key",
  "acl_datacenter": "dc-ihor",
  "acl_master_token": "some token",
  "acl_token": "some token"
}

P.S. In the article you say "the replication has to be done before initializing vault". Does that mean before initializing Vault in the second (target) DC or before initializing Vault in the first (source) DC?
I think you mean the first case, because in the second case it would be impossible to attach a new DC (third, fourth) to the cluster.

@ntnn

ntnn commented Oct 25, 2016

The replication has to be prepared and verified working before initializing Vault in any DC.

The rough workflow for setting up a completely new Vault cluster is:

  • Set up Consul with LAN/WAN clusters
  • Set up replication
  • Initialize one Vault node
  • Optional: unseal the Vault node

After the initialization, the Consul LAN cluster propagates the k/v pairs, including the encrypted master key, within the local DC, and the consul-replicate instances in the other DCs pull the k/v pairs into their local DC.

After that, adding and removing nodes is just a matter of deployment.

You only have to initialize one node in one DC; the replication, if set up correctly, should take care of transferring the keys.

In my tests so far it didn't matter whether the Consul nodes are local or a specific host, though that depends on your network and your kind of load balancing.

That said, if you have the resources, I'd suggest running an instance of the backend on the nodes your Vault instances are running on.
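
One way to check that the replication works is to compare the key names under the `vault/` prefix in both datacenters. Here is a hedged Python sketch using Consul's KV HTTP API (`/v1/kv/<prefix>` with the `keys` and `dc` query parameters). The address, prefix and DC names are placeholders, ACL tokens are left out, and comparing key names alone does not prove that the values match.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def list_keys(consul_addr, prefix, dc):
    """List key names under prefix in one datacenter via the KV HTTP API."""
    query = urllib.parse.urlencode({"keys": "", "dc": dc})
    url = f"http://{consul_addr}/v1/kv/{prefix}?{query}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # Consul answers 404 when nothing matches
            return []
        raise


def missing_keys(source_keys, dest_keys):
    """Keys present in the source DC but not (yet) in the destination DC."""
    return sorted(set(source_keys) - set(dest_keys))


if __name__ == "__main__":
    # Placeholder values; adjust to your environment.
    src = list_keys("127.0.0.1:8500", "vault/", "dc1")
    dst = list_keys("127.0.0.1:8500", "vault/", "dc2")
    print(missing_keys(src, dst))
```

An empty list after initializing Vault in the source DC suggests that the replicated prefix has arrived in the destination DC.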

@kak-tus

kak-tus commented Oct 26, 2016

@ntnn Thank you for the detailed answer, but it does not work for me. :-(

I created two new test WAN clusters.
Set up replication (and checked that it works).
Initialized the node in dc1.
Unsealed the node in dc1.

After that, the Vault node in dc1 became the leader.
I checked the kv store; the vault folder was synced.

The node in dc2 shows "Mode: sealed".
Then I unsealed the node in dc2.
After that, the node in dc2 became a leader too.

I think your Vault configuration differs from the config in the documentation.

@ntnn

ntnn commented Oct 31, 2016

@kak-tus Can you pastebin your configuration somewhere? I'll check on Wednesday where the differences are. Also note that I'm using a patched consul-replicate with the now merged #51.

@kak-tus

kak-tus commented Nov 4, 2016

@ntnn I can't get it to work even with the new consul-replicate version (with the exclude option).
For the test I installed two new nodes with Consul (one node in each DC).

replicate config (I also tried to exclude vault/core/leader)
https://gist.github.com/kak-tus/dec8eb7b62209d30a6e900fdc118b895

consul config (same in both dc)
https://gist.github.com/kak-tus/da72b2631bd3a0228f629869beff36b2

vault config (vault looks at local consul agent)
https://gist.github.com/kak-tus/0ce521a91bbb952c5b963da240b6f928

In my current attempt, when starting Vault I saw a strange notice in the log:

[INF] core/startClusterListener: clustering disabled, not starting listeners

@cscetbon

cscetbon commented Nov 16, 2016

Hey @ntnn,

In your article you said:

However I’ll probably switch to a backend which supports cross-dc replication natively

Which other backend do you have in mind? I was thinking about Cassandra, which supports cross-DC replication and is supported as a secret backend but not as a storage backend. It's unfortunate that it can't be used for both. This architecture also looks complex to build, but HashiCorp says that it's the recommended backend to use for HA, as it is what their supported customers use.

@ntnn

ntnn commented Nov 16, 2016

@kak-tus Sorry for coming back so late. The Raft protocol works best with an odd number of nodes; that's probably what you're encountering there.
Also, you don't want to exclude core/lock from the replication, not necessarily at least, since that is the path used to acquire a lock to read out the leader's UUID (see vault/core.go L663, L669 and L679). I haven't looked deeper into it, though; if someone can tell otherwise I'd be glad to hear it.
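
As a quick illustration of the odd-number point, here is the standard majority arithmetic for a Raft cluster as a small Python sketch (the helper names are made up for this example). Going from three to four servers raises the quorum without tolerating any additional failure.

```python
def quorum(servers):
    """Smallest majority of a cluster with the given number of servers."""
    return servers // 2 + 1


def tolerated_failures(servers):
    """How many servers can fail while a majority is still available."""
    return servers - quorum(servers)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, quorum(n), tolerated_failures(n))
```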

@cscetbon I'm looking towards etcd, DynamoDB or ZooKeeper.
The secret backends are solely for retrieving credentials for other services securely through Vault. E.g. you can generate ad-hoc credentials with them, which is quite handy and lets your infrastructure look good in an audit.

You could, of course, write a storage backend for Cassandra and submit it to be added to the list of community-supported backends.

@Niranjankolli

I am working on a multi-DC Vault setup with Consul as the storage backend.
There is a 3-node Consul cluster in each DC, joined as LAN clusters.
I am using consul-replicate to replicate the data from source to destination.
Whenever I create a new secret engine, it is not replicated to the secondary DC.
Once I restart Vault in the secondary DC, I am able to see the secret engine.
Keys, policies and users are replicated.

I followed hashicorp/vault#674.
Has anyone achieved this setup?
