vault documentation: doc cleanup effort-batch4 (#16711)
* cleanup effort

* modified text

* Update website/content/docs/internals/integrated-storage.mdx

Co-authored-by: Yoko Hyakuna <yoko@hashicorp.com>

taoism4504 and yhyakuna committed Aug 12, 2022
1 parent 192c2aa commit be4131f
Showing 2 changed files with 54 additions and 62 deletions.
75 changes: 33 additions & 42 deletions website/content/docs/internals/integrated-storage.mdx
@@ -6,20 +6,20 @@ description: Learn about the integrated raft storage in Vault.

# Integrated Storage

Vault supports a number of Storage options for the durable storage of Vault's
information. Each backend has pros, cons, advantages, and trade-offs. For
Vault supports several storage options for the durable storage of Vault's
information. Each backend offers pros, cons, advantages, and trade-offs. For
example, some backends support high availability while others provide a more
robust backup and restoration process.

As of Vault 1.4 an Integrated Storage option is offered. This storage backend
does not rely on any third party systems, it implements high availability,
As of Vault 1.4, an Integrated Storage option is offered. This storage backend
does not rely on any third party systems; it implements high availability,
supports Enterprise Replication features, and provides backup/restore workflows.

## Consensus Protocol

Vault's Integrated Storage uses a [consensus
protocol](<https://en.wikipedia.org/wiki/Consensus_(computer_science)>) to provide
[Consistency (as defined by CAP)](https://en.wikipedia.org/wiki/CAP_theorem).
[Consistency](https://en.wikipedia.org/wiki/CAP_theorem) (as defined by CAP).
The consensus protocol is based on ["Raft: In search of an Understandable
Consensus Algorithm"](https://raft.github.io/raft.pdf). For a visual explanation
of Raft, see [The Secret Lives of Data](http://thesecretlivesofdata.com/raft).
@@ -33,66 +33,57 @@ understandable algorithm.

There are a few key terms to know when discussing Raft:

- Log - The primary unit of work in a Raft system is a log entry. The problem
of consistency can be decomposed into a _replicated log_. A log is an ordered
sequence of entries. Entries includes any cluster change: adding nodes, adding
services, new key-value pairs, etc. We consider the log consistent if all
members agree on the entries and their order.
- **Leader** - At any given time, the peer set elects a single node to be the leader.
The leader is responsible for ingesting new log entries, replicating to followers,
and managing when an entry is committed. The leader node is also the active Vault node and followers are standby nodes. Refer to the [High Availability](/docs/internals/high-availability#design-overview) document for more information.

- FSM - [Finite State Machine](https://en.wikipedia.org/wiki/Finite-state_machine).
An FSM is a collection of finite states with transitions between them. As new logs
- **Log** - An ordered sequence of entries (replicated log) to keep track of any cluster changes. The leader is responsible for _log replication_. When new data is written, for example, a new event creates a log entry. The leader then sends the new log entry to its followers. Any inconsistency within the replicated log entries will indicate an issue.

- **FSM** - [Finite State Machine](https://en.wikipedia.org/wiki/Finite-state_machine).
A collection of finite states with transitions between them. As new logs
are applied, the FSM is allowed to transition between states. Application of the
same sequence of logs must result in the same state, meaning behavior must be deterministic.

- Peer set - The peer set is the set of all members participating in log replication.
For Vault's purposes, all server nodes are in the peer set of the local cluster.
- **Peer set** - The set of all members participating in log replication. All server nodes are in the peer set of the local cluster.

- Quorum - A quorum is a majority of members from a peer set: for a set of size `n`,
- **Quorum** - A majority of members from a peer set: for a set of size `n`,
quorum requires at least `(n+1)/2` members. For example, if there are 5 members
in the peer set, we would need 3 nodes to form a quorum. If a quorum of nodes is
unavailable for any reason, the cluster becomes _unavailable_ and no new logs
can be committed.

- Committed Entry - An entry is considered _committed_ when it is durably stored
on a quorum of nodes. Once an entry is committed it can be applied.
- **Committed Entry** - An entry is considered _committed_ when it is durably stored
on a quorum of nodes. An entry is applied once it is committed.

- Leader - At any given time, the peer set elects a single node to be the leader.
The leader is responsible for ingesting new log entries, replicating to followers,
and managing when an entry is considered committed. For Vault's purposes, the
leader node is also the Active vault node and followers are standby nodes. See
the [High Availability docs](/docs/internals/high-availability#design-overview)
for more information.

Raft is a complex protocol and will not be covered here in detail (for those who
desire a more comprehensive treatment, the full specification is available in this
[paper](https://raft.github.io/raft.pdf)). We will, however, attempt to provide
a high level description which may be useful for building a mental model.
Raft is a complex protocol and will not be covered here in detail. We will, however, attempt to provide
a high level description which may be useful for building a mental model. For those who
want a more comprehensive understanding of Raft, the full specification is available in this
[paper](https://raft.github.io/raft.pdf).

Raft nodes are always in one of three states: follower, candidate, or leader. All
nodes initially start out as a follower. In this state, nodes can accept log entries
from a leader and cast votes. If no entries are received for some time, nodes
self-promote to the candidate state. In the candidate state, nodes request votes from
their peers. If a candidate receives a quorum of votes, then it is promoted to a leader.
The leader must accept new log entries and replicate to all the other followers.
from a leader and cast votes. If no entries are received for a period of time, nodes
will self-promote to the candidate state. In the candidate state, nodes request votes from their peers. If a candidate receives a quorum of votes, then it is promoted to a leader. The leader must accept new log entries and replicate to all the other followers.
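
The state changes described above can be sketched as a small transition function. This is an illustrative model only: the `Event` names are hypothetical labels chosen for this sketch, not identifiers from Vault or any Raft library, and it covers only the two promotions named in the text (election terms and step-down are omitted).

```go
package main

import "fmt"

// State is one of the three Raft node states named in the text.
type State int

const (
	Follower State = iota
	Candidate
	Leader
)

// String returns the lowercase name of the state.
func (s State) String() string {
	return [...]string{"follower", "candidate", "leader"}[s]
}

// Event is a hypothetical label for the triggers described in the text.
type Event int

const (
	// ElectionTimeout: no entries were received from a leader for some time.
	ElectionTimeout Event = iota
	// WonQuorumOfVotes: a candidate received votes from a quorum of peers.
	WonQuorumOfVotes
)

// Next returns the state a node moves to after an event.
// Any combination not described in the text leaves the state unchanged.
func Next(s State, e Event) State {
	switch {
	case s == Follower && e == ElectionTimeout:
		return Candidate
	case s == Candidate && e == WonQuorumOfVotes:
		return Leader
	}
	return s
}

func main() {
	s := Follower // all nodes start out as followers
	for _, e := range []Event{ElectionTimeout, WonQuorumOfVotes} {
		next := Next(s, e)
		fmt.Printf("%s -> %s\n", s, next)
		s = next
	}
}
```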

Once a cluster has a leader, it is able to accept new log entries. A client can
request that a leader append a new log entry (from Raft's perspective, a log entry
is an opaque binary blob). The leader then writes the entry to durable storage and
attempts to replicate to a quorum of followers. Once the log entry is considered
_committed_, it can be _applied_ to a finite state machine. The finite state machine
is application specific; in Vault's case, we use
[BoltDB](https://github.com/etcd-io/bbolt) to maintain cluster state. Vault's writes
block until it is both _committed_ and _applied_.
[BoltDB](https://github.com/etcd-io/bbolt) to maintain a cluster state. Vault's writes
block until they are both _committed_ and _applied_.

Obviously, it would be undesirable to allow a replicated log to grow in an unbounded
It would be undesirable to allow a replicated log to grow in an unbounded
fashion. Raft provides a mechanism by which the current state is snapshotted and the
log is compacted. Because of the FSM abstraction, restoring the state of the FSM must
result in the same state as a replay of old logs. This allows Raft to capture the FSM
state at a point in time and then remove all the logs that were used to reach that
state. This is performed automatically without user intervention and prevents unbounded
disk usage while also minimizing the time spent replaying logs. One of the advantages of
using BoltDB is that it allows Vault's snapshots to be very lightweight. Since
Vault's data is already persisted to disk in BoltDB the snapshot process just
Vault's data is already persisted to disk in BoltDB, the snapshot process just
needs to truncate the raft logs.

Consensus is fault-tolerant while a cluster has quorum.
@@ -101,8 +92,8 @@ about peer membership. For example, suppose there are only 2 peers: A and B. The
size is also 2, meaning both nodes must agree to commit a log entry. If either A or B
fails, it is now impossible to reach quorum. This means the cluster is unable to add
or remove a node or to commit any additional log entries. This results in
_unavailability_. At this point, manual intervention would be required to remove
either A or B and to restart the remaining node in bootstrap mode.
_unavailability_. At this point, manual intervention is required to remove
either A or B and restart the remaining node in bootstrap mode.
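
The quorum arithmetic used throughout this document can be checked with a short sketch. The function names `Quorum` and `FailuresTolerated` are hypothetical helpers for illustration, not part of Vault. With integer division, `n/2 + 1` is the smallest whole number that is at least `(n+1)/2`.

```go
package main

import "fmt"

// Quorum returns the smallest majority of a peer set of size n.
func Quorum(n int) int {
	return n/2 + 1
}

// FailuresTolerated returns how many peers can fail while the
// remaining peers can still form a quorum.
func FailuresTolerated(n int) int {
	return n - Quorum(n)
}

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		fmt.Printf("peers=%d quorum=%d tolerated failures=%d\n",
			n, Quorum(n), FailuresTolerated(n))
	}
}
```

For a peer set of 2, the quorum is also 2 and no failure is tolerated, which is the situation described above.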

A Raft cluster of 3 nodes can tolerate a single node failure while a cluster
of 5 can tolerate 2 node failures. The recommended configuration is to either
@@ -117,16 +108,16 @@ Thus, performance is bound by disk I/O and network latency.
### Raft in Vault

When getting started, a single Vault server is
[initialized](/docs/commands/operator/init/#operator-init). At this point the
cluster is of size 1 which allows the node to self-elect as a leader. Once a
[initialized](/docs/commands/operator/init/#operator-init). At this point, the
cluster is of size 1, which allows the node to self-elect as a leader. Once a
leader is elected, other servers can be added to the peer set in a way that
preserves consistency and safety.

The join process is how new nodes are added to the vault cluster, it uses an
The join process is how new nodes are added to the Vault cluster; it uses an
encrypted challenge/answer workflow. To accomplish this, all nodes in a single
raft cluster must share the same seal configuration. If using an Auto Unseal the
Raft cluster must share the same seal configuration. If using an Auto Unseal, the
join process can use the configured seal to automatically decrypt the challenge
and respond with the answer. If using a Shamir seal the unseal keys must be
and respond with the answer. If using a Shamir seal, the unseal keys must be
provided to the node attempting to join the cluster before it can decrypt the
challenge and respond with the decrypted answer.

41 changes: 21 additions & 20 deletions website/content/docs/internals/security.mdx
@@ -6,15 +6,15 @@ description: Learn about the security model of Vault.

# Security Model

Due to the nature of Vault and the confidentiality of data it is managing,
Due to the nature of Vault and the confidentiality of data it manages,
the Vault security model is very critical. The overall goal of Vault's security
model is to provide [confidentiality, integrity, availability, accountability,
authentication](https://en.wikipedia.org/wiki/Information_security).

This means that data at rest and in transit must be secure from eavesdropping
or tampering. Clients must be appropriately authenticated and authorized
to access data or modify policy. All interactions must be auditable and traced
uniquely back to the origin entity. The system must be robust against intentional
to access data or modify policies. All interactions must be auditable and traced
uniquely back to the origin entity, and the system must be robust against intentional
attempts to bypass any of its access controls.

# Threat Model
@@ -42,7 +42,7 @@ The following are the various parts of the Vault threat model:
- Availability of secret material in the face of failure. Vault supports
running in a highly available configuration to avoid loss of availability.

The following are not parts of the Vault threat model:
The following are not considered part of the Vault threat model:

- Protecting against arbitrary control of the storage backend. An attacker
that can perform arbitrary operations against the storage backend can
@@ -57,7 +57,7 @@ The following are not parts of the Vault threat model:
and is stored, even if it is kept confidential.

- Protecting against memory analysis of a running Vault. If an attacker is able
to inspect the memory state of a running Vault instance then the confidentiality
to inspect the memory state of a running Vault instance, then the confidentiality
of data may be compromised.

- Protecting against flaws in external systems or services used by Vault.
@@ -77,11 +77,12 @@ The following are not parts of the Vault threat model:

# External Threat Overview

Given the architecture of Vault, there are 3 distinct systems we are concerned
with for Vault. There is the client, which is speaking to Vault over an API.
There is Vault or the server more accurately, which is providing an API and
serving requests. Lastly, there is the storage backend, which the server is
utilizing to read and write data.
Vault's architecture comprises three distinct systems:

- Client: Speaks to Vault over an API.
- Server: Provides an API and serves requests.
- Storage backend: Utilized by the server to read and write data.


There is no mutual trust between the Vault client and server. Clients use
[TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) to verify the
@@ -93,7 +94,7 @@ permitted to make login requests.
All server-to-server traffic between Vault instances within a cluster (i.e.,
high availability, enterprise replication or integrated storage) uses
mutually-authenticated TLS to ensure the confidentiality and integrity of data
in transit. Nodes are authenticated prior to joining the cluster, by an
in transit. Nodes are authenticated prior to joining the cluster by an
[unseal challenge](/docs/concepts/integrated-storage#vault-networking-recap) or
a [one-time-use activation token](/docs/enterprise/replication#security-model).

@@ -105,11 +106,11 @@ Encryption Standard
the [Galois Counter Mode
(GCM)](https://en.wikipedia.org/wiki/Galois/Counter_Mode) with 96-bit nonces.
The nonce is randomly generated for every encrypted object. When data is read
from the security barrier the GCM authentication tag is verified during the
from the security barrier, the GCM authentication tag is verified during the
decryption process to detect any tampering.
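
The nonce handling and tag verification described above can be sketched with Go's standard library. This is an illustration of AES-GCM with a random 96-bit nonce, not Vault's barrier code: the `seal` and `open` helpers, the 32-byte key size, and the `nonce || ciphertext || tag` layout are assumptions made for this sketch.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD; cipher.NewGCM uses a 12-byte (96-bit) nonce.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext under a fresh random nonce and returns
// nonce || ciphertext || tag (layout assumed for this sketch).
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open verifies the GCM authentication tag and decrypts.
// It returns an error if the sealed value was modified.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("sealed value too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := make([]byte, 32) // 32 bytes selects AES-256 (assumed key size)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := seal(key, []byte("secret material"))
	if err != nil {
		panic(err)
	}
	sealed[len(sealed)-1] ^= 0x01 // simulate tampering in the storage backend
	_, err = open(key, sealed)
	fmt.Println("tampering detected:", err != nil)
}
```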

Depending on the backend used, Vault may communicate with the backend over TLS
to provide an added layer of security. In some cases, such as a file backend
to provide an added layer of security. In some cases, such as a file backend,
this is not applicable. Because storage backends are untrusted, an eavesdropper
would only gain access to encrypted data even if communication with the backend
was intercepted.
@@ -118,7 +119,7 @@ was intercepted.

Within the Vault system, a critical security concern is an attacker attempting
to gain access to secret material they are not authorized to access. This is an internal
threat if the attacker is already permitted some level of access to Vault and is
threat if the attacker is already permitted some level of access to Vault and is
able to authenticate.

When a client first authenticates with Vault, an auth method is used to verify
@@ -129,7 +130,7 @@ example, GitHub users in the "engineering" team may be mapped to the
which is a randomly generated, serialized value and maps it to the policy list.
This client token is then returned to the client.

On each request a client provides this token. Vault then uses it to check that
On each request, a client provides this token. Vault then uses it to check that
the token is valid and has not been revoked or expired, and generates an ACL
based on the associated policies. Vault uses a strict default deny
enforcement strategy. This means unless an associated policy allows for a given action,
@@ -144,8 +145,8 @@ may be an exact match or the longest-prefix match glob pattern. See

Certain operations are only permitted by "root" users, which is a distinguished
policy built into Vault. This is similar to the concept of a root user on a
Unix system or an Administrator on Windows. Although clients could be provided
with root tokens or associated with the root policy, instead Vault supports the
Unix system or an administrator on Windows. Rather than providing clients
with root tokens or associating them with the root policy, Vault supports the
notion of "sudo" privilege. As part of a policy, users may be granted "sudo"
privileges to certain paths, so that they can still perform security sensitive
operations without being granted global root access to Vault.
@@ -158,7 +159,7 @@ is started, it starts in a _sealed_ state. This means that the encryption key
needed to read and write from the storage backend is not yet known. The process
of unsealing requires providing the root key so that the encryption key can
be retrieved. The risk of distributing the root key is that a single
malicious actor with access to it can decrypt the entire Vault. Instead,
malicious attacker with access to it can decrypt the entire Vault. Instead,
Shamir's technique allows us to split the root key into multiple shares or
parts. The number of shares and the threshold needed is configurable, but by
default Vault generates 5 shares, any 3 of which must be provided to
@@ -173,11 +174,11 @@ Once unsealed the standard ACL mechanisms are used for all requests.
To make an analogy, a bank puts security deposit boxes inside of a vault. Each
security deposit box has a key, while the vault door has both a combination and
a key. The vault is encased in steel and concrete so that the door is the only
practical entrance. The analogy to Vault, is that the cryptosystem is the
practical entrance. The analogy to Vault is that the cryptosystem is the
steel and concrete protecting the data. While you could tunnel through the
concrete or brute force the encryption keys, it would be prohibitively time
consuming. Opening the bank vault requires two factors: the key and the
combination. Similarly, Vault requires multiple shares be provided to
reconstruct the root key. Once unsealed, each security deposit box still
requires the owner provide a key, and similarly the Vault ACL system protects
requires that the owner provide a key, and similarly the Vault ACL system protects
all the secrets stored.
