vault documentation: doc cleanup effort-batch4 (#16711)

* cleanup effort
* modified text
* Update website/content/docs/internals/integrated-storage.mdx

Co-authored-by: Yoko Hyakuna <yoko@hashicorp.com>
hashicorp · Aug 12, 2022 · be4131f
1 parent 192c2aa
commit be4131f

Showing 2 changed files with 54 additions and 62 deletions.
diff --git a/website/content/docs/internals/integrated-storage.mdx b/website/content/docs/internals/integrated-storage.mdx
@@ -6,20 +6,20 @@ description: Learn about the integrated raft storage in Vault.
 
 # Integrated Storage
 
-Vault supports a number of Storage options for the durable storage of Vault's
-information. Each backend has pros, cons, advantages, and trade-offs. For
+Vault supports several storage options for the durable storage of Vault's
+information. Each backend offers pros, cons, advantages, and trade-offs. For
 example, some backends support high availability while others provide a more
 robust backup and restoration process.
 
-As of Vault 1.4 an Integrated Storage option is offered. This storage backend
-does not rely on any third party systems, it implements high availability,
+As of Vault 1.4, an Integrated Storage option is offered. This storage backend
+does not rely on any third party systems; it implements high availability,
 supports Enterprise Replication features, and provides backup/restore workflows.
 
 ## Consensus Protocol
 
 Vault's Integrated Storage uses a [consensus
 protocol](<https://en.wikipedia.org/wiki/Consensus_(computer_science)>) to provide
-[Consistency (as defined by CAP)](https://en.wikipedia.org/wiki/CAP_theorem).
+[Consistency](https://en.wikipedia.org/wiki/CAP_theorem) (as defined by CAP).
 The consensus protocol is based on ["Raft: In search of an Understandable
 Consensus Algorithm"](https://raft.github.io/raft.pdf). For a visual explanation
 of Raft, see [The Secret Lives of Data](http://thesecretlivesofdata.com/raft).
@@ -33,66 +33,57 @@ understandable algorithm.
 
 There are a few key terms to know when discussing Raft:
 
-- Log - The primary unit of work in a Raft system is a log entry. The problem
-  of consistency can be decomposed into a _replicated log_. A log is an ordered
-  sequence of entries. Entries includes any cluster change: adding nodes, adding
-  services, new key-value pairs, etc. We consider the log consistent if all
-  members agree on the entries and their order.
+- **Leader** - At any given time, the peer set elects a single node to be the leader.
+The leader is responsible for ingesting new log entries, replicating to followers,
+and managing when an entry is committed. The leader node is also the active Vault node and followers are standby nodes. Refer to the [High Availability](/docs/internals/high-availability#design-overview) document for more information.
 
-- FSM - [Finite State Machine](https://en.wikipedia.org/wiki/Finite-state_machine).
-  An FSM is a collection of finite states with transitions between them. As new logs
+- **Log** - An ordered sequence of entries (replicated log) to keep track of any cluster changes. The leader is responsible for _log replication_. When new data is written, for example, a new event creates a log entry. The leader then sends the new log entry to its followers. Any inconsistency within the replicated log entries will indicate an issue.
+
+- **FSM** - [Finite State Machine](https://en.wikipedia.org/wiki/Finite-state_machine).
+  A collection of finite states with transitions between them. As new logs
   are applied, the FSM is allowed to transition between states. Application of the
   same sequence of logs must result in the same state, meaning behavior must be deterministic.
 
-- Peer set - The peer set is the set of all members participating in log replication.
-  For Vault's purposes, all server nodes are in the peer set of the local cluster.
+- **Peer set** - The set of all members participating in log replication. All server nodes are in the peer set of the local cluster.
 
-- Quorum - A quorum is a majority of members from a peer set: for a set of size `n`,
+- **Quorum** - A majority of members from a peer set: for a set of size `n`,
   quorum requires at least `(n+1)/2` members. For example, if there are 5 members
   in the peer set, we would need 3 nodes to form a quorum. If a quorum of nodes is
   unavailable for any reason, the cluster becomes _unavailable_ and no new logs
   can be committed.
 
-- Committed Entry - An entry is considered _committed_ when it is durably stored
-  on a quorum of nodes. Once an entry is committed it can be applied.
+- **Committed Entry** - An entry is considered _committed_ when it is durably stored
+  on a quorum of nodes. An entry is applied once it is committed.
 
-- Leader - At any given time, the peer set elects a single node to be the leader.
-  The leader is responsible for ingesting new log entries, replicating to followers,
-  and managing when an entry is considered committed. For Vault's purposes, the
-  leader node is also the Active vault node and followers are standby nodes. See
-  the [High Availability docs](/docs/internals/high-availability#design-overview)
-  for more information.
 
-Raft is a complex protocol and will not be covered here in detail (for those who
-desire a more comprehensive treatment, the full specification is available in this
-[paper](https://raft.github.io/raft.pdf)). We will, however, attempt to provide
-a high level description which may be useful for building a mental model.
+Raft is a complex protocol and will not be covered here in detail. We will, however, attempt to provide
+a high-level description which may be useful for building a mental model. For those who
+want a more comprehensive understanding of Raft, the full specification is available in this
+[paper](https://raft.github.io/raft.pdf).
 
 Raft nodes are always in one of three states: follower, candidate, or leader. All
 nodes initially start out as a follower. In this state, nodes can accept log entries
-from a leader and cast votes. If no entries are received for some time, nodes
-self-promote to the candidate state. In the candidate state, nodes request votes from
-their peers. If a candidate receives a quorum of votes, then it is promoted to a leader.
-The leader must accept new log entries and replicate to all the other followers.
+from a leader and cast votes. If no entries are received for a period of time, nodes
+will self-promote to the candidate state. In the candidate state, nodes request votes from their peers. If a candidate receives a quorum of votes, then it is promoted to a leader. The leader must accept new log entries and replicate to all the other followers.
 
 Once a cluster has a leader, it is able to accept new log entries. A client can
 request that a leader append a new log entry (from Raft's perspective, a log entry
 is an opaque binary blob). The leader then writes the entry to durable storage and
 attempts to replicate to a quorum of followers. Once the log entry is considered
 _committed_, it can be _applied_ to a finite state machine. The finite state machine
 is application specific; in Vault's case, we use
-[BoltDB](https://github.com/etcd-io/bbolt) to maintain cluster state. Vault's writes
-block until it is both _committed_ and _applied_.
+[BoltDB](https://github.com/etcd-io/bbolt) to maintain cluster state. Vault's writes
+block until they are both _committed_ and _applied_.
 
-Obviously, it would be undesirable to allow a replicated log to grow in an unbounded
+It would be undesirable to allow a replicated log to grow in an unbounded
 fashion. Raft provides a mechanism by which the current state is snapshotted and the
 log is compacted. Because of the FSM abstraction, restoring the state of the FSM must
 result in the same state as a replay of old logs. This allows Raft to capture the FSM
 state at a point in time and then remove all the logs that were used to reach that
 state. This is performed automatically without user intervention and prevents unbounded
 disk usage while also minimizing the time spent replaying logs. One of the advantages of
 using BoltDB is that it allows Vault's snapshots to be very light weight. Since
-Vault's data is already persisted to disk in BoltDB the snapshot process just
+Vault's data is already persisted to disk in BoltDB, the snapshot process just
 needs to truncate the raft logs.
 
 Consensus is fault-tolerant while a cluster has quorum.
@@ -101,8 +92,8 @@ about peer membership. For example, suppose there are only 2 peers: A and B. The
 size is also 2, meaning both nodes must agree to commit a log entry. If either A or B
 fails, it is now impossible to reach quorum. This means the cluster is unable to add
 or remove a node or to commit any additional log entries. This results in
-_unavailability_. At this point, manual intervention would be required to remove
-either A or B and to restart the remaining node in bootstrap mode.
+_unavailability_. At this point, manual intervention is required to remove
+either A or B and restart the remaining node in bootstrap mode.
 
 A Raft cluster of 3 nodes can tolerate a single node failure while a cluster
 of 5 can tolerate 2 node failures. The recommended configuration is to either
@@ -117,16 +108,16 @@ Thus, performance is bound by disk I/O and network latency.
 ### Raft in Vault
 
 When getting started, a single Vault server is
-[initialized](/docs/commands/operator/init/#operator-init). At this point the
-cluster is of size 1 which allows the node to self-elect as a leader. Once a
+[initialized](/docs/commands/operator/init/#operator-init). At this point, the
+cluster is of size 1, which allows the node to self-elect as a leader. Once a
 leader is elected, other servers can be added to the peer set in a way that
 preserves consistency and safety.
 
-The join process is how new nodes are added to the vault cluster, it uses an
+The join process is how new nodes are added to the Vault cluster; it uses an
 encrypted challenge/answer workflow. To accomplish this, all nodes in a single
-raft cluster must share the same seal configuration. If using an Auto Unseal the
+Raft cluster must share the same seal configuration. If using an Auto Unseal, the
 join process can use the configured seal to automatically decrypt the challenge
-and respond with the answer. If using a Shamir seal the unseal keys must be
+and respond with the answer. If using a Shamir seal, the unseal keys must be
 provided to the node attempting to join the cluster before it can decrypt the
 challenge and respond with the decrypted answer.
 

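Reviewer note: the quorum and fault-tolerance figures in the hunks above (3 of 5 for quorum, 2 of 2 for the A/B example, one tolerated failure for 3 nodes and two for 5) can be checked with a short sketch. This is illustrative only and is not part of the commit; the function names `quorum` and `tolerated_failures` are hypothetical, and `n // 2 + 1` is used as the integer form of the document's `(n+1)/2` rounded up.

```python
def quorum(n: int) -> int:
    """Smallest majority of a peer set of size n.

    Integer form of the documented `(n+1)/2`, rounded up.
    """
    if n < 1:
        raise ValueError("peer set must contain at least one node")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can fail while the remaining nodes still form a quorum."""
    return n - quorum(n)


if __name__ == "__main__":
    # Peer-set sizes mentioned in the documentation text.
    for size in (1, 2, 3, 5):
        print(f"size={size} quorum={quorum(size)} "
              f"tolerated_failures={tolerated_failures(size)}")
```

Under these assumptions a two-node cluster tolerates zero failures, which matches the text's statement that losing either A or B makes the cluster unavailable.
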
diff --git a/website/content/docs/internals/security.mdx b/website/content/docs/internals/security.mdx
@@ -6,15 +6,15 @@ description: Learn about the security model of Vault.
 
 # Security Model
 
-Due to the nature of Vault and the confidentiality of data it is managing,
+Due to the nature of Vault and the confidentiality of data it manages,
 the Vault security model is very critical. The overall goal of Vault's security
 model is to provide [confidentiality, integrity, availability, accountability,
 authentication](https://en.wikipedia.org/wiki/Information_security).
 
 This means that data at rest and in transit must be secure from eavesdropping
 or tampering. Clients must be appropriately authenticated and authorized
-to access data or modify policy. All interactions must be auditable and traced
-uniquely back to the origin entity. The system must be robust against intentional
+to access data or modify policies. All interactions must be auditable and traced
+uniquely back to the origin entity, and the system must be robust against intentional
 attempts to bypass any of its access controls.
 
 # Threat Model
@@ -42,7 +42,7 @@ The following are the various parts of the Vault threat model:
 - Availability of secret material in the face of failure. Vault supports
   running in a highly available configuration to avoid loss of availability.
 
-The following are not parts of the Vault threat model:
+The following are not considered part of the Vault threat model:
 
 - Protecting against arbitrary control of the storage backend. An attacker
   that can perform arbitrary operations against the storage backend can
@@ -57,7 +57,7 @@ The following are not parts of the Vault threat model:
   and is stored, even if it is kept confidential.
 
 - Protecting against memory analysis of a running Vault. If an attacker is able
-  to inspect the memory state of a running Vault instance then the confidentiality
+  to inspect the memory state of a running Vault instance, then the confidentiality
   of data may be compromised.
 
 - Protecting against flaws in external systems or services used by Vault.
@@ -77,11 +77,12 @@ The following are not parts of the Vault threat model:
 
 # External Threat Overview
 
-Given the architecture of Vault, there are 3 distinct systems we are concerned
-with for Vault. There is the client, which is speaking to Vault over an API.
-There is Vault or the server more accurately, which is providing an API and
-serving requests. Lastly, there is the storage backend, which the server is
-utilizing to read and write data.
+Vault's architecture comprises three distinct systems:
+
+- Client: Speaks to Vault over an API.
+- Server: Provides an API and serves requests.
+- Storage backend: Utilized by the server to read and write data.
+
 
 There is no mutual trust between the Vault client and server. Clients use
 [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) to verify the
@@ -93,7 +94,7 @@ permitted to make login requests.
 All server-to-server traffic between Vault instances within a cluster (i.e,
 high availability, enterprise replication or integrated storage) uses
 mutually-authenticated TLS to ensure the confidentiality and integrity of data
-in transit. Nodes are authenticated prior to joining the cluster, by an
+in transit. Nodes are authenticated prior to joining the cluster by an
 [unseal challenge](/docs/concepts/integrated-storage#vault-networking-recap) or
 a [one-time-use activation token](/docs/enterprise/replication#security-model).
 
@@ -105,11 +106,11 @@ Encryption Standard
 the [Galois Counter Mode
 (GCM)](https://en.wikipedia.org/wiki/Galois/Counter_Mode) with 96-bit nonces.
 The nonce is randomly generated for every encrypted object. When data is read
-from the security barrier the GCM authentication tag is verified during the
+from the security barrier, the GCM authentication tag is verified during the
 decryption process to detect any tampering.
 
 Depending on the backend used, Vault may communicate with the backend over TLS
-to provide an added layer of security. In some cases, such as a file backend
+to provide an added layer of security. In some cases, such as a file backend,
 this is not applicable. Because storage backends are untrusted, an eavesdropper
 would only gain access to encrypted data even if communication with the backend
 was intercepted.
@@ -118,7 +119,7 @@ was intercepted.
 
 Within the Vault system, a critical security concern is an attacker attempting
 to gain access to secret material they are not authorized to. This is an internal
-threat if the attacker is already permitted some level of access to Vault and is
+threat if the attacker is already permitted some level of access to Vault and is
 able to authenticate.
 
 When a client first authenticates with Vault, an auth method is used to verify
@@ -129,7 +130,7 @@ example, GitHub users in the "engineering" team may be mapped to the
 which is a randomly generated, serialized value and maps it to the policy list.
 This client token is then returned to the client.
 
-On each request a client provides this token. Vault then uses it to check that
+On each request, a client provides this token. Vault then uses it to check that
 the token is valid and has not been revoked or expired, and generates an ACL
 based on the associated policies. Vault uses a strict default deny
 enforcement strategy. This means unless an associated policy allows for a given action,
@@ -144,8 +145,8 @@ may be an exact match or the longest-prefix match glob pattern. See
 
 Certain operations are only permitted by "root" users, which is a distinguished
 policy built into Vault. This is similar to the concept of a root user on a
-Unix system or an Administrator on Windows. Although clients could be provided
-with root tokens or associated with the root policy, instead Vault supports the
+Unix system or an administrator on Windows. Rather than providing clients
+with root tokens or associating them with the root policy, Vault supports the
 notion of "sudo" privilege. As part of a policy, users may be granted "sudo"
 privileges to certain paths, so that they can still perform security sensitive
 operations without being granted global root access to Vault.
@@ -158,7 +159,7 @@ is started, it starts in a _sealed_ state. This means that the encryption key
 needed to read and write from the storage backend is not yet known. The process
 of unsealing requires providing the root key so that the encryption key can
 be retrieved. The risk of distributing the root key is that a single
-malicious actor with access to it can decrypt the entire Vault. Instead,
+malicious attacker with access to it can decrypt the entire Vault. Instead,
 Shamir's technique allows us to split the root key into multiple shares or
 parts. The number of shares and the threshold needed is configurable, but by
 default Vault generates 5 shares, any 3 of which must be provided to
@@ -173,11 +174,11 @@ Once unsealed the standard ACL mechanisms are used for all requests.
 To make an analogy, a bank puts security deposit boxes inside of a vault. Each
 security deposit box has a key, while the vault door has both a combination and
 a key. The vault is encased in steel and concrete so that the door is the only
-practical entrance. The analogy to Vault, is that the cryptosystem is the
+practical entrance. The analogy to Vault is that the cryptosystem is the
 steel and concrete protecting the data. While you could tunnel through the
 concrete or brute force the encryption keys, it would be prohibitively time
 consuming. Opening the bank vault requires two-factors: the key and the
 combination. Similarly, Vault requires multiple shares be provided to
 reconstruct the root key. Once unsealed, each security deposit boxes still
-requires the owner provide a key, and similarly the Vault ACL system protects
+requires that the owner provide a key, and similarly the Vault ACL system protects
 all the secrets stored.
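
Reviewer note: the security model text above describes splitting the root key into 5 shares, any 3 of which reconstruct it. The sketch below illustrates that threshold property with a textbook Shamir scheme. It is not part of the commit and is not Vault's implementation: the prime field `PRIME`, the integer secret, and the names `split_secret` and `recover_secret` are all assumptions chosen for readability.

```python
import secrets

# Assumption: a prime field keeps the arithmetic easy to read.
# Vault's own implementation is not shown in this diff and may differ.
PRIME = 2**127 - 1  # a Mersenne prime


def split_secret(secret: int, shares: int = 5, threshold: int = 3):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must fit in the field")
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and the share count")
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        # Horner's rule, reduced modulo PRIME at each step.
        result = 0
        for c in reversed(coeffs):
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    parts = split_secret(123456789)  # defaults: 5 shares, threshold 3
    print(recover_secret(parts[:3]) == 123456789)
```

Any three distinct shares passed to `recover_secret` yield the original value, which is the property the text relies on when it says no single key holder can unseal Vault alone.
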