elastic · stefnestor · Apr 27, 2024 · May 2, 2024 · May 2, 2024 · May 2, 2024
diff --git a/docs/reference/datatiers.asciidoc b/docs/reference/datatiers.asciidoc
@@ -2,40 +2,71 @@
 [[data-tiers]]
 == Data tiers
 
-A _data tier_ is a collection of nodes with the same data role that
-typically share the same hardware profile:
-
-* <<content-tier, Content tier>> nodes handle the indexing and query load for content such as a product catalog.
-* <<hot-tier, Hot tier>> nodes handle the indexing load for time series data such as logs or metrics
-and hold your most recent, most-frequently-accessed data.
-* <<warm-tier, Warm tier>> nodes hold time series data that is accessed less-frequently
+A _data tier_ is a collection of <<modules-node,nodes>> within a cluster that share the same 
+<<node-roles,data node role>>. Elastic recommends that nodes in the same tier also share the same 
+hardware profile to avoid <<hotspotting,hot spotting>>. How data tiers are used generally depends 
+on the <<data-management,data category>>: _content_ or _time series_ data. The available {es}
+data tiers are:
+
+* <<content-tier,Content tier>> nodes handle the indexing and query load for content 
+indices, such as a <<system-indices,system index>> or a product catalog.
+* <<hot-tier,Hot tier>> nodes handle the indexing load for time series data, 
+such as logs or metrics. They hold your most recent, most frequently accessed data.
+* <<warm-tier,Warm tier>> nodes hold time series data that is accessed less frequently
 and rarely needs to be updated.
 * <<cold-tier,Cold tier>> nodes hold time series data that is accessed
 infrequently and not normally updated. To save space, you can keep
 <<fully-mounted,fully mounted indices>> of
 <<ilm-searchable-snapshot,{search-snaps}>> on the cold tier. These fully mounted
 indices eliminate the need for replicas, reducing required disk space by
 approximately 50% compared to the regular indices.
-* <<frozen-tier, Frozen tier>> nodes hold time series data that is accessed 
+* <<frozen-tier,Frozen tier>> nodes hold time series data that is accessed 
 rarely and never updated. The frozen tier stores <<partially-mounted,partially
 mounted indices>> of <<ilm-searchable-snapshot,{search-snaps}>> exclusively.
 This extends the storage capacity even further — by up to 20 times compared to
 the warm tier. 
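Because a tier is defined by a shared data node role, the list above maps directly onto the `data_content`, `data_hot`, `data_warm`, `data_cold`, and `data_frozen` role names. The following is a minimal sketch, not Elasticsearch code, that groups a hypothetical set of nodes into tiers; the node names and the `group_by_tier` helper are illustrative assumptions, and the generic `data` role is deliberately not modeled.

```python
from collections import defaultdict

# Tier role names as used in node.roles, ordered content first, then hot to frozen.
TIER_ROLES = ["data_content", "data_hot", "data_warm", "data_cold", "data_frozen"]


def group_by_tier(nodes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group node names by the data tier roles they hold.

    `nodes` maps a hypothetical node name to its list of roles. A node with
    several tier roles (for example data_hot and data_content) is listed in
    every one of those tiers. Non-tier roles such as master are ignored.
    """
    tiers: dict[str, list[str]] = defaultdict(list)
    for name, roles in sorted(nodes.items()):
        for role in roles:
            if role in TIER_ROLES:
                tiers[role].append(name)
    # Return tiers in a stable order, skipping tiers with no nodes.
    return {role: tiers[role] for role in TIER_ROLES if role in tiers}


# Hypothetical cluster: two hot/content nodes, one warm node, one frozen node.
cluster = {
    "node-1": ["master", "data_hot", "data_content"],
    "node-2": ["data_hot", "data_content"],
    "node-3": ["data_warm"],
    "node-4": ["data_frozen"],
}
print(group_by_tier(cluster))
```

Sharing hot and content roles on the same nodes mirrors the common deployment pattern described for the content tier.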
 
-IMPORTANT: {es} generally expects nodes within a data tier to share the same 
-hardware profile. Variations not following this recommendation should be 
+IMPORTANT: {es} generally expects nodes within a data tier to share the same
+hardware profile. Variations that don't follow this recommendation should be
 carefully architected to avoid <<hotspotting,hot spotting>>.
+
+Content data remains on the <<content-tier,content tier>> for its entire 
+data lifecycle. You can configure your time series data to progress through the 
+descending temperature data tiers (hot, warm, cold, and frozen) according to your 
+performance, resiliency, and data retention requirements. Elastic recommends 
+automating these lifecycle transitions with <<index-lifecycle-management,{ilm}>> 
+({ilm-init}) together with <<data-streams,data streams>>.
+
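As a rough illustration of the hot, warm, cold, and frozen progression, the sketch below builds a request body of the kind sent with `PUT _ilm/policy/<name>`. The phase ages, rollover thresholds, snapshot repository name, and the `tiered_policy` helper are all hypothetical; the warm and cold phases list no actions and rely on the migrate action that {ilm-init} injects by default, while the frozen phase uses a `searchable_snapshot` action.

```python
import json


def tiered_policy(warm_after: str, cold_after: str, frozen_after: str,
                  delete_after: str, snapshot_repo: str) -> dict:
    """Build a hypothetical ILM policy body that moves data hot -> warm -> cold -> frozen.

    No explicit migrate action is listed: ILM injects it by default to move
    indices between tiers. The frozen phase mounts a searchable snapshot
    from `snapshot_repo`, which must be a registered repository.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Illustrative rollover thresholds, not recommendations.
                        "rollover": {"max_primary_shard_size": "50gb", "max_age": "30d"}
                    }
                },
                "warm": {"min_age": warm_after, "actions": {}},
                "cold": {"min_age": cold_after, "actions": {}},
                "frozen": {
                    "min_age": frozen_after,
                    "actions": {"searchable_snapshot": {"snapshot_repository": snapshot_repo}},
                },
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


# Body for a request such as PUT _ilm/policy/my-tiered-policy (name is illustrative).
body = tiered_policy("7d", "30d", "90d", "365d", "my-snapshot-repo")
print(json.dumps(body, indent=2))
```

Attaching such a policy to the index template of a data stream is what lets new backing indices start on the hot tier and then age through the colder tiers.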
+[TIP]
+====
+A data tier's performance depends on its backing hardware profile. 
+See {cloud}/ec-configure-deployment-settings.html#ec-hardware-profiles[{ecloud}'s 
+hardware profiles] for sample {cloud}/ec-reference-hardware.html[hardware configurations].
+
+{es} itself does not require it, but Elastic generally assumes (for example, in {ecloud} 
+deployment configurations) that each successively colder data tier has a higher ratio of 
+data storage to CPU and/or heap resources. This lets colder tiers hold more data 
+at the cost of slower response times.
+
+Under this assumption, and as a general architecture baseline, the access patterns 
+described above translate to roughly 85% of searches hitting the hot tier, 10% warm, 
+5% cold, and 1% frozen, and to ingest targeting 95% hot, 4% warm, 1% cold, and 
+0% frozen, as checked with the <<cat-thread-pool,cat thread pool API>>. {es} does not 
+require these proportions, but they encourage stable and highly responsive clusters. 
+They're intended only as a starting point to adapt to your specific use case, 
+hardware profiles, and architecture, following Elastic's 
+https://www.elastic.co/blog/it-depends[It Depends] philosophy.
+====
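To compare a cluster against the baseline in the tip above, you could sum the `completed` column of the `search` or `write` thread pools per tier (for example, from `GET _cat/thread_pool/search,write?v&h=node_name,name,completed`, after mapping each node to its tier) and convert the totals to percentages. The sketch below does only the arithmetic; the per-tier counts are hypothetical inputs, and the mapping from nodes to tiers is left to you.

```python
# Baseline shares from the tip above, in percent of search or ingest work per tier.
SEARCH_BASELINE = {"hot": 85, "warm": 10, "cold": 5, "frozen": 1}
INGEST_BASELINE = {"hot": 95, "warm": 4, "cold": 1, "frozen": 0}


def shares(completed: dict[str, int]) -> dict[str, float]:
    """Convert per-tier completed task counts into percentages of the total."""
    total = sum(completed.values())
    if total == 0:
        return {tier: 0.0 for tier in completed}
    return {tier: 100.0 * count / total for tier, count in completed.items()}


def deviations(completed: dict[str, int], baseline: dict[str, int]) -> dict[str, float]:
    """Percentage-point difference between observed shares and the baseline.

    Positive values mean a tier does more of the work than the baseline suggests.
    """
    observed = shares(completed)
    return {tier: round(observed.get(tier, 0.0) - pct, 1) for tier, pct in baseline.items()}


# Hypothetical per-tier totals of the search thread pool's `completed` column.
search_completed = {"hot": 7000, "warm": 2000, "cold": 900, "frozen": 100}
print(deviations(search_completed, SEARCH_BASELINE))
```

In this made-up example the warm tier serves about 10 percentage points more searches than the baseline, which would be a cue to review its hardware profile or the phase ages in your {ilm-init} policy.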
 
-When you index documents directly to a specific index, they remain on content tier nodes indefinitely.
+[discrete]
+[[available-tier]]
+=== Available data tiers
 
-When you index documents to a data stream, they initially reside on hot tier nodes.
-You can configure <<index-lifecycle-management, {ilm}>> ({ilm-init}) policies
-to automatically transition your time series data through the hot, warm, and cold tiers
-according to your performance, resiliency and data retention requirements.
+Learn more about each data tier, including when and how it should be used.
 
 [discrete]
 [[content-tier]]
-=== Content tier
+==== Content tier
 
 // tag::content-tier[]
 Data stored in the content tier is generally a collection of items such as a product catalog or article archive.
@@ -50,13 +81,14 @@ While they are also responsible for indexing, content data is generally not inge
 as time series data such as logs and metrics. From a resiliency perspective the indices in this
 tier should be configured to use one or more replicas.
 
-The content tier is required. System indices and other indices that aren't part
-of a data stream are automatically allocated to the content tier.
+The content tier is required and is often deployed within the same node 
+grouping as the hot tier. System indices and other indices that aren't part
+of a data stream are automatically allocated to the content tier. 
 // end::content-tier[]
 
 [discrete]
 [[hot-tier]]
-=== Hot tier
+==== Hot tier
 
 // tag::hot-tier[]
 The hot tier is the {es} entry point for time series data and holds your most-recent,
@@ -71,7 +103,7 @@ data stream>> are automatically allocated to the hot tier.
 
 [discrete]
 [[warm-tier]]
-=== Warm tier
+==== Warm tier
 
 // tag::warm-tier[]
 Time series data can move to the warm tier once it is being queried less frequently
@@ -84,7 +116,7 @@ For resiliency, indices in the warm tier should be configured to use one or more
 
 [discrete]
 [[cold-tier]]
-=== Cold tier
+==== Cold tier
 
 // tag::cold-tier[]
 When you no longer need to search time series data regularly, it can move from
@@ -106,7 +138,7 @@ but doesn't reduce required disk space compared to the warm tier.
 
 [discrete]
 [[frozen-tier]]
-=== Frozen tier
+==== Frozen tier
 
 // tag::frozen-tier[]
 Once data is no longer being queried, or being queried rarely, it may move from
@@ -120,9 +152,15 @@ sometimes fetch frozen data from the snapshot repository, searches on the frozen
 tier are typically slower than on the cold tier.
 // end::frozen-tier[]
 
+[discrete]
+[[configure-data-tiers]]
+=== Configure data tiers
+
+Follow the instructions for your deployment type to configure data tiers.
+
 [discrete]
 [[configure-data-tiers-cloud]]
-=== Configure data tiers on {ess} or {ece}
+==== {ess} or {ece}
 
 The default configuration for an {ecloud} deployment includes a shared tier for
 hot and content data. This tier is required and can't be removed.
@@ -156,7 +194,7 @@ tier].
 
 [discrete]
 [[configure-data-tiers-on-premise]]
-=== Configure data tiers for self-managed deployments
+==== Self-managed deployments
 
 For self-managed deployments, each node's <<data-node,data role>> is configured
 in `elasticsearch.yml`. For example, the highest-performance nodes in a cluster
@@ -174,25 +212,58 @@ tier.
 [[data-tier-allocation]]
 === Data tier index allocation
 
-When you create an index, by default {es} sets
-<<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
+The <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>> setting determines which tier an index's shards are allocated to.
+
+When you create an index, by default {es} sets the `_tier_preference`
 to `data_content` to automatically allocate the index shards to the content tier.
 
 When {es} creates an index as part of a <<data-streams, data stream>>,
-by default {es} sets
-<<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
+by default {es} sets the `_tier_preference`
 to `data_hot` to automatically allocate the index shards to the hot tier.
 
-You can explicitly set `index.routing.allocation.include._tier_preference`
-to opt out of the default tier-based allocation.
+At the time of index creation, you can override the default setting by explicitly setting 
+the preferred value in one of two ways:
+
+- By using an <<index-templates,index template>>. Refer to <<getting-started-index-lifecycle-management,Automate rollover with ILM>> for details.
+- From within the <<indices-create-index,create index>> request body. 
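The precedence described above (an explicit value in the create index request, then a matching index template, then the default) can be modeled as follows. This is a simplified sketch, not Elasticsearch's implementation: it assumes settings are passed as flat, dotted keys, and the `creation_tier_preference` helper and example values are hypothetical.

```python
from typing import Optional

PREF = "index.routing.allocation.include._tier_preference"


def creation_tier_preference(
    is_data_stream_backing_index: bool,
    template_settings: Optional[dict] = None,
    request_settings: Optional[dict] = None,
) -> str:
    """Resolve the _tier_preference an index gets at creation time.

    Precedence, highest first: settings in the create index request body,
    then settings from a matching index template, then the default
    (data_hot for data stream backing indices, data_content otherwise).
    Settings are assumed to use flat, dotted keys.
    """
    for settings in (request_settings, template_settings):
        if settings and PREF in settings:
            return settings[PREF]
    return "data_hot" if is_data_stream_backing_index else "data_content"


# Hypothetical template and request settings for a regular (non-data-stream) index.
template = {PREF: "data_warm,data_hot"}
request = {PREF: "data_cold,data_warm"}
print(creation_tier_preference(False))                     # default
print(creation_tier_preference(False, template))           # template wins over default
print(creation_tier_preference(False, template, request))  # request body wins over template
```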
+
+You can override this 
+setting after index creation by <<indices-update-settings,updating the index setting>> to the preferred 
+value. 
+
+You can list multiple tiers in this setting, in order of preference, so that an index doesn't remain unallocated when no nodes are available in its preferred tier.
+
+To remove the data tier preference 
+setting, set the `_tier_preference` value to `null`. This allows the index to allocate to any data node within the cluster. Setting the `_tier_preference` to `null` does not restore the default value. For managed indices, an {ilm-init} <<ilm-migrate,migrate>> action might later apply a new value in its place. 
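The fallback and `null` behavior described above can be sketched as a small resolution function. This is a deliberately simplified model, not the real allocation decider: it only looks at the `data_*` tier roles, and it ignores other allocation filters, disk watermarks, shard awareness, and the generic `data` role. The node names are hypothetical.

```python
from typing import Optional


def eligible_nodes(tier_preference: Optional[str], nodes: dict[str, set[str]]) -> list[str]:
    """Return the nodes an index's shards may be allocated to (simplified).

    - With a comma-separated preference such as "data_warm,data_hot", use the
      first listed tier that has at least one node.
    - With no preference (None, i.e. the setting is null), any node holding a
      data_* role is eligible.
    """
    data_nodes = sorted(n for n, roles in nodes.items() if any(r.startswith("data_") for r in roles))
    if tier_preference is None:
        return data_nodes
    for tier in (t.strip() for t in tier_preference.split(",")):
        in_tier = sorted(n for n, roles in nodes.items() if tier in roles)
        if in_tier:
            return in_tier
    return []  # No listed tier has nodes, so the shard stays unallocated.


# Hypothetical cluster with no cold tier.
nodes = {
    "n1": {"data_hot", "data_content"},
    "n2": {"data_warm"},
    "n3": {"master"},
}
print(eligible_nodes("data_cold,data_warm,data_hot", nodes))  # falls back to warm
print(eligible_nodes(None, nodes))                            # any data node
```

Listing a fallback tier is what keeps the first call from returning an empty list even though the cluster has no cold nodes.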
+
+[discrete]
+[[data-tier-allocation-value]]
+==== Determine the current data tier preference
+
+You can check an existing index's data tier preference by <<indices-get-settings,retrieving its 
+settings>> for `index.routing.allocation.include._tier_preference`:
+
+[source,console]
+--------------------------------------------------
+GET /my-index-000001/_settings?filter_path=*.settings.index.routing.allocation.include._tier_preference
+--------------------------------------------------
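If you script this check, the response is nested by index name and then by setting path. The sketch below extracts each index's preference as an ordered list of tiers; the sample response body is hypothetical and only illustrates the nested (non-flat) shape, and the `tier_preferences` helper is not part of any client library.

```python
import json


def tier_preferences(response: dict) -> dict[str, list[str]]:
    """Extract each index's _tier_preference as an ordered list of tiers.

    Expects the nested response shape of GET /<index>/_settings (without
    flat_settings). Indices whose response lacks the setting are skipped.
    """
    result = {}
    for index, body in response.items():
        pref = (
            body.get("settings", {})
            .get("index", {})
            .get("routing", {})
            .get("allocation", {})
            .get("include", {})
            .get("_tier_preference")
        )
        if pref is not None:
            result[index] = [t.strip() for t in pref.split(",") if t.strip()]
    return result


# Hypothetical response body for the request above.
sample = json.loads(
    '{"my-index-000001": {"settings": {"index": {"routing": {"allocation": '
    '{"include": {"_tier_preference": "data_content"}}}}}}}'
)
print(tier_preferences(sample))
```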
+
+[discrete]
+[[data-tier-allocation-troubleshooting]]
+==== Troubleshooting
+
+The `_tier_preference` setting might conflict with other allocation settings, and such a conflict can prevent the shard from being allocated. A conflict can occur when a cluster has not yet been fully <<troubleshoot-migrate-to-tiers,migrated 
+to data tiers>>. 
+
+This setting never unassigns a shard that is already allocated, but it might prevent that shard from moving from its current location to its designated data tier. To troubleshoot, call the <<cluster-allocation-explain,cluster allocation explain API>> and specify the suspected problematic shard.
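A minimal scripting sketch for this step: build the body for `GET _cluster/allocation/explain` (its `index`, `shard`, `primary`, and optional `current_node` fields), then list, per node, the deciders that returned `NO` in the response's `node_allocation_decisions`. The sample response, node names, and decider names below are illustrative assumptions; check them against the output from your own cluster.

```python
import json
from typing import Optional


def allocation_explain_body(index: str, shard: int, primary: bool,
                            current_node: Optional[str] = None) -> dict:
    """Build a request body for GET _cluster/allocation/explain targeting one shard."""
    body = {"index": index, "shard": shard, "primary": primary}
    if current_node is not None:
        # For an assigned shard, name the node that currently holds it.
        body["current_node"] = current_node
    return body


def blocking_deciders(explain: dict) -> dict[str, list[str]]:
    """Map each node name to the deciders that returned NO for it."""
    blocked = {}
    for node in explain.get("node_allocation_decisions", []):
        names = [d["decider"] for d in node.get("deciders", []) if d.get("decision") == "NO"]
        if names:
            blocked[node.get("node_name", node.get("node_id", "?"))] = names
    return blocked


print(json.dumps(allocation_explain_body("my-index-000001", 0, True)))

# Hypothetical, heavily trimmed explain response.
explain = {
    "node_allocation_decisions": [
        {"node_name": "node-3", "deciders": [{"decider": "data_tier", "decision": "NO"}]},
        {"node_name": "node-1", "deciders": [{"decider": "filter", "decision": "NO"}]},
    ]
}
print(blocking_deciders(explain))
```

A tier-related decider rejecting every node in the preferred tier, alongside a filter decider rejecting the rest, is the kind of conflict this section describes.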
 
 [discrete]
 [[data-tier-migration]]
-=== Automatic data tier migration
+==== Automatic data tier migration
 
 {ilm-init} automatically transitions managed
 indices through the available data tiers using the <<ilm-migrate, migrate>> action.
 By default, this action is automatically injected in every phase.
-You can explicitly specify the migrate action with `"enabled": false` to disable automatic migration,
+You can explicitly specify the migrate action with `"enabled": false` to <<ilm-disable-migrate-ex,disable automatic migration>>,
 for example, if you're using the <<ilm-allocate, allocate action>> to manually
 specify allocation rules.
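For example, a warm phase that turns off automatic migration and instead uses the allocate action could look like the body built below. This is a sketch only: `box_type` is a hypothetical custom node attribute (set with `node.attr.box_type` on the relevant nodes), not a built-in tier, and the phase age is illustrative.

```python
import json


def warm_phase_manual_allocation(min_age: str, require: dict[str, str]) -> dict:
    """Build a warm phase that disables automatic tier migration.

    `require` is an allocate-action filter on a hypothetical custom node
    attribute, for example {"box_type": "warm"}.
    """
    return {
        "min_age": min_age,
        "actions": {
            "migrate": {"enabled": False},        # opt out of automatic migration
            "allocate": {"require": require},     # place shards manually instead
        },
    }


policy = {
    "policy": {
        "phases": {
            "hot": {"actions": {}},
            "warm": warm_phase_manual_allocation("30d", {"box_type": "warm"}),
        }
    }
}
print(json.dumps(policy, indent=2))
```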