diff --git a/docs/reference/datatiers.asciidoc b/docs/reference/datatiers.asciidoc
index 4aff273588926..442bc8303cb9c 100644
--- a/docs/reference/datatiers.asciidoc
+++ b/docs/reference/datatiers.asciidoc
@@ -2,13 +2,17 @@
 [[data-tiers]]
 == Data tiers

-A _data tier_ is a collection of nodes with the same data role that
-typically share the same hardware profile:
-
-* <> nodes handle the indexing and query load for content such as a product catalog.
-* <> nodes handle the indexing load for time series data such as logs or metrics
-and hold your most recent, most-frequently-accessed data.
-* <> nodes hold time series data that is accessed less-frequently
+A _data tier_ is a collection of <> within a cluster that share the same
+<>. Elastic recommends that this collection of nodes also share the same
+hardware profile to avoid <>. Data tier usage generally splits along
+<> for _content_ and _time series_ data. The available
+{es} data tiers are:
+
+* <> nodes handle the indexing and query load for content
+indices, such as a <> or a product catalog.
+* <> nodes handle the indexing load for time series,
+such as logs or metrics. They hold your most recent, most-frequently-accessed data.
+* <> nodes hold time series data that is accessed less-frequently
 and rarely needs to be updated.
 * <> nodes hold time series data that is accessed
 infrequently and not normally updated. To save space, you can keep
@@ -16,26 +20,48 @@ infrequently and not normally updated. To save space, you can keep
 <> on the cold tier. These fully mounted indices eliminate the need
 for replicas, reducing required disk space by approximately 50% compared to
 the regular indices.
-* <> nodes hold time series data that is accessed
+* <> nodes hold time series data that is accessed
 rarely and never updated. The frozen tier stores <> of <> exclusively.
 This extends the storage capacity even further — by up to 20 times compared to the warm tier.

-IMPORTANT: {es} generally expects nodes within a data tier to share the same
-hardware profile. Variations not following this recommendation should be
-carefully architected to avoid <>.
-
-When you index documents directly to a specific index, they remain on content tier nodes indefinitely.
+Content data will remain on the <> for its entire
+data lifecycle. You can configure your time series data to progress through the
+descending temperature data tiers hot, warm, cold, and frozen according to your
+performance, resiliency, and data retention requirements. Elastic recommends
+automating these lifecycle transitions via <>,
+specifically also using <>.
+
+[TIP]
+====
+A data tier's performance is highly dependent on its backing hardware profile.
+See {cloud}/ec-configure-deployment-settings.html#ec-hardware-profiles[{ecloud}'s
+hardware profiles] for example {cloud}/ec-reference-hardware.html[hardware configurations].
+
+{es} itself does not require but Elastic generally assumes, for example in {ecloud}
+Deployment configurations, that descending temperature data tiers have an increasing
+multiplier of CPU and/or heap resources to their data storage ratio, so that later data
+tiers can gain more space for data storage at the cost of slower response times.
+
+Under this assumption, as a general architecture baseline, the access proportions
+across the descending temperature data tiers outlined above would look like searches
+hitting 85% hot, 10% warm, 5% cold, and 1% frozen and ingest targeting
+95% hot, 4% warm, 1% cold, and 0% frozen as checked via
+<>. These proportions are not required by {es},
+although they encourage stable and highly responsive clusters. They're only intended
+to serve as a general architecture baseline to then be applied to your specific
+use case, hardware profiles, and architecture per Elastic's
+https://www.elastic.co/blog/it-depends[It Depends] philosophy.
+====

-When you index documents to a data stream, they initially reside on hot tier nodes.
-You can configure <> ({ilm-init}) policies
-to automatically transition your time series data through the hot, warm, and cold tiers
-according to your performance, resiliency and data retention requirements.
+[discrete]
+[[available-tier]]
+=== Available data tiers

 [discrete]
 [[content-tier]]
-=== Content tier
+==== Content tier

 // tag::content-tier[]
 Data stored in the content tier is generally a collection of items such as a product catalog or article archive.
@@ -50,13 +76,14 @@ While they are also responsible for indexing, content data is generally not inge
 as time series data such as logs and metrics. From a resiliency perspective the indices in
 this tier should be configured to use one or more replicas.

-The content tier is required. System indices and other indices that aren't part
-of a data stream are automatically allocated to the content tier.
+The content tier is required and is frequently seen deployed within the same node
+grouping as the hot tier. System indices and other indices that aren't part
+of a data stream are automatically allocated to the content tier.
 // end::content-tier[]

 [discrete]
 [[hot-tier]]
-=== Hot tier
+==== Hot tier

 // tag::hot-tier[]
 The hot tier is the {es} entry point for time series data and holds your most-recent,
@@ -71,7 +98,7 @@ data stream>> are automatically allocated to the hot tier.

 [discrete]
 [[warm-tier]]
-=== Warm tier
+==== Warm tier

 // tag::warm-tier[]
 Time series data can move to the warm tier once it is being queried less frequently
@@ -84,7 +111,7 @@ For resiliency, indices in the warm tier should be configured to use one or more

 [discrete]
 [[cold-tier]]
-=== Cold tier
+==== Cold tier

 // tag::cold-tier[]
 When you no longer need to search time series data regularly, it can move from
@@ -106,7 +133,7 @@ but doesn't reduce required disk space compared to the warm tier.

 [discrete]
 [[frozen-tier]]
-=== Frozen tier
+==== Frozen tier

 // tag::frozen-tier[]
 Once data is no longer being queried, or being queried rarely, it may move from
@@ -120,9 +147,13 @@ sometimes fetch frozen data from the snapshot repository, searches on the frozen
 tier are typically slower than on the cold tier.
 // end::frozen-tier[]

+[discrete]
+[[configure-data-tiers]]
+=== Configure data tiers
+
 [discrete]
 [[configure-data-tiers-cloud]]
-=== Configure data tiers on {ess} or {ece}
+==== On {ess} or {ece}

 The default configuration for an {ecloud} deployment includes a shared tier for
 hot and content data. This tier is required and can't be removed.
@@ -156,7 +187,7 @@ tier].

 [discrete]
 [[configure-data-tiers-on-premise]]
-=== Configure data tiers for self-managed deployments
+==== On self-managed deployments

 For self-managed deployments, each node's <> is configured
 in `elasticsearch.yml`. For example, the highest-performance nodes in a cluster
@@ -174,25 +205,49 @@ tier.
 [[data-tier-allocation]]
 === Data tier index allocation

-When you create an index, by default {es} sets
-<>
-to `data_content` to automatically allocate the index shards to the content tier.
-
-When {es} creates an index as part of a <>,
-by default {es} sets
-<>
-to `data_hot` to automatically allocate the index shards to the hot tier.
-
-You can explicitly set `index.routing.allocation.include._tier_preference`
-to opt out of the default tier-based allocation.
+You can check an existing index's data tier by <> for <>:
+
+[source,console]
+--------------------------------------------------
+GET /my-index-000001/_settings?filter_path=*.settings.index.routing.allocation.include._tier_preference
+--------------------------------------------------
+
+This `_tier_preference` setting may include a descending preference list for later data tier
+temperatures, for example <> would state `data_cold,data_warm,data_hot`.
+See <> for more context.
+
+{es} will attempt to <> the index's shards
+according to this setting. This setting does not override other allocation settings
+and may conflict with them, which can prevent the shard from allocating. Historically,
+this has occurred when a cluster has not yet been, or has been insufficiently, <>.
+This setting will not unallocate a currently allocated shard, but it may, for example,
+prevent the shard from migrating from its current location to its designated
+data tier. To troubleshoot, run <>
+against the suspected problematic shard.
+
+A newly created index defaults the `_tier_preference` setting to `data_content`, which
+allocates the index's shards to the content tier. A <>
+instead defaults its backing indices to `data_hot`, which allocates them to the
+hot tier. You can override these defaults upon index creation by explicitly setting
+the preferred value either via an <>, see
+<>, or from within the
+<> request body itself. You may also override this
+setting at any time by <> to the preferred
+value.
+
+You may set the `_tier_preference` value to `null` to remove the data tier preference
+setting. This allows the index to allocate to any data node within the cluster and does
+not reset the index's setting back to its upon-creation default. Be aware that if you
+do this, an <> may still apply a value later if the index is managed.

 [discrete]
 [[data-tier-migration]]
-=== Automatic data tier migration
+==== Automatic data tier migration

 {ilm-init} automatically transitions managed indices through the available data tiers using
 the <> action.
 By default, this action is automatically injected in every phase.
-You can explicitly specify the migrate action with `"enabled": false` to disable automatic migration,
+You can explicitly specify the migrate action with `"enabled": false` to <>,
 for example, if you're using the <> to manually specify allocation rules.
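
As a minimal sketch of the opt-out described above, an {ilm-init} policy could disable the migrate action in one phase and rely on the allocate action instead. The policy name `my-policy` and the custom node attribute `rack_id` (with values `one` and `two`) are hypothetical placeholders used only for illustration; they are not defined anywhere in this change:

[source,console]
--------------------------------------------------
# `my-policy` and `rack_id` are hypothetical placeholders
PUT _ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "migrate": {
            "enabled": false
          },
          "allocate": {
            "include": {
              "rack_id": "one,two"
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------

With the migrate action disabled in this phase, {ilm-init} does not update the index's `_tier_preference` there, so shard placement follows the allocate rules together with whatever `_tier_preference` the index already has.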