(Doc+) Flush out Data Tiers #107981
base: main

Conversation
👋🏽 howdy, team! I highly value the content on this [Data Tiers](https://www.elastic.co/guide/en/elasticsearch/reference/current/data-tiers.html) page. Thanks for writing it! In my experience, some users may become slightly confused by its golden nuggets due to its brevity. This PR attempts to flesh out common questions while remaining concise.

The main changes are in the first and second-to-last sections; however, I also attempt some heading restructuring to make the TOC idea-groupings clearer for easier scan-throughs.

The specific clarifications I'd like to push, in order of appearance:

- There's the content tier (for the "content" data category, as we've dubbed it on the higher-level page) and the data temperature tiers (for time series data). That the temperature tiers group together is technically not stated, so users end up asking when they'd go hot>warm vs content>warm, etc. I suspect this confusion arises only because users come straight to this page instead of starting at the hierarchy-parent page, so I have linked that up.
- (Main) Frozen being accessed/searched "rarely" should imply, well, rarely. I wrote 1% in the PR's `[TIP]` guideline section as a discussion starting point. Frequently we see users not understanding either that they actually have been, or that they shouldn't have, ≥25% of all searches hitting the frozen tier. This comes up because of architecture bugs (e.g. frozen indices with future timestamps) but also just happenstance (e.g. 01605242, where searches hit majority hot and ~5% cold, but then again hit 75% frozen).
- There's a slew of "how do I check that?", "how do I change that (at creation/later)?", and "what if I set it to null?" questions we get about `_tier_preference`, so I just extended the existing section about it.

TIA! 🙏
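The `_tier_preference` questions mentioned above could be illustrated with a minimal Console sketch. The setting name `index.routing.allocation.include._tier_preference` is the real index setting; the index name `my-index` and the chosen tier values are hypothetical placeholders:

```console
# How is my index's tier preference currently set? ("my-index" is a placeholder)
GET my-index/_settings?include_defaults=true&filter_path=**._tier_preference

# Change it after creation, e.g. prefer warm and fall back to hot
PUT my-index/_settings
{
  "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
}
```

At creation time the same setting can be supplied in the create-index request or an index template; the "what if I set it to null?" behavior is version-dependent, which is exactly why the doc section should spell it out.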
Documentation preview:

@stefnestor please enable the option "Allow edits and access to secrets by maintainers" on your PR. For more information, see the documentation.

Pinging @elastic/es-docs (Team:Docs)
🔥 you added so many great details in this PR!
I've reviewed and provided some feedback/edits from an organization and clarity POV. There are some nuances around tier hardware profiles that I didn't completely understand, so I apologize for any inaccuracies I injected with my edits and for any feedback that doesn't exactly align with your goals.
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
👋🏽 @shainaraskas, thanks for hanging out! Apologies for the delay; I work weekends, so today's my Monday. Your edits are also 🔥, cheers! I accepted all grammar and most rewordings; I've left comments on what remains, because I agree it matters to get these parts right to avoid confusion.
just working through your comments on the index allocation section but thought I'd throw these comments your way :)
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
looking so good! left a couple of comments that are up to your preference.
I think we're basically ready to go, but I'm not sure why the tests are failing. looking into it now.
edit: this looks like it's maybe the same error as your other PR, so I'm going to rebase this one too.
edit 2: after it's green and you check out my comments, feel free to merge (unless you're waiting on an engineering review).
we can also probably target
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
- Search: 85% hot, 10% warm, 5% cold, and 1% frozen
- Ingest: 95% hot, 4% warm, 1% cold, and 0% frozen
👋🏽 @dakrone will you kindly review these proportional percentages per data tier for Dev sign-off? I believe the rest of this PR consolidates content from existing doc pages for clarity, but this call out uniquely makes a new claim.
Where did we get these numbers? I don't think we can make generalizations for these kinds of percentages. For example, it's perfectly valid to have a "search" load that's hot and frozen, where the searches hit each tier 50% of the time (again, the performance requirements aren't something we can supply; they have to come from the user).

On the ingestion side, I wouldn't expect any indexing at all on the warm and cold tiers, so how did we arrive at the 4% and 1% numbers respectively?
> Where did we get these numbers?

In the PR description I highlighted that I guesstimated/made up these numbers. Please consider them only placeholders.

> for example, it's perfectly valid to have a "search" load that's hot and frozen, where the searches hit each tier 50% of the time (again, the performance requirements aren't something we can supply, they have to come from the user).

From Support, I may only deal with the situations where searches 50% hitting frozen break the cluster. The age-old example is the frozen tier having future dates taking down the entire cluster. I do want to highlight, though, that the existing doc already says "Frozen tier nodes hold time series data that is accessed rarely and never updated." I may be missing the intended interpretation, but "accessed rarely" does not sound like 50% to me; it sounds a lot more like the 1% I guesstimated.

> On the ingestion side, I wouldn't expect any indexing at all on the warm and cold tiers, how did we arrive at the 4% and 1% numbers respectively?

Again, guesstimated from the existing doc saying "Warm tier nodes hold time series data that is accessed less-frequently and rarely needs to be updated. ... Cold tier nodes hold time series data that is accessed infrequently and not normally updated." I don't know what these numbers should be, which is why I requested your feedback 🙏.

I'm on board if in general we're concerned about explicit percentages, but at least from what I see, users feel unguided and don't realize that, for the performance they desire, they haven't architected in a way that'd get them there. That's the need I'm hoping to fill better, but I'm not tied to how we do it. So if the wording needs to change, or we need an "it depends" blog instead and just link to it from here, all that's fine by me. But I would like to advocate for something more concrete to point users to for base-level architecture / expectation setting.
I left some comments for this change.
I also have concerns that we give a false sense of specificity with giving hard recommendations for percentages in these docs. My preference would be to teach the reader to weigh the values of cost, performance, and configuration complexity rather than giving hard numbers that are likely to mislead a user. I'm curious what your thoughts about this are.
docs/reference/datatiers.asciidoc
Outdated
A _data tier_ is a collection of <<modules-node,nodes>> within a cluster which share the same
<<node-roles,data node role>>. Elastic recommends this collection of nodes also shares the same
hardware profile to avoid <<hotspotting,hot spotting>>. Data tiers' usage generally splits along
<<data-management,data categories>> for _content_ and _time series_ data. {es} available
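The "nodes within a cluster which share the same data node role" definition above can be sketched as node configuration. This is a hedged illustration: `node.roles` and the `data_warm` role name are real Elasticsearch settings, while the idea of a single dedicated warm node is a hypothetical example:

```yaml
# elasticsearch.yml for a hypothetical dedicated warm-tier data node;
# every node carrying data_warm together forms the cluster's warm tier
node.roles: [ data_warm ]
```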
I think "content data" is a little ambiguous here, perhaps "… for time series and non time series data"?
* <<content-tier,Content tier>> nodes handle the indexing and query load for content
indices, such as a <<system-indices,system index>> or a product catalog.
System indices and data streams can also be time series data, so I don't think we should use it as an example here. I think we should stick with a timeseries/non-timeseries distinction.
This might be another discussion point for me then, if we can dig in:

- Lower down on the existing page, under the Content header, it already says "System indices and other indices that aren't part of a data stream are automatically allocated to the content tier.", which is why I didn't realize I might be misunderstanding.
- Support encourages users to keep all system indices on hot/content. Does Dev agree?
- AFAIK (and it's an ongoing discussion / definition problem) system indices are the indices which report from the snapshot's feature states. From the unofficial list I wrote for Support, we later learned that e.g. `.ilm-history` and `.kibana-event-log` don't qualify as system indices. So e.g. only (A) qualify as system indices, and AFAICT that subset doesn't have time series data (at least no indices which'd roll over. EDIT: other than the ML ones, if that's what you were referencing?).
(A)

```json
{
  "feature_states": [
    {
      "feature_name": "security",
      "indices": [".security-tokens-7", ".security-7", ".security-profile-8"]
    },
    {
      "feature_name": "geoip",
      "indices": [".geoip_databases"]
    },
    {
      "feature_name": "async_search",
      "indices": [".async-search"]
    },
    {
      "feature_name": "machine_learning",
      "indices": [".ml-inference-native-000002", ".ml-inference-000005", ".ml-config"]
    },
    {
      "feature_name": "transform",
      "indices": [".transform-internal-007"]
    },
    {
      "feature_name": "kibana",
      "indices": [
        ".kibana_analytics_8.12.2_001",
        ".kibana_task_manager_8.12.2_001",
        ".kibana_ingest_8.12.2_001",
        ".apm-custom-link",
        ".apm-agent-configuration",
        ".kibana_8.12.2_001",
        ".kibana_security_session_1",
        ".kibana_security_solution_8.12.2_001",
        ".kibana_alerting_cases_8.12.2_001"
      ]
    },
    {
      "feature_name": "tasks",
      "indices": [".tasks"]
    },
    {
      "feature_name": "fleet",
      "indices": [
        ".fleet-agents-7",
        ".fleet-enrollment-api-keys-7",
        ".fleet-actions-7",
        ".fleet-policies-7",
        ".fleet-servers-7",
        ".fleet-policies-leader-7"
      ]
    }
  ]
}
```
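For context, a feature list like the one above can be enumerated on a live cluster via the Get features API (a real endpoint; the exact response shape may differ by version, so treat this as a sketch):

```console
# List the features whose system indices are captured as snapshot feature states
GET /_features
```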
You can check how your access requests are distributed among your data tiers using the <<cat-thread-pool,CAT thread pools>> API. If your lower temperature tiers are being accessed at higher proportions, then your cluster performance might be impacted.
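The check described in the snippet above might look like the following sketch (`_cat/thread_pool` and the listed column names are real; mapping the per-node counts back to tiers is left to the reader):

```console
# Search thread pool stats per node; compare the `completed` counts between
# nodes in different tiers to approximate the search distribution
GET _cat/thread_pool/search?v=true&h=node_name,name,active,queue,rejected,completed
```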
I think looking through the cat threadpool API is a big request for an end user. It would be fairly easy to misunderstand, and since it's non-persistent it may give a very skewed view of a workload.
That's fair! I'm curious what alternative investigation you'd recommend, since it's a current user need?

(I again may be ignorant of better ways. For the limited view I have: IME there are currently only hodge-podge answers like this outlined API; that would be a design-improvement takeaway, but it shouldn't stop us from telling users the best way they can introspect right now. A possible alternative would be enabling Monitoring and then comparing node ingest rates; would that be better?)
These proportions are intended to serve as a general baseline that you can apply to your specific
use case, hardware profiles, and architecture.
How would these actually be applied? You mention above "your requests should be distributed to data tiers in the following approximate proportions", but that's not something prescriptive a user can actually do.
We don't want them to try and route queries to different tiers based on ratios, but rather to size things accordingly. Again, I'm worried that we simplify the problem here, it's not only a performance trade-off but also one of cost (for which this does not account).
This is fair 🤔.

I did not list it (my miss) but expected the answer to line up with Support's: (A) hold data in higher tiers longer, probably by updating an ILM policy; (B) where possible, filter searches by time range to avoid load on lower tiers; or (C) review performance vs. billing needs via the currently listed "apply to your specific use case, hardware profiles, and architecture". Plus (D): we recommend searchable snapshots to reduce billing while extending data retention.
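Options (A) and (B) above could be sketched roughly as follows. The `_ilm/policy` endpoint and range queries are real Elasticsearch APIs, but the policy name `my-policy`, the target `my-data-stream`, and the `90d`/`7d` ages are hypothetical placeholders:

```console
# (A) Keep data in the hot tier longer by raising the warm phase's min_age
PUT _ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "hot":  { "actions": {} },
      "warm": { "min_age": "90d", "actions": {} }
    }
  }
}

# (B) Filter searches by time range so they avoid the lower (colder) tiers
GET my-data-stream/_search
{
  "query": {
    "range": { "@timestamp": { "gte": "now-7d/d" } }
  }
}
```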
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
cc: @dakrone @bytebilly