Adding a wait-for-status parameter to the health_report API #107963

masseyke · 2024-04-26T17:23:32Z

This adds a wait-for-status parameter to the health_report API, much like the one already in the cluster health API. This is mainly useful in integration tests, where we want to wait until the cluster is in a known state before proceeding. For a multi-node cluster it can take a little time for the health node to come up and start reporting.
Closes #107796
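
For context, the kind of client-side wait that an integration test would otherwise need can be sketched as a polling loop. This is only an illustration, not code from this PR: `wait_for_health_status` and `fetch_report` are hypothetical names, and the sketch assumes the report is a dict with a top-level `status` of `green`, `yellow`, `red`, or `unknown`.

```python
import time
from typing import Callable

# Ordered from best to worst. "unknown" is deliberately absent so that it
# never satisfies a wait.
_SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def wait_for_health_status(
    fetch_report: Callable[[], dict],
    wanted: str = "green",
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.5,
) -> dict:
    """Poll fetch_report() until its status is at least as good as `wanted`.

    `fetch_report` is a hypothetical callable supplied by the test; it is
    assumed to return the parsed health report as a dict.
    """
    if wanted not in _SEVERITY:
        raise ValueError(f"unsupported status: {wanted!r}")
    deadline = time.monotonic() + timeout_s
    while True:
        report = fetch_report()
        status = str(report.get("status", "unknown")).lower()
        if status in _SEVERITY and _SEVERITY[status] <= _SEVERITY[wanted]:
            return report
        if time.monotonic() >= deadline:
            raise TimeoutError(f"health status still {status!r} after {timeout_s}s")
        time.sleep(poll_interval_s)
```

A test would pass a function that issues the request and parses the response. A server-side parameter would let the test replace this loop with a single request.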

elasticsearchmachine · 2024-04-26T17:23:56Z

Hi @masseyke, I've created a changelog YAML for you.

DaveCTurner · 2024-04-26T18:59:01Z

Hmm I'm a bit hesitant to do this. All those ?wait_for_X parameters in GET _cluster/health are really only useful in tests, and they're kind of a pain to maintain in the production API. Are you sure there's no other way to achieve this? For ESIntegTestCase variants we should be able to reach into the cluster to wait more directly. For REST tests, maybe an entirely separate API that's only available in those tests would be better?

DaveCTurner · 2024-04-26T19:00:23Z

Also we're only waiting on cluster state updates here, but I believe the health report API will change state for reasons that don't directly relate to a cluster state update.

masseyke · 2024-04-26T19:23:26Z

> Also we're only waiting on cluster state updates here, but I believe the health report API will change state for reasons that don't directly relate to a cluster state update.

Yeah, good point. The problem I originally set out to deal with was the status being UNKNOWN, which happens because the health service isn't online yet. And we do get a cluster state change event when it comes online.

DaveCTurner · 2024-04-26T20:28:17Z

> I originally set out to deal with was the status being UNKNOWN

I could see value in waiting until the health service comes up, similar to how TransportMasterNodeAction waits (for 30s by default, controllable by ?master_timeout=) if there's no master at the time the request arrives. Indeed I think waiting like that could make sense as the default behaviour.
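
The waiting behaviour described here can be sketched as a small model: block the request until a health node appears in the cluster state, or give up after a timeout. This is a toy under stated assumptions and is not how Elasticsearch implements it. `HealthNodeTracker` and its methods are hypothetical names, and the 30s default only mirrors the default mentioned above.

```python
import threading
from typing import Optional


class HealthNodeTracker:
    """Toy stand-in for cluster state that records the current health node."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._health_node: Optional[str] = None

    def on_cluster_state_update(self, health_node: Optional[str]) -> None:
        # Called whenever a new (hypothetical) cluster state is applied.
        with self._cond:
            self._health_node = health_node
            self._cond.notify_all()

    def await_health_node(self, timeout_s: float = 30.0) -> Optional[str]:
        # Block until a health node is known or the timeout elapses.
        # Returns the node id, or None if the wait timed out.
        with self._cond:
            self._cond.wait_for(
                lambda: self._health_node is not None, timeout=timeout_s
            )
            return self._health_node
```

A request handler built on this would answer immediately when a health node is already known, and would fall back to reporting UNKNOWN only when `await_health_node` returns `None`.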

nielsbauman · 2024-05-05T12:00:55Z

@dct @masseyke I'm afraid we currently don't have a way to determine whether the health service is online. The log line "Node [...] is selected as the current health node." doesn't mean that the full health service is online. After this log line (which corresponds to a cluster state update, as Keith mentioned), the LocalHealthMonitor instances on the individual nodes still have to send their health info to the health node. So, even if we wait for the cluster state to include a health node, there's still a chance that some health indicators will report UNKNOWN.

Either we'll have to implement a mechanism for determining whether the health service is online, or we're back to the original idea of this PR -- where we'd still need to address David's point regarding health updates not necessarily resulting in cluster state updates. I admittedly don't immediately have a good idea for such a mechanism.

DaveCTurner · 2024-05-05T12:38:52Z

> Either we'll have to implement a mechanism for determining whether the health service is online, or...

More strongly, I think the health API should wait to respond in the situation where it hasn't heard from all nodes yet, only returning UNKNOWN after some timeout.
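
A minimal sketch of that idea, assuming the health node knows which nodes it expects to hear from: wait until every expected node has reported, and return UNKNOWN only if the timeout expires first. `HealthInfoCache`, `record`, and `overall_status` are hypothetical names for illustration, and the worst-status-wins aggregation is an assumption of the sketch rather than a description of the real health indicators.

```python
import threading
from typing import Dict, Iterable

# Ordered from best to worst; the overall status is the worst one reported.
_ORDER = ["green", "yellow", "red"]


class HealthInfoCache:
    """Toy model of the per-node health info held on the health node."""

    def __init__(self, expected_nodes: Iterable[str]) -> None:
        self._expected = set(expected_nodes)
        self._reports: Dict[str, str] = {}
        self._cond = threading.Condition()

    def record(self, node_id: str, status: str) -> None:
        # Called when a node's local monitor sends its health info.
        with self._cond:
            self._reports[node_id] = status
            self._cond.notify_all()

    def overall_status(self, timeout_s: float = 30.0) -> str:
        # Wait until all expected nodes have reported, or time out.
        with self._cond:
            complete = self._cond.wait_for(
                lambda: self._expected.issubset(self._reports),
                timeout=timeout_s,
            )
            if not complete:
                return "unknown"
            return max(
                (self._reports[n] for n in self._expected),
                key=_ORDER.index,
                default="unknown",
            )
```

The choice of default timeout is the open design question: a long wait hides slow startups from tests, but it also delays the response when a node never reports at all.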

nielsbauman · 2024-05-05T18:15:45Z

Ah yeah that could work I think. I wonder what would be a good default behavior. If a cluster has trouble starting up (i.e. it's not just slowly starting up, it's not starting up) or a new node has issues, it might be confusing if the Health API takes 10/30/whatever seconds to load. Also, we'd need to take managed/internal use cases into account (e.g. the health page on cloud.elastic.co and maybe AutoOps in the future).

Adding a wait-for-status parameter to the health_report API

6d3683e

masseyke added >enhancement :Data Management/Health v8.15.0 labels Apr 26, 2024

Update docs/changelog/107963.yaml

66818b2

masseyke added 2 commits April 26, 2024 13:43

updating skip version

7012166

Merge branch 'adding-wait-for-status-to-health-report' of github.com:… …masseyke/elasticsearch into adding-wait-for-status-to-health-report

2290c9b
