
[libbeat/management]: support filebeat inputs to report their status to elastic-agent #39209

Merged


pkoutsovasilis
Contributor

@pkoutsovasilis commented Apr 25, 2024

Proposed commit message

This PR introduces the following:

  • Hierarchical StatusUnit that essentially wraps elastic-agent-client Units and always calculates the appropriate Status based on both the State of the input and the States of its individual streams. The hierarchy is as follows:
    • When the client Unit state is anything other than Healthy, that state immediately becomes the status of the StatusUnit. This allows the existing runner allocation/deallocation logic to properly propagate any given status and highlight that something is changing, e.g. when a unit is modified the propagated status is Configuring, which is what we want.
    • When the client Unit state is Healthy, the statuses of all streams are taken into account to calculate the final one. Here we care about Degraded and Failed stream statuses, which we aggregate to emit the appropriate status.
    • All active stream states are emitted in the payload of the check-in message to the agent, and hopefully this can be used to further improve the user experience.
  • Inject a StatusReporter into v2.Context (used by v2.Plugins). This reporter affects only a specific stream's status, so the respective input can emit per-stream statuses.
  • Add status reporting support to the CEL input. With this implementation I propose the following semantics for statuses:
    • Running: everything is happy, no error or warning produced during the operation of an input
    • Failed: when the input encountered an error that it can't continue from
    • Degraded: when the input encountered something abnormal but, for lack of a better expression, hasn't given up yet 😄 CEL does that a lot; it refuses to say bye-bye.
    • Configuring, Stopping, Stopped, Starting: these statuses are best suited to the input allocation/deallocation code rather than being set directly from inside the input, as the former can and should override the status of the whole input.
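The status hierarchy described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's code: `Status`, `statusUnit`, `streamReporter` and `effective` are hypothetical names, the enum order is assumed to mirror libbeat's status values, the client Unit's Healthy state is modelled as `Running`, and letting Failed take precedence over Degraded is an assumption. The "Some/All streams are Degraded" messages are the ones visible in the logs further down.

```go
package main

import (
	"fmt"
	"sync"
)

// Status is a hypothetical enum whose order is assumed to mirror libbeat's status values.
type Status int

const (
	Unknown Status = iota
	Starting
	Configuring
	Running
	Degraded
	Failed
	Stopping
	Stopped
)

// streamState holds the last status a single stream reported.
type streamState struct {
	status Status
	msg    string
}

// statusUnit is an illustrative stand-in for the PR's StatusUnit.
type statusUnit struct {
	mu        sync.Mutex
	unitState Status // driven by runner allocation/deallocation; Running models "Healthy"
	unitMsg   string
	streams   map[string]streamState
}

func newStatusUnit() *statusUnit {
	return &statusUnit{unitState: Running, unitMsg: "Healthy", streams: map[string]streamState{}}
}

// streamReporter is what one input stream is handed: it can only update its own entry.
type streamReporter struct {
	unit     *statusUnit
	streamID string
}

func (r streamReporter) UpdateStatus(s Status, msg string) {
	r.unit.mu.Lock()
	defer r.unit.mu.Unlock()
	r.unit.streams[r.streamID] = streamState{status: s, msg: msg}
}

// effective returns the status and message that would be sent to the agent.
func (u *statusUnit) effective() (Status, string) {
	u.mu.Lock()
	defer u.mu.Unlock()
	// Any unit-level state other than healthy is reported as-is.
	if u.unitState != Running {
		return u.unitState, u.unitMsg
	}
	// Otherwise aggregate the per-stream statuses.
	var degraded, failed int
	for _, s := range u.streams {
		switch s.status {
		case Degraded:
			degraded++
		case Failed:
			failed++
		}
	}
	total := len(u.streams)
	switch {
	case failed > 0: // assumption: Failed outranks Degraded
		return Failed, fmt.Sprintf("%d of %d streams are Failed", failed, total)
	case degraded > 0 && degraded == total:
		return Degraded, "All streams are Degraded"
	case degraded > 0:
		return Degraded, "Some streams are Degraded"
	}
	return Running, "Healthy"
}

func main() {
	u := newStatusUnit()
	a := streamReporter{unit: u, streamID: "stream-a"}
	b := streamReporter{unit: u, streamID: "stream-b"}
	// e.g. one CEL stream succeeding and another whose eval failed but keeps retrying
	a.UpdateStatus(Running, "")
	b.UpdateStatus(Degraded, "failed eval: ...")
	status, msg := u.effective()
	fmt.Println(status, msg)
}
```

Keeping the unit-level state as an override means the runner code never has to know about streams, while each input only ever sees the narrow `UpdateStatus` surface for its own stream.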

Noteworthy code changes:

  • When we get a unitRemoved change from the elastic-agent-client, we don't remove the unit from the manager's map directly; instead we mark it as soft deleted, and it is removed from the map just before we reload the runners. This is necessary because if another unit change happens before the input runners are reloaded and the same input with the same stream ID is re-introduced, the corresponding runner won't reload, yet with the original code it would no longer hold an association to the StatusUnit, since that had already been removed.
  • While writing an integration test to check all of the above, I noticed that the elastic-agent-client didn't pick up the change of the input config, and thus no Unit Modified change was received by the manager. Thus I did this; I will speak with the agent team and validate whether this is an actual issue.
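A simplified model of the soft-delete bookkeeping might look like this. It is a sketch under stated assumptions, not the manager's real code: `manager`, `unitEntry`, `unitAdded`, `unitRemoved` and `purgeBeforeReload` are hypothetical, and the scenario is reduced to a unit with the same ID being removed and re-introduced before the reload. What it shows is that, because the entry is only flagged, a re-introduced unit gets back the same `statusUnit` pointer its runner already holds.

```go
package main

import (
	"fmt"
	"sort"
)

// statusUnit is a hypothetical placeholder for the object a runner keeps a reference to.
type statusUnit struct{ id string }

// unitEntry tracks a unit and whether it has been soft deleted.
type unitEntry struct {
	unit        *statusUnit
	softDeleted bool
}

// manager is an illustrative model of the unit bookkeeping, not the real managerV2.
type manager struct {
	units map[string]*unitEntry
}

func newManager() *manager { return &manager{units: map[string]*unitEntry{}} }

// unitAdded registers a unit. If the unit was only soft deleted, it is revived in
// place, so a runner that was never reloaded still points at the same statusUnit.
func (m *manager) unitAdded(id string) *statusUnit {
	if e, ok := m.units[id]; ok {
		e.softDeleted = false
		return e.unit
	}
	e := &unitEntry{unit: &statusUnit{id: id}}
	m.units[id] = e
	return e.unit
}

// unitRemoved only flags the unit; the entry stays in the map until the next reload.
func (m *manager) unitRemoved(id string) {
	if e, ok := m.units[id]; ok {
		e.softDeleted = true
	}
}

// purgeBeforeReload drops soft-deleted units right before the runners are reloaded
// and returns the IDs that remain, sorted for deterministic output.
func (m *manager) purgeBeforeReload() []string {
	remaining := make([]string, 0, len(m.units))
	for id, e := range m.units {
		if e.softDeleted {
			delete(m.units, id) // deleting during range is allowed in Go
			continue
		}
		remaining = append(remaining, id)
	}
	sort.Strings(remaining)
	return remaining
}

func main() {
	m := newManager()
	before := m.unitAdded("input-unit-1")
	m.unitRemoved("input-unit-1")
	after := m.unitAdded("input-unit-1") // re-introduced before the runners reload
	fmt.Println(before == after, m.purgeBeforeReload())
}
```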

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Remove custom elastic-agent-client replace in go.mod
  • Utilise spawned ES stack for integration test
  • Try to run this Beat through an actual agent

How to test this PR locally

cd x-pack/filebeat
mage docker:composeUp
go test -v input/cel/integration/integration_test.go

Related issues

Use cases

N/A

Screenshots

output.mp4

Logs

=== RUN   TestCELInput
    integration_test.go:431: observed: version_info:{name:"beat-v2-client" meta:{key:"build_time" value:"0001-01-01 00:00:00 +0000 UTC"} meta:{key:"commit" value:"unknown"} build_hash:"unknown"}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 message:"Starting"} units:{id:"input-unit-1" message:"Starting" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" message:"Starting" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" message:"Starting" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:1}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" message:"Starting" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:1}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:DEGRADED message:"Some streams are Degraded" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:"failed eval: ERROR: <input>:1:30: failed to unmarshal JSON message: invalid character 'i' looking for beginning of value\n | bytes(get(state.url).Body).as(body,{\"events\":[body.decode_json()]})\n | .............................^"}} fields:{key:"state" value:{number_value:4}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:DEGRADED message:"All streams are Degraded" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:"failed eval: ERROR: <input>:1:30: failed to unmarshal JSON message: invalid character 'i' looking for beginning of value\n | bytes(get(state.url).Body).as(body,{\"events\":[body.decode_json()]})\n | .............................^"}} fields:{key:"state" value:{number_value:4}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:"failed eval: ERROR: <input>:1:30: failed to unmarshal JSON message: invalid character 'i' looking for beginning of value\n | bytes(get(state.url).Body).as(body,{\"events\":[body.decode_json()]})\n | .............................^"}} fields:{key:"state" value:{number_value:4}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:DEGRADED message:"Some streams are Degraded" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:"failed eval: ERROR: <input>:1:30: failed to unmarshal JSON message: invalid character 'i' looking for beginning of value\n | bytes(get(state.url).Body).as(body,{\"events\":[body.decode_json()]})\n | .............................^"}} fields:{key:"state" value:{number_value:4}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:DEGRADED message:"Some streams are Degraded" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:"failed eval: ERROR: <input>:1:30: failed to unmarshal JSON message: invalid character 'i' looking for beginning of value\n | bytes(get(state.url).Body).as(body,{\"events\":[body.decode_json()]})\n | .............................^"}} fields:{key:"state" value:{number_value:4}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:CONFIGURING message:"Configuring" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" message:"Starting" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:1}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"}
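The numeric `state` values in the check-in payloads above are easier to follow once translated to names. The sketch below assumes they follow libbeat's status ordering (Unknown=0, Starting=1, Configuring=2, Running=3, Degraded=4, Failed=5, Stopping=6, Stopped=7), which is consistent with the logs: streams carrying an eval error report 4 while the unit is DEGRADED, and streams report 3 once the unit is HEALTHY. `streamPayload`, `statusName` and `summarize` are hypothetical helpers working on plain Go values rather than the protobuf types.

```go
package main

import (
	"fmt"
	"sort"
)

// statusNames assumes payload "state" numbers follow libbeat's status ordering.
var statusNames = []string{"Unknown", "Starting", "Configuring", "Running", "Degraded", "Failed", "Stopping", "Stopped"}

// statusName converts a payload number; protobuf Struct numbers are doubles, hence float64.
func statusName(n float64) string {
	i := int(n)
	if float64(i) != n || i < 0 || i >= len(statusNames) {
		return fmt.Sprintf("invalid(%v)", n)
	}
	return statusNames[i]
}

// streamPayload is a hypothetical plain-Go view of one payload entry ({msg, state}).
type streamPayload struct {
	State float64
	Msg   string
}

// summarize renders one line per stream, sorted by stream ID.
func summarize(payload map[string]streamPayload) []string {
	ids := make([]string, 0, len(payload))
	for id := range payload {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	out := make([]string, 0, len(ids))
	for _, id := range ids {
		p := payload[id]
		line := fmt.Sprintf("%s: %s", id, statusName(p.State))
		if p.Msg != "" {
			line += " (" + p.Msg + ")"
		}
		out = append(out, line)
	}
	return out
}

func main() {
	// Values taken from one of the "Some streams are Degraded" check-ins above (message shortened).
	payload := map[string]streamPayload{
		"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2": {State: 4, Msg: "failed eval: ..."},
		"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2": {State: 3},
	}
	for _, l := range summarize(payload) {
		fmt.Println(l)
	}
}
```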

@pkoutsovasilis added the discuss label (Issue needs further discussion) Apr 25, 2024
@pkoutsovasilis self-assigned this Apr 25, 2024
@botelastic bot added the needs_team label (Indicates that the issue/PR needs a Team:* label) Apr 25, 2024
Contributor

mergify bot commented Apr 25, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @pkoutsovasilis? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Collaborator

💔 Build Failed



Build stats

  • Duration: 118 min 16 sec

Pipeline error 1

This error is likely related to the pipeline itself. Click here
and then you will see the error (either incorrect syntax or an invalid configuration).

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@pkoutsovasilis force-pushed the pkoutsovasilis/input_agent_health branch from 149a0c5 to c3ef49b May 9, 2024 11:36
Contributor

mergify bot commented May 9, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b pkoutsovasilis/input_agent_health upstream/pkoutsovasilis/input_agent_health
git merge upstream/main
git push upstream pkoutsovasilis/input_agent_health

@pkoutsovasilis force-pushed the pkoutsovasilis/input_agent_health branch from c3ef49b to c6294b7 May 9, 2024 11:46
@pkoutsovasilis added the Team:Security-Deployment and Devices label (Deployment and Devices Team in Security Solution) May 9, 2024
@botelastic bot removed the needs_team label (Indicates that the issue/PR needs a Team:* label) May 9, 2024
@pkoutsovasilis changed the title from "[Do not merge]: WIP allow inputs to report their state to the agent" to "[libbeat/management]: allow inputs to report their state to the agent" May 9, 2024
@pkoutsovasilis changed the title from "[libbeat/management]: allow inputs to report their state to the agent" to "[libbeat/management]: support filebeat inputs to report their status to elastic-agent" May 9, 2024
@pkoutsovasilis marked this pull request as ready for review May 9, 2024 13:22
@pkoutsovasilis requested review from a team as code owners May 9, 2024 13:22
@elasticmachine
Collaborator

Pinging @elastic/sec-deployment-and-devices (Team:Security-Deployment and Devices)

@pkoutsovasilis
Contributor Author

@andrewkroh @cmacknz @belimawr there are some rough edges (checking if the issue I spotted is actually an issue with elastic-agent-client, making the integration test use the integration stack and not the one I spawned from elastic-package) but the logic is pretty much there

@pierrehilbert added the Team:Elastic-Agent-Data-Plane label (Label for the Agent Data Plane team) May 10, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@pierrehilbert added the Team:Elastic-Agent-Control-Plane label (Label for the Agent Control Plane team) May 10, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@pkoutsovasilis
Contributor Author

@pierrehilbert can somebody from @elastic/elastic-agent-control-plane have a look at this PR?

Contributor

@pchila left a comment


Not a filebeat or libbeat expert, so I may have missed something, but I left a couple of comments about sections that can be simplified/refactored a bit, plus a couple of questions.

x-pack/filebeat/input/cel/integration_test.go (resolved)
x-pack/filebeat/input/cel/integration_test.go (resolved)
x-pack/libbeat/management/managerV2.go (resolved)
x-pack/libbeat/management/managerV2.go (outdated, resolved)
x-pack/libbeat/management/unit.go (outdated, resolved)
x-pack/libbeat/management/unit.go (outdated, resolved)
@cmacknz
Member

cmacknz commented May 17, 2024

Looking at the video, instead of "All streams are Degraded" you would be much better off picking the first error and setting it as the overall input state. This will immediately bring it to the user's attention, and then they can solve errors one by one. You don't want to force users to look directly at the contents of the .fleet-agents document via the View agent JSON button.

To allow displaying the state of the streams, you can create an issue in Kibana for Fleet to start supporting this. Alternatively you can change your integrations to stop using streams and just template individual instances of each input, each of which will have their own ID and status. The net number of processes agent runs will be the same. If you are relying on the stream IDs for tracking persistent state this will change the ID on you though, so maybe this isn't viable everywhere.

@pkoutsovasilis
Copy link
Contributor Author

pkoutsovasilis commented May 18, 2024

> Looking at the video, instead of "All streams are Degraded" you would be much better off picking the first error and setting it as the overall input state. This will immediately bring it to the user's attention, and then they can solve errors one by one. You don't want to force users to look directly at the contents of the .fleet-agents document via the View agent JSON button.

Having potentially more than one stream with a Degraded or Failed status, but showing only one of them without telling the user how many in total have an issue, seems weird to me. As a user flow, I would expect the following: an integration is reported as not Healthy, so I open up the logs and check for errors first. When something similar to my answer below is implemented, this will get finer granularity. So, to get a wider consensus on the matter, I am going to ask for the opinion of the reviewers that have already approved this PR; @andrewkroh, @efd6, @belimawr what do you think?

> To allow displaying the state of the streams, you can create an issue in Kibana for Fleet to start supporting this. Alternatively you can change your integrations to stop using streams and just template individual instances of each input, each of which will have their own ID and status. The net number of processes agent runs will be the same. If you are relying on the stream IDs for tracking persistent state this will change the ID on you though, so maybe this isn't viable everywhere.

I would go for supporting this in Kibana through the payload (with the streams in it), but before I open an issue I need to have this PR finalised, i.e. approved by all the teams that are code owners.

@andrewkroh
Member

With the status of the streams being routed to Fleet now, I think it makes sense for the UI to be updated to take advantage of this information directly. It can make sure that the most important information is presented to user for them to investigate, and aid them in that investigation.

I think that even the message that the Beat is creating about "Out of N streams, M are failed, X are degraded" should be the responsibility of the presentation layer to construct. If the Beat puts some input status message in there then the UI is basically obligated to use it even though it is something it could compute on its own (perhaps in a better form that is localized) because it won't be able to distinguish it from status data sourced from the input.

@cmacknz
Member

cmacknz commented May 21, 2024

> With the status of the streams being routed to Fleet now, I think it makes sense for the UI to be updated to take advantage of this information directly. It can make sure that the most important information is presented to user for them to investigate, and aid them in that investigation.

I agree we should update Fleet to take advantage of the extra streams payload if it is present, but since the change in Fleet is currently not coordinated or aligned with this change, I want the information this PR adds to be as useful as possible without the UI change.

I would actually rather us forbid using streams at all in integrations and just have everything be inputs so there is a 1:1 mapping of policy inputs to beat inputs but that is a breaking change because of the ID changing for the input today. Then there would be no UI work to do.

> I think that even the message that the Beat is creating about "Out of N streams, M are failed, X are degraded" should be the responsibility of the presentation layer to construct. If the Beat puts some input status message in there then the UI is basically obligated to use it even though it is something it could compute on its own (perhaps in a better form that is localized) because it won't be able to distinguish it from status data sourced from the input.

I agree, but there is currently a non-zero chance this lands in 8.15.0 and the UI change would land in 8.16.0+. We can mitigate this by ensuring the additional error context this change introduces is useful without the UI change. Making the overall input message something like "N of M streams failed, first error X" would accomplish this easily with little work, and avoids temporarily requiring users to look at the Fleet system indices to debug anything.
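The proposed message ("N of M streams failed, first error X") could be composed along these lines. This is an illustrative sketch, not the code that landed in 2e1e21f: `streamStatus` and `unitMessage` are hypothetical, and "first" is defined by sorted stream ID purely so the result is deterministic, since Go map iteration order is random.

```go
package main

import (
	"fmt"
	"sort"
)

// streamStatus is a hypothetical per-stream record.
type streamStatus struct {
	failed bool
	err    string
}

// unitMessage builds "N of M streams failed, first error: X", or "Healthy".
// "First" means lowest stream ID here, an assumption chosen for determinism.
func unitMessage(streams map[string]streamStatus) string {
	ids := make([]string, 0, len(streams))
	for id := range streams {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	failed, firstErr := 0, ""
	for _, id := range ids {
		s := streams[id]
		if !s.failed {
			continue
		}
		if failed == 0 {
			firstErr = s.err
		}
		failed++
	}
	if failed == 0 {
		return "Healthy"
	}
	return fmt.Sprintf("%d of %d streams failed, first error: %s", failed, len(ids), firstErr)
}

func main() {
	fmt.Println(unitMessage(map[string]streamStatus{
		"cel-1": {failed: true, err: "failed eval: ..."},
		"cel-2": {},
	}))
}
```

Surfacing one concrete error next to the counts keeps the message actionable in the current UI, while the per-stream payload still carries the full picture for a future Fleet change.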

@pkoutsovasilis
Contributor Author

pkoutsovasilis commented May 21, 2024

> I agree we should update Fleet to take advantage of the extra streams payload if it is present, but since the change in Fleet is currently not coordinated or aligned with this change, I want the information this PR adds to be as useful as possible without the UI change.

> I would actually rather us forbid using streams at all in integrations and just have everything be inputs so there is a 1:1 mapping of policy inputs to beat inputs but that is a breaking change because of the ID changing for the input today. Then there would be no UI work to do.

> I agree, but there is currently a non-zero chance this lands in 8.15.0 and the UI change would land in 8.16.0+. We can mitigate this by ensuring the additional error context this change introduces is useful without the UI change. Making the overall input message something like "N of M streams failed, first error X" would accomplish this easily with little work, and avoids temporarily requiring users to look at the Fleet system indices to debug anything.

UPDATE: @cmacknz this is now in 2e1e21f

I don't see any strong concerns about such a behaviour from the reviewers that already approved the previous one (still, it feels weird to me), but I am going to change the code to do what you propose here.

Member

@cmacknz left a comment


Thank you!

Labels
  • backport-skip: Skip notification from the automated backport with mergify
  • enhancement
  • Team:Elastic-Agent-Control-Plane: Label for the Agent Control Plane team
  • Team:Elastic-Agent-Data-Plane: Label for the Agent Data Plane team
  • Team:Security-Deployment and Devices: Deployment and Devices Team in Security Solution

8 participants