Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Alerting] Change logger level to debug when delete a rule without alerts produce an error or warn when untracking alerts #183207

Merged
merged 9 commits into from May 15, 2024

Conversation

cnasikas
Copy link
Member

Summary

Fixes: #182754

For maintainers

@cnasikas cnasikas added enhancement New value added to drive a business result Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.15.0 labels May 11, 2024
@cnasikas cnasikas self-assigned this May 11, 2024
@cnasikas cnasikas requested a review from a team as a code owner May 11, 2024 09:13
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@cnasikas cnasikas requested a review from pmuellr May 11, 2024 09:15
@cnasikas
Copy link
Member Author

@elasticmachine merge upstream

@cnasikas
Copy link
Member Author

@elasticmachine merge upstream

Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pmuellr
Copy link
Member

pmuellr commented May 13, 2024

Hmmm, perhaps my suggestion of changing the error/warn to debug was not the best :-)

It looks like this would end up setting the level for any errors when alerts are updated. My main issue was having messages when there aren't any alerts to update.

Peeking at the code, I found this:

if (total === 0 && response.total === 0) {
throw new Error('No active alerts matched the query');
}

Can we just not throw an error here? Or maybe that is indicative of some kind of error, in some case through the loop and with that query? But it is definitely the case that when a rule is deleted / disabled and has no alerts, these logs are generated - which is what I want to avoid.

@cnasikas
Copy link
Member Author

cnasikas commented May 14, 2024

Good point @pmuellr! I missed that. I reverted the log levels and stopped throwing an error when alerts were not found in e0c65fe (#183207). I decided to return an empty array if there are no results. I checked where the function is being called and it seems safe to do so. The only place where the response is used is at

const taskIds = [...new Set(result.map((doc) => doc[ALERT_RULE_UUID]))];
. Let me know if that's ok.

@cnasikas cnasikas requested a review from ymao1 May 14, 2024 13:18
@@ -426,6 +426,8 @@ describe('setAlertsToUntracked()', () => {
},
});

clusterClient.updateByQuery.mockResponseOnce({ total: 2, updated: 2 });
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to mock the response of updateByQuery for the test to test the correct behavior of the code.

@pmuellr
Copy link
Member

pmuellr commented May 14, 2024

I reverted the log levels and stopped throwing an error when alerts were not found in e0c65fe (#183207). I decided to return an empty array if there are no results. I checked where the function is being called and it seems safe to do so. The only place where the response is used is at

const taskIds = [...new Set(result.map((doc) => doc[ALERT_RULE_UUID]))];

. Let me know if that's ok.

That seems like the right thing to do. Following that bulk untrack module, it seems like it might actually try to make some kind of bulk call later (to clear out some alerts in task state docs) with an empty array. Starting here:

const taskIds = [...new Set(result.map((doc) => doc[ALERT_RULE_UUID]))];
await context.taskManager.bulkUpdateState(taskIds, (state, id) => {
try {
const uuidsToClear = result
.filter((doc) => doc[ALERT_RULE_UUID] === id)
.map((doc) => doc[ALERT_UUID]);

Not sure how ES will handle a bulk update with no actual updates :-), or perhaps it's short-circuited elsewhere. Seems like deleting a rule which never had an active alert will tell us. I have a feeling we may need to short-circuit that ourselves, if the number of alerts is 0 ...

@cnasikas
Copy link
Member Author

I have a feeling we may need to short-circuit that ourselves, if the number of alerts is 0 ...

I was also skeptical about it but wanted your opinion before doing it 🙂. Thanks! if we return early like

 if (taskIds.length === 0) { <---- should we context.auditLogger?.log here???
      return;
    }
    
    await context.taskManager.bulkUpdateState(taskIds, (state, id) => { //// })
    
    context.auditLogger?.log(
      ruleAuditEvent({
        action: RuleAuditAction.UNTRACK_ALERT,
        outcome: 'success',
      })
    );

should we also audit log when we return without calling context.taskManager.bulkUpdateState?

@pmuellr
Copy link
Member

pmuellr commented May 14, 2024

should we also audit log when we return without calling context.taskManager.bulkUpdateState?

Great question - should we log that we did nothing? :-) I guess I'm thinking we should. If we don't, then the pattern of audit logs would look no different than if something went wrong and that log doc didn't get written. So, it might look like an error to someone, to not see that. @mikecote thoughts?

@mikecote
Copy link
Contributor

mikecote commented May 14, 2024

Great question - should we log that we did nothing? :-) I guess I'm thinking we should. If we don't, then the pattern of audit logs would look no different than if something went wrong and that log doc didn't get written. So, it might look like an error to someone, to not see that. @mikecote thoughts?

Yes I think so, it matches patterns like enabling an already enabled rule and such (audit log events are still logged).

@cnasikas
Copy link
Member Author

@pmuellr @mikecote Done in 45bef8e (#183207)

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @cnasikas

Copy link
Member

@pmuellr pmuellr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@cnasikas cnasikas merged commit d1806c9 into elastic:main May 15, 2024
35 checks passed
@cnasikas cnasikas deleted the fix_182754 branch May 15, 2024 14:50
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting enhancement New value added to drive a business result Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ResponseOps] warnings/errors when deleting/disabling rules with no active alerts
7 participants