
BUGFIX: Always mark the rule's restoration process as completed #14048

Merged: 3 commits, May 3, 2024

@gotjosh commented on May 3, 2024

In #13980 I introduced a change to reduce the number of queries executed when we restore alert statuses.

With this, the querying semantics changed: we now need to go through all series before entering the alert restoration loop, and I missed the fact that exiting early when there are no rules to restore would leave the restoration incomplete.

An alert being restored is used as a proxy for "we're now ready to write `ALERTS`/`ALERTS_FOR_STATE` metrics", so, as a result, we weren't writing those series if nothing was restored the first time around.
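To make the failure mode concrete, here is a minimal Go sketch under a simplified model. The names `alertingRule`, `emitAlertSeries`, `restoreBuggy`, `restoreFixed`, and the `stored` map of per-rule series are hypothetical and for illustration only; they are not Prometheus's actual types or functions. The sketch shows that when a `restored` flag gates whether alert series are emitted, an early `continue` for a rule with nothing to restore leaves that rule silent, while marking completion unconditionally does not.

```go
package main

import "fmt"

// alertingRule is a hypothetical stand-in for an alerting rule. In this
// model, the restored flag gates whether ALERTS/ALERTS_FOR_STATE samples
// are emitted.
type alertingRule struct {
	name     string
	restored bool
}

// emitAlertSeries reports whether the rule would write its alert series
// on an evaluation: only once restoration has completed.
func (r *alertingRule) emitAlertSeries() bool {
	return r.restored
}

// restoreBuggy models the regression: a rule with no stored series hits
// an early continue and is never marked as restored.
func restoreBuggy(rules []*alertingRule, stored map[string][]float64) {
	for _, r := range rules {
		if len(stored[r.name]) == 0 {
			continue // early exit: restored stays false
		}
		// ... restore active-at timestamps from the stored series (omitted) ...
		r.restored = true
	}
}

// restoreFixed marks every rule as restored, even when there was nothing
// to restore for it.
func restoreFixed(rules []*alertingRule, stored map[string][]float64) {
	for _, r := range rules {
		if series := stored[r.name]; len(series) > 0 {
			// ... restore active-at timestamps from series (omitted) ...
		}
		// "Nothing to restore" still means restoration is done.
		r.restored = true
	}
}

func main() {
	stored := map[string][]float64{} // no series found for any rule
	a := &alertingRule{name: "HighLatency"}
	b := &alertingRule{name: "HighLatency"}
	restoreBuggy([]*alertingRule{a}, stored)
	restoreFixed([]*alertingRule{b}, stored)
	fmt.Println("buggy emits:", a.emitAlertSeries()) // buggy emits: false
	fmt.Println("fixed emits:", b.emitAlertSeries()) // fixed emits: true
}
```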

P.S. I have not updated the changelog because, technically, this is still unreleased.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

@dimitarvdimitrov left a comment

LGTM, nice catch

Review thread on `rules/group.go` (resolved).

Follow-up commit:
- Improve the log line when no series are found for the rule
- Add a comment explaining why we mark the alert rule as restored even when the query fails

Signed-off-by: gotjosh <josue.abreu@gmail.com>
@gotjosh merged commit c10186e into main on May 3, 2024
42 checks passed
@gotjosh deleted the gotjosh/fix-alert-restoration branch on May 3, 2024 at 13:23