Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

promtool check rules does not take into account expression #7301

Closed
gotjosh opened this issue May 27, 2020 · 6 comments
Closed

promtool check rules does not take into account expression #7301

gotjosh opened this issue May 27, 2020 · 6 comments

Comments

@gotjosh
Copy link
Member

gotjosh commented May 27, 2020

What did you do?

Run promtool check rules example.yaml where example.yaml is:

groups:
  - name: example
    rules:
      - expr: |
          sum by (code, verb) (increase(apiserver_request_total{job="default/kubernetes",verb="LIST",code=~"2.."}[30d]))
        record: code_verb:apiserver_request_total:increase30d
      - expr: |
          sum by (code, verb) (increase(apiserver_request_total{job="default/kubernetes",verb="GET",code=~"2.."}[30d]))
        record: code_verb:apiserver_request_total:increase30d

What did you expect to see?

Rules are all good, with no duplicates.

What did you see instead? Under which circumstances?

Checking example.yaml
1 duplicate rules(s) found.
Metric: code_verb:apiserver_request_total:increase30d
Label(s):
Might cause inconsistency while recording expressions.
  SUCCESS: 2 rules found

The reason for this is that our duplicate check only takes into account metric names and added labels but not the actual expression. Is this correct? Happy to submit a fix if this is not the case.

for _, group := range groups {
for index, rule := range group.Rules {
inst := compareRuleType{
metric: ruleMetric(rule),
label: rule.Labels,
}
for i := 0; i < index; i++ {
t := compareRuleType{
metric: ruleMetric(group.Rules[i]),
label: group.Rules[i].Labels,
}
if reflect.DeepEqual(t, inst) {
duplicates = append(duplicates, t)
}
}

@brian-brazil
Copy link
Contributor

Hmm, I had thought this was meant to check for the rule being identical.

@roidelapluie
Copy link
Member

I think that this is valid as long as it is a warning and it succeeds at the end.

The rule should be written as

groups:
  - name: example
    rules:
      - expr: |
          sum by (code, verb) (increase(apiserver_request_total{job="default/kubernetes",verb="LIST",code=~"2.."}[30d]))
          or
          sum by (code, verb) (increase(apiserver_request_total{job="default/kubernetes",verb="GET",code=~"2.."}[30d]))
        record: code_verb:apiserver_request_total:increase30d

In which case it would not be able to produce identical results.

@gotjosh
Copy link
Member Author

gotjosh commented May 27, 2020

I think that this is valid as long as it is a warning and it succeeds at the end.

A warning or not, the message seems... Deceiving? Given we're saying this is a rule/alert is a duplicate when it actually isn't. The name is a duplicate but the queries are different.

@wbh1
Copy link
Contributor

wbh1 commented Jun 8, 2022

Since v2.35.0, this is reporting failures instead of just warning. Seems intentional based on #10435, but now causes misleading failures by default since the expressions are not taken into account.

v2.34.0

Checking /tmp/default.yml
1 duplicate rule(s) found.
Metric: VeryLowDiskSpace
Label(s):
        severity: critical
Might cause inconsistency while recording expressions.
  SUCCESS: 54 rules found

v2.35.0

Checking /tmp/default.yml
  FAILED:
lint error 1 duplicate rule(s) found.
Metric: VeryLowDiskSpace
Label(s):
        severity: critical
Might cause inconsistency while recording expressions

@dgl
Copy link
Member

dgl commented Jun 9, 2022

@wbh1 sorry, yes, it was a regression, already fixed by #10815 -- we should probably backport that to a 2.36.1.

Note for the general issue, in most cases where rules are split along a label value you can provide a label to avoid the warning:

groups:
  - name: example
    rules:
      - expr: |
          sum by (code, verb) (increase(apiserver_request_total{job="default/kubernetes",verb="LIST",code=~"2.."}[30d]))
        record: code_verb:apiserver_request_total:increase30d
        labels:
          verb: LIST
      - expr: |
          sum by (code, verb) (increase(apiserver_request_total{job="default/kubernetes",verb="GET",code=~"2.."}[30d]))
        record: code_verb:apiserver_request_total:increase30d
        labels:
          verb: GET

It might be nice for promtool to just not warn if it can work out the expression can't overlap, although PromQL semantics are such it's not guaranteed a queried label is in the returned set. However part of the reason for the --lint change is to make it possible for us to add more lint style warnings (on an opt-in basis) so we could do this while giving people with remote read setups (the main reason you'd find a "missing" label) a way to not get warnings.

@beorn7
Copy link
Member

beorn7 commented Apr 30, 2024

Hello from the bug scrub.

We feel that the various additional flags address the issue sufficiently. If somebody wants to refine things even more, please follow up here and we'll re-open the issue.

@beorn7 beorn7 closed this as completed Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants