Alert rule panic; runtime error: invalid memory address or nil pointer dereference #61304

Closed
robinkanters opened this issue Jan 11, 2023 · 6 comments · Fixed by #61721
Labels:
  • area/alerting/evaluation: Issues when evaluating alerts
  • area/expressions: Server Side Expressions (SSE)
  • triage/needs-confirmation: used for OSS triage rotation - reported issue needs to be reproduced
  • type/bug

Comments

@robinkanters

robinkanters commented Jan 11, 2023

What happened:

  • Grafana quits unexpectedly from time to time, logging an alert rule panic (stack trace below)

What you expected to happen:

  • For it not to crash 😉

How to reproduce it (as minimally and precisely as possible):
I have the alert rule pasted below. If I click the Refresh button, an error popup appears in the top right; I think this might be the cause. The screenshot below shows the rule with the uid from the stack trace.

Anything else we need to know?:

Environment:

  • Grafana version: v9.3.2 (21c1d14)
  • Data source type & version: AlertManager, Athena, cloudwatch, ElasticSearch, OpenSearch, Prometheus
  • OS Grafana is installed on: Linux (AWS ECS)
  • User OS & Browser: EndeavourOS; Firefox 108.0.2
  • Grafana plugins:
    • Amazon Athena
    • Direct Input
    • Grafana OnCall
    • OpenSearch
  • Others:

image

@mellieA
Contributor

mellieA commented Jan 12, 2023

@robinkanters Does this same query complete as expected in the explore view? If so, are you able to share a sample of what the return data looks like?

@mellieA added the needs more info, triage/needs-confirmation, and area/alerting/evaluation labels and removed the needs more info label on Jan 12, 2023
@yuri-tceretian added the area/expressions label on Jan 13, 2023
@yuri-tceretian
Contributor

yuri-tceretian commented Jan 13, 2023

I can confirm the bug. The CloudWatch plugin returns a no-data result as a data frame with a single value. After #55347, that value gets converted to a single NoData value:

case mathexp.NoData:
	newRes.Values = append(newRes.Values, v.New())

Then, when the math expression tree is executed, walkFunc evaluates the is_nan function and produces a result slice with a nil element:

func (e *State) walkFunc(node *parse.FuncNode) (Results, error) {
	var res Results
	var err error
	var in []reflect.Value
	for _, a := range node.Args {
		var v interface{}
		switch t := a.(type) {
		case *parse.StringNode:
			v = t.Text
		case *parse.VarNode:
			v = e.Vars[t.Name]
		case *parse.ScalarNode:
			v = NewScalarResults(e.RefID, &t.Float64)
		case *parse.FuncNode:
			v, err = e.walkFunc(t)
		case *parse.UnaryNode:
			v, err = e.walkUnary(t)
		case *parse.BinaryNode:
			v, err = e.walkBinary(t)
		default:
			return res, fmt.Errorf("expr: unknown func arg type: %T", t)
		}
		if err != nil {
			return res, err
		}
		in = append(in, reflect.ValueOf(v))
	}

because the function isNaN cannot handle a NoData result correctly:
func isNaN(e *State, varSet Results) (Results, error) {
	newRes := Results{}
	for _, res := range varSet.Values {
		// perFloat applies the callback to each float in the value.
		newVal, err := perFloat(e, res, func(f float64) float64 {
			if math.IsNaN(f) {
				return 1
			}
			return 0
		})
		if err != nil {
			return newRes, err
		}
		newRes.Values = append(newRes.Values, newVal)
	}
	return newRes, nil
}
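
To illustrate how a per-value helper can hand back a nil element without reporting an error, here is a minimal, self-contained sketch. It is not Grafana's actual perFloat and not the fix from #61721; the names Value, Number, NoData, perFloatUnsafe, perFloatSafe, and isNaNFlag are all hypothetical stand-ins chosen for this example.

```go
package main

import (
	"fmt"
	"math"
)

// Value is a hypothetical stand-in for an expression result value.
type Value interface {
	Kind() string
}

// Number holds a single float (hypothetical type).
type Number struct{ F float64 }

func (Number) Kind() string { return "number" }

// NoData marks a query that returned nothing (hypothetical type).
type NoData struct{}

func (NoData) Kind() string { return "no-data" }

// perFloatUnsafe only knows about Number. Any other input falls through
// the switch, so the function returns a nil Value together with a nil error.
func perFloatUnsafe(v Value, fn func(float64) float64) (Value, error) {
	var out Value
	switch t := v.(type) {
	case Number:
		out = Number{F: fn(t.F)}
	}
	return out, nil
}

// perFloatSafe passes NoData through unchanged and rejects unknown kinds
// with an error, so callers never receive a nil Value with a nil error.
func perFloatSafe(v Value, fn func(float64) float64) (Value, error) {
	switch t := v.(type) {
	case Number:
		return Number{F: fn(t.F)}, nil
	case NoData:
		return t, nil
	default:
		return nil, fmt.Errorf("unsupported value type %T", v)
	}
}

// isNaNFlag maps NaN to 1 and everything else to 0.
func isNaNFlag(f float64) float64 {
	if math.IsNaN(f) {
		return 1
	}
	return 0
}

func main() {
	unsafeOut, err := perFloatUnsafe(NoData{}, isNaNFlag)
	fmt.Println("unsafe result is nil:", unsafeOut == nil, "err:", err)

	safeOut, err := perFloatSafe(NoData{}, isNaNFlag)
	fmt.Println("safe result kind:", safeOut.Kind(), "err:", err)
}
```

Whether NoData should be passed through or turned into an error is a design choice; the sketch only shows that either option avoids appending a nil element to the result slice.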

That result is then passed to the unary operator !, which crashes: the nil value does not match any case of the type switch in walkUnary, so the default branch tries to call the Type() method on nil, which causes the panic.
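
The failure mode described above can be reproduced in isolation. The sketch below is a hypothetical reduction, not Grafana's walkUnary: Value, Scalar, negateUnchecked, negateChecked, and didPanic are names invented for this example. It shows that a nil interface value falls into the default branch of a type switch, that calling a method on it panics, and that an explicit nil check turns the crash into an ordinary error.

```go
package main

import (
	"errors"
	"fmt"
)

// Value is a hypothetical stand-in for an expression result value.
type Value interface {
	Type() string
}

// Scalar holds a single float (hypothetical type).
type Scalar struct{ F float64 }

func (Scalar) Type() string { return "scalar" }

// negateUnchecked applies a logical NOT to a Scalar. For anything else it
// builds an error message by calling v.Type(), which panics when v is nil.
func negateUnchecked(v Value) (Value, error) {
	switch t := v.(type) {
	case Scalar:
		if t.F == 0 {
			return Scalar{F: 1}, nil
		}
		return Scalar{F: 0}, nil
	default:
		return nil, fmt.Errorf("unsupported type %s", v.Type())
	}
}

// negateChecked rejects a nil value up front instead of panicking.
func negateChecked(v Value) (Value, error) {
	if v == nil {
		return nil, errors.New("cannot apply ! to a nil value")
	}
	return negateUnchecked(v)
}

// didPanic runs fn and reports whether it panicked.
func didPanic(fn func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	fn()
	return false
}

func main() {
	fmt.Println("unchecked panics on nil:", didPanic(func() { _, _ = negateUnchecked(nil) }))

	_, err := negateChecked(nil)
	fmt.Println("checked returns error:", err)
}
```

Guarding at the consumer is only one option; making the producer never emit a nil element addresses the same crash from the other side.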

cc @kylebrandt

@Roberto6969

Updating from 9.4.0-960xxx to 9.4.0-96993pre caused the same error:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x27a2d8c]

How can I resolve this?

Thank You.

@Roberto6969

Update: updating from 9.4.0-96993pre to 9.4.0-97045pre didn't solve the problem.

Is this error somehow related to:

logger=ngalert.multiorg.alertmanager t=2023-01-18T11:35:46.792823325Z level=error msg="unable to create Alertmanager for org" org=6 error="unable to initialize the notification log component of alerting: proto: wrong wireType = 2 for field Idx"

@yuri-tceretian
Contributor

yuri-tceretian commented Jan 18, 2023

> Update: updating from 9.4.0-96993pre to 9.4.0-97045pre didn't solve the problem.
>
> Is this error somehow related to:
>
> logger=ngalert.multiorg.alertmanager t=2023-01-18T11:35:46.792823325Z level=error msg="unable to create Alertmanager for org" org=6 error="unable to initialize the notification log component of alerting: proto: wrong wireType = 2 for field Idx"

@Roberto6969 This is not related. Main seems to be broken (it panics when starting with an Alertmanager configuration that was created by a previous version). We're fixing this right now.

@Roberto6969

@yuri-tceretian Great - I appreciate your effort.
