Ruler: register sender errors (thanos_alert_sender_errors_total) metric. #2100

Closed
OGKevin opened this issue Feb 5, 2020 · 2 comments · Fixed by #2101

OGKevin commented Feb 5, 2020

Thanos, Prometheus and Golang version used:
Thanos: https://github.com/thanos-io/thanos/releases/tag/v0.10.1

Object Storage Provider:

What happened:
There is a bug in the code here:

thanos/pkg/alert/alert.go

Lines 290 to 318 in 7c02430

	s := &Sender{
		logger:        logger,
		alertmanagers: alertmanagers,
		versions:      versions,
		sent: prometheus.NewCounterVec(prometheus.CounterOpts{
			Name: "thanos_alert_sender_alerts_sent_total",
			Help: "Total number of alerts sent by alertmanager.",
		}, []string{"alertmanager"}),
		errs: prometheus.NewCounterVec(prometheus.CounterOpts{
			Name: "thanos_alert_sender_errors_total",
			Help: "Total number of errors while sending alerts to alertmanager.",
		}, []string{"alertmanager"}),
		dropped: prometheus.NewCounter(prometheus.CounterOpts{
			Name: "thanos_alert_sender_alerts_dropped_total",
			Help: "Total number of alerts dropped in case of all sends to alertmanagers failed.",
		}),
		latency: prometheus.NewHistogramVec(prometheus.HistogramOpts{
			Name: "thanos_alert_sender_latency_seconds",
			Help: "Latency for sending alert notifications (not including dropped notifications).",
		}, []string{"alertmanager"}),
	}
	if reg != nil {
		reg.MustRegister(s.sent, s.dropped, s.latency)
	}
	return s

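The errors counter (s.errs, thanos_alert_sender_errors_total) is created but never passed to reg.MustRegister. As a result it is incremented internally but never exposed by the registry, so it cannot be scraped or alerted on.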

What you expected to happen:
The thanos_alert_sender_errors_total metric is registered and exposed alongside the other sender metrics.

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components:

Anything else we need to know:

OGKevin added a commit to OGKevin/thanos that referenced this issue Feb 5, 2020
thanos-io#2100
Fixes thanos-io#2100

Signed-off-by: Kevin Hellemun <17928966+OGKevin@users.noreply.github.com>

bwplotka commented Feb 5, 2020

Oops!

bwplotka pushed a commit that referenced this issue Feb 5, 2020
* Register thanos_alert_sender_errors_total metric.

#2100
Fixes #2100

Signed-off-by: Kevin Hellemun <17928966+OGKevin@users.noreply.github.com>

* Updated CHANGELOG.

Signed-off-by: Kevin Hellemun <17928966+OGKevin@users.noreply.github.com>

bwplotka commented Feb 5, 2020

OK, that's enough: we have had too many issues like this. Let's automate the check and add a test: #2102
