Retry broker connections? #45

Open
reefland opened this issue Aug 15, 2022 · 2 comments

Comments

@reefland

Is there a way to configure connection retries? I had to bounce the broker and the mosquitto-exporter log ended with:

2022/08/15 00:49:37 Error: Connection to tcp://mosquitto-mqtt.mosquitto:1883 lost: EOF

There is no sign of a retry, and the program doesn't exit, so the container restart policy is never triggered.

I manually restarted mosquitto-exporter and connected fine.

2022/08/15 17:57:14 Starting mosquitto_broker 0.8.0 (e268064), go1.17.2
2022/08/15 17:57:14 Connected to tcp://mosquitto-mqtt.mosquitto:1883
[store]    memorystore wiped
2022/08/15 17:57:14 Listening on 0.0.0.0:9234...

If the exporter can't connect, it should retry a set number of times and then exit so that the restart policy can kick in. The failure then becomes a condition that can be monitored and fixed.
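A minimal Go sketch of the requested behavior, assuming the connect step can be wrapped in a `func() error` (with paho.mqtt.golang that would be `client.Connect()` followed by `token.Wait()` and `token.Error()`). `connectWithRetry`, the attempt count and the delay are hypothetical choices, not options the exporter currently exposes.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"os"
	"time"
)

// connectWithRetry calls connect up to maxAttempts times, sleeping delay
// between failures. It returns nil on the first success, or a wrapped
// version of the last error once all attempts are used up.
func connectWithRetry(connect func() error, maxAttempts int, delay time.Duration) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = connect(); err == nil {
			return nil
		}
		log.Printf("connect attempt %d/%d failed: %v", attempt, maxAttempts, err)
		if attempt < maxAttempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	// Hypothetical stand-in for the real broker connect call; here the
	// broker is simulated as permanently unreachable.
	connect := func() error { return errors.New("connection refused") }

	if err := connectWithRetry(connect, 5, 2*time.Second); err != nil {
		log.Print(err)
		os.Exit(1) // non-zero exit lets the container restart policy take over
	}
}
```

Exiting with a non-zero status is what turns the failure into something the orchestrator can see: Docker's `on-failure` policy or a Kubernetes pod restart will pick it up, and the restart count becomes a metric in its own right.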

@reefland (author)

I see in the code that it tries to "connect forever" to the broker, but I still end up with this in the logs:

2022/08/17 16:19:54 Error: Connection to tcp://mosquitto-mqtt.mosquitto:1883 lost: EOF

The broker is up and running, and I can connect to it fine with another client. Restarting the mosquitto-exporter container manually lets it connect again. Because of this forever loop, I don't see a way to automate the restart when the exporter can't connect: it never errors out, so the container restart policy is never triggered.
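A fail-fast approach could also cover a connection that drops after startup. This sketch assumes the exporter uses paho.mqtt.golang (the `mqtt.NewClient(opts)` call quoted later in this thread suggests it), where a function registered with `opts.SetConnectionLostHandler` could start `exitIfNotReconnected` in a goroutine, passing `client.IsConnected`. The function name, grace period and poll interval are hypothetical.

```go
package main

import (
	"log"
	"os"
	"time"
)

// exitIfNotReconnected polls isConnected every poll interval. If the
// connection comes back before grace elapses it returns quietly; otherwise
// it calls exit(1) so the container or pod gets restarted.
func exitIfNotReconnected(isConnected func() bool, grace, poll time.Duration, exit func(int)) {
	deadline := time.Now().Add(grace)
	for {
		if isConnected() {
			log.Printf("connection restored")
			return
		}
		if time.Now().After(deadline) {
			log.Printf("no reconnect within %s, exiting", grace)
			exit(1)
			return
		}
		time.Sleep(poll)
	}
}

func main() {
	// Stand-in for paho's client.IsConnected; here the broker never comes back.
	isConnected := func() bool { return false }
	log.Printf("connection lost: EOF") // mirrors the log line in this issue
	exitIfNotReconnected(isConnected, 5*time.Second, 500*time.Millisecond, os.Exit)
}
```

Injecting `exit` instead of calling `os.Exit` directly keeps the function testable; the grace period leaves room for any built-in reconnect logic to succeed before the process gives up.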

I also don't see a way to monitor that it's unable to connect, because it keeps publishing stale metrics even though it is disconnected from the broker; the metrics are just stuck in time. They should drop to zero or become unavailable at some point so that an alert can fire.
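One way to make stale data visible, sketched under the assumption that the exporter's metrics come from the broker's periodic `$SYS` updates: record when the last message arrived and serve a health endpoint that returns 503 once nothing has arrived within a threshold. A Kubernetes liveness probe pointed at such an endpoint would then restart the pod automatically. The `/healthz` path, port 8080, the `staleness` type and the two-minute threshold are all hypothetical.

```go
package main

import (
	"log"
	"net/http"
	"sync"
	"time"
)

// staleness remembers when the last broker message was received.
type staleness struct {
	mu     sync.Mutex
	last   time.Time
	maxAge time.Duration
}

// Touch records that a message arrived at now; it would be called from
// the MQTT message handler.
func (s *staleness) Touch(now time.Time) {
	s.mu.Lock()
	s.last = now
	s.mu.Unlock()
}

// Fresh reports whether a message has arrived within maxAge of now.
func (s *staleness) Fresh(now time.Time) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return !s.last.IsZero() && now.Sub(s.last) <= s.maxAge
}

// ServeHTTP answers 200 while data is fresh and 503 once it goes stale.
func (s *staleness) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if s.Fresh(time.Now()) {
		w.WriteHeader(http.StatusOK)
		w.Write([]byte("ok\n"))
		return
	}
	http.Error(w, "no broker data within "+s.maxAge.String(), http.StatusServiceUnavailable)
}

func main() {
	s := &staleness{maxAge: 2 * time.Minute}
	s.Touch(time.Now()) // a real exporter would do this on every received message
	http.Handle("/healthz", s)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The same timestamp could also drive the metrics endpoint itself, for example by refusing to serve values older than the threshold, so that Prometheus sees the target as down instead of scraping frozen numbers.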

The only thing I could think of was to detect when the rate of published messages gets stuck at zero and generate an alert:

    - alert: MosquittoPublishedMessagedAtZeroError
      annotations:
        description: Mosquitto MQTT published message rate is at zero for more than 1 minute.
        summary: Mosquitto MQTT published message rate is at zero for more than 1 minute.
      expr: rate(broker_publish_messages_sent[1m]) == 0
      for: 1m
      labels:
        issue: Mosquitto MQTT published message rate is at zero for more than 1 minute.
        severity: critical
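Why this alert catches the stuck exporter can be shown with a simplified rate calculation in Go. `simpleRate` is a hypothetical helper that divides the counter's increase by the elapsed time; Prometheus's real `rate()` also handles counter resets and extrapolates to the window edges, which this sketch ignores. A frozen counter always yields 0, which is what `== 0` matches. Any window in which the counter does not move, for whatever reason, produces the same 0.

```go
package main

import (
	"fmt"
	"time"
)

// sample is one scrape of a counter such as broker_publish_messages_sent.
type sample struct {
	at    time.Time
	value float64
}

// simpleRate returns the per-second increase between the first and last
// samples. It ignores counter resets and does no extrapolation.
func simpleRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	first, last := samples[0], samples[len(samples)-1]
	secs := last.at.Sub(first.at).Seconds()
	if secs <= 0 {
		return 0
	}
	return (last.value - first.value) / secs
}

func main() {
	t0 := time.Unix(0, 0)
	live := []sample{{t0, 100}, {t0.Add(30 * time.Second), 130}, {t0.Add(60 * time.Second), 160}}
	frozen := []sample{{t0, 160}, {t0.Add(30 * time.Second), 160}, {t0.Add(60 * time.Second), 160}}
	fmt.Println(simpleRate(live))   // 1 (60 messages over 60 s)
	fmt.Println(simpleRate(frozen)) // 0, the value the alert fires on
}
```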

At least I now have an alert for when mosquitto-exporter stops updating metrics. When it fires and I check the logs, the exporter is not connected, but I can't automate a fix that restarts it. Zigbee2MQTT, HomeAssistant, Frigate, etc. all connect fine and maintain their connections; as far as I can tell, only this exporter has the problem.

@mateuszdrab

mateuszdrab commented Dec 21, 2022

Came here to check on the same issue 😂
I wonder if I could somehow trigger a remediation based on an alert, so that the pod gets restarted.

Could it be that the client needs to be instantiated again (i.e. this line moved into the for loop) after a connection attempt fails?
    client := mqtt.NewClient(opts)
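The suggestion above can be sketched in Go by passing a factory that builds a fresh client on every attempt instead of reusing one created before the loop. `connector`, `connectFresh` and `fakeClient` are hypothetical; a real paho client returns a `Token` from `Connect()`, so it would need a small wrapper to fit this interface. Whether a fresh client actually avoids the stuck state reported in this issue is untested.

```go
package main

import (
	"errors"
	"log"
	"time"
)

// connector is a minimal stand-in for an MQTT client.
type connector interface {
	Connect() error
}

// connectFresh calls newClient on every attempt, so no state from a failed
// client carries over into the next try.
func connectFresh(newClient func() connector, attempts int, delay time.Duration) (connector, error) {
	lastErr := errors.New("no connection attempts made")
	for i := 1; i <= attempts; i++ {
		c := newClient() // the equivalent of mqtt.NewClient(opts) inside the loop
		if lastErr = c.Connect(); lastErr == nil {
			return c, nil
		}
		log.Printf("attempt %d with a new client failed: %v", i, lastErr)
		time.Sleep(delay)
	}
	return nil, lastErr
}

// fakeClient simulates a client whose Connect succeeds only when ok is set.
type fakeClient struct{ ok bool }

func (f fakeClient) Connect() error {
	if f.ok {
		return nil
	}
	return errors.New("connection refused")
}

func main() {
	built := 0
	newClient := func() connector {
		built++
		return fakeClient{ok: built >= 3} // the third client succeeds
	}
	_, err := connectFresh(newClient, 5, 100*time.Millisecond)
	log.Printf("err=%v after building %d clients", err, built)
}
```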
