enhancement: allows using multiple web listen addresses #213

@wookietreiber wookietreiber commented Aug 30, 2023

these variants should work now:

alertmanager_web_listen_address: '127.0.0.1:9093'

alertmanager_web_listen_address:
  - '127.0.0.1:9093'
  - '127.0.1.1:9093'
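
As a rough illustration of the idea (not the role's actual Jinja2 template), both variants can be treated the same way by normalizing the value to a list and emitting one `--web.listen-address` flag per entry. The function name `listen_address_flags` is made up for this sketch:

```python
from typing import List, Union


def listen_address_flags(value: Union[str, List[str]]) -> List[str]:
    """Turn a single address or a list of addresses into repeated CLI flags."""
    # A plain string is treated as a one-element list, so both variants
    # go through the same code path.
    addresses = [value] if isinstance(value, str) else list(value)
    return [f"--web.listen-address={address}" for address in addresses]


# Hypothetical usage with the two variants shown above.
print(listen_address_flags("127.0.0.1:9093"))
print(listen_address_flags(["127.0.0.1:9093", "127.0.1.1:9093"]))
```

Accepting the plain string keeps existing inventories working, while the list form simply repeats the flag.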

fixes #115

@wookietreiber wookietreiber force-pushed the alertmanager/multiple-web-listen-addresses branch 3 times, most recently from b9f18e5 to febecbc Compare August 30, 2023 08:38
@wookietreiber wookietreiber changed the title [enhancement] allows using multiple web listen addresses enhancement: allows using multiple web listen addresses Aug 30, 2023
@github-actions github-actions bot added enhancement New feature or request roles/alertmanager labels Aug 30, 2023
@wookietreiber wookietreiber force-pushed the alertmanager/multiple-web-listen-addresses branch from febecbc to ee4c479 Compare August 30, 2023 08:43
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Aug 30, 2023
@wookietreiber wookietreiber force-pushed the alertmanager/multiple-web-listen-addresses branch from ee4c479 to 1513a78 Compare August 30, 2023 08:44
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Aug 30, 2023
@wookietreiber wookietreiber force-pushed the alertmanager/multiple-web-listen-addresses branch from 1513a78 to 279bf0d Compare August 30, 2023 08:46
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Aug 30, 2023
Member

@gardar gardar left a comment

Looks like the version of alertmanager used in the alternative molecule test is too old for multiple web listen addresses.
Please add a version check in the template so that we won't attempt to use the unsupported flag on older versions.

We should probably also add support for multiple web listen addresses in the other roles where possible.
prometheus/exporter-toolkit#91

@wookietreiber
Contributor Author

wookietreiber commented Aug 31, 2023

> Please add a version check in the template so that we won't attempt to use the unsupported flag on older versions.

Do you know in which version this was introduced? The changelog doesn't say (I searched for both listen-address and toolkit), and I really don't wanna browse git log -p 😅


Edit: Okay, so I did browse the git log -p of prometheus/alertmanager a bit. I went with 0.25.0 for now, although I'm not sure at all if that's really the first version supporting multiple web listen addresses.

@wookietreiber wookietreiber force-pushed the alertmanager/multiple-web-listen-addresses branch from 279bf0d to 74574d4 Compare August 31, 2023 07:54
@github-actions github-actions bot added enhancement New feature or request roles/systemd_exporter and removed enhancement New feature or request labels Aug 31, 2023
@gardar
Member

gardar commented Aug 31, 2023

> > Please add a version check in the template so that we won't attempt to use the unsupported flag on older versions.

> Do you know in which version this was introduced? The changelog doesn't say (I searched for both listen-address and toolkit), and I really don't wanna browse git log -p 😅

> Edit: Okay, so I did browse the git log -p of prometheus/alertmanager a bit. I went with 0.25.0 for now, although I'm not sure at all if that's really the first version supporting multiple web listen addresses.

I think the change was introduced in prometheus/exporter-toolkit#95, so any version that uses exporter-toolkit v0.8.0 or later (prometheus/exporter-toolkit@6b1221e) should support it.
You can see that in alertmanager v0.25.0 the version of exporter-toolkit was bumped to v0.8.1: prometheus/alertmanager@ce7b475
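
To make the version check concrete, here is a minimal sketch of the comparison in Python. It assumes, based only on the discussion above, that alertmanager 0.25.0 is the first release with a new enough exporter-toolkit; the helper names are hypothetical and the role itself would do this in its template:

```python
from typing import Tuple

# Assumption from the discussion above: alertmanager v0.25.0 is the first
# release that ships exporter-toolkit >= v0.8.0.
MIN_MULTI_LISTEN_VERSION = (0, 25, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse '0.25.0' or 'v0.25.0' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def supports_multiple_listen_addresses(version: str) -> bool:
    """Return True if this version may be given repeated listen-address flags."""
    return parse_version(version) >= MIN_MULTI_LISTEN_VERSION


# Hypothetical usage: only repeat the flag on new enough versions.
print(supports_multiple_listen_addresses("0.24.0"))
print(supports_multiple_listen_addresses("0.25.0"))
```

Comparing integer tuples rather than strings matters here: as strings, "0.9.0" would sort after "0.25.0". The sketch does not handle pre-release suffixes such as "-rc.1".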

Can you add multiple web listen address support to the other roles as well?

@wookietreiber wookietreiber force-pushed the alertmanager/multiple-web-listen-addresses branch from 74574d4 to 4066fd4 Compare September 4, 2023 08:38
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 4, 2023
@wookietreiber
Contributor Author

> I think the change was introduced in prometheus/exporter-toolkit#95, so any version that uses exporter-toolkit v0.8.0 or later (prometheus/exporter-toolkit@6b1221e) should support it. You can see that in alertmanager v0.25.0 the version of exporter-toolkit was bumped to v0.8.1: prometheus/alertmanager@ce7b475

@gardar thanks, I was looking for exporter-toolkit in this collection 😅

@wookietreiber
Contributor Author

> Can you add multiple web listen address support to the other roles as well?

Will do.

@wookietreiber wookietreiber force-pushed the alertmanager/multiple-web-listen-addresses branch from 4066fd4 to 65111b1 Compare September 4, 2023 09:09
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 4, 2023
@github-actions github-actions bot added the enhancement New feature or request label Nov 28, 2023
@gardar
Member

gardar commented Nov 28, 2023

> Don't know what to do about this one; only chrony fails with Ansible 2.9:

> TASK [prometheus.prometheus.chrony_exporter : Discover latest version] *********
> fatal: [almalinux-8]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Received HTTP error for https://api.github.com/repos/superq/chrony_exporter/releases/latest : HTTP Error 403: rate limit exceeded"}

> Is this spurious? Can we change rate limiting / retries / et al. for CI?

We already include a GitHub authentication token that we get from the CI env, which raised the limit substantially (#91).
Not sure what else we could do, but suggestions definitely welcome!

> Edit: When I drop the snmp_exporter_version: 0.21.0 line I added to use 0.24.1 from the defaults, the tests succeed, so I'm gonna commit that 😅

Great! It's always good when you can avoid things such as ignore_errors, failed_when: false, etc.

@wookietreiber
Contributor Author

Prometheus itself does not yet support multiple web listen addresses, which is why its alternative test is failing:

Nov 28 11:58:45 ubuntu-22.04 prometheus[2231]: Error parsing command line arguments: flag 'web.listen-address' cannot be repeated

Tested with both 2.40.0 and 2.48.0. I'm cutting it out but I'm leaving the prep in like I did with both smartctl_exporter and systemd_exporter.
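
One way to catch this before the service even starts would be a guard along these lines. This is purely an illustration and not necessarily what the role's preflight does; the function name and the `supports_multiple` parameter are assumptions made for the sketch:

```python
from typing import List, Union


def check_listen_addresses(
    value: Union[str, List[str]], supports_multiple: bool
) -> List[str]:
    """Normalize the setting and reject lists the target binary cannot handle."""
    addresses = [value] if isinstance(value, str) else list(value)
    if not addresses:
        raise ValueError("at least one web listen address is required")
    # Binaries that refuse a repeated --web.listen-address flag must be
    # given exactly one address.
    if len(addresses) > 1 and not supports_multiple:
        raise ValueError(
            f"only one web listen address is supported, got {len(addresses)}"
        )
    return addresses


# Hypothetical usage: a single address always passes.
print(check_listen_addresses("0.0.0.0:9090", supports_multiple=False))
```

Failing early with a clear message is easier to debug than a service that exits with a flag-parsing error.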

Signed-off-by: Christian Krause <christian.krause@idiv.de>
@wookietreiber wookietreiber force-pushed the alertmanager/multiple-web-listen-addresses branch from 0e11f6b to 38f0e7c Compare November 28, 2023 12:29
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Nov 28, 2023
@wookietreiber
Contributor Author

The last one I can't get working is smokeping_prober, which succeeds locally but fails in CI:

$ ANSIBLE_DIFF_ALWAYS=True molecule test -s alternative -p ubuntu-22.04

...

--- before
+++ after: /home/umcdev/.ansible/tmp/ansible-local-245143asywggq4/tmp4d3313t_/smokeping_prober.service.j2
@@ -0,0 +1,41 @@
+#
+# Ansible managed: Do NOT edit this file manually!
+#
+[Unit]
+Description=Smokeping Prober
+After=network-online.target
+StartLimitInterval=0
+StartLimitIntervalSec=0
+
+[Service]
+Type=simple
+User=smokeping
+Group=smokeping
+PermissionsStartOnly=true
+ExecReload=/bin/kill -HUP $MAINPID
+ExecStart=/usr/local/bin/smokeping_prober \
+  --config.file=/etc/smokeping_prober//probes.yml \
+  --web.listen-address=127.0.0.1:8080 \
+  --web.listen-address=127.0.1.1:8080 \
+
+SyslogIdentifier=smokeping_prober
+KillMode=process
+Restart=always
+RestartSec=5
+
+LockPersonality=true
+NoNewPrivileges=true
+MemoryDenyWriteExecute=true
+PrivateTmp=true
+ProtectHome=true
+RemoveIPC=true
+RestrictSUIDSGID=true
+
+AmbientCapabilities=CAP_NET_RAW
+ProtectControlGroups=true
+ProtectKernelModules=true
+ProtectKernelTunables=yes
+ProtectSystem=strict
+
+[Install]
+WantedBy=multi-user.target

...

INFO     Running alternative > verify
INFO     Executing Testinfra tests found in /home/umcdev/src/ansible/prometheus/roles/smokeping_prober/molecule/alternative/tests/...
============================= test session starts ==============================
platform linux -- Python 3.11.6, pytest-7.4.3, pluggy-1.3.0
rootdir: /home/umcdev
plugins: testinfra-10.0.0
collected 5 items

tests/test_alternative.py .....                                          [100%]

============================== 5 passed in 1.79s ===============================
INFO     Verifier completed successfully.

@SuperQ
Contributor

SuperQ commented Nov 28, 2023

Yes, I recently did a major rewrite of the snmp_exporter config, so configs need to match the version.

If you want, I can open a separate PR to make that test cleaner.

@wookietreiber
Contributor Author

wookietreiber commented Nov 30, 2023

blackbox exporter alternative fails, but only on debian 10:

[Unit]
Description=Blackbox Exporter
After=network-online.target
StartLimitInterval=0
StartLimitIntervalSec=0

[Service]
Type=simple
User=blackbox-exp
Group=blackbox-exp
PermissionsStartOnly=true
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/blackbox_exporter \
  --config.file=/etc/blackbox_exporter.yml \
  --log.level=warn \
    --web.listen-address=127.0.0.1:9000 \
  --web.listen-address=127.0.1.1:9000 \

SyslogIdentifier=blackbox_exporter
KillMode=process
Restart=always
RestartSec=5

LockPersonality=true
NoNewPrivileges=true
MemoryDenyWriteExecute=true
PrivateTmp=true
ProtectHome=true
RemoveIPC=true
RestrictSUIDSGID=true

AmbientCapabilities=CAP_NET_RAW
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=yes
ProtectSystem=strict

[Install]
WantedBy=multi-user.target
    "-- Logs begin at Thu 2023-11-30 09:06:49 UTC, end at Thu 2023-11-30 09:07:35 UTC. --",
    "Nov 30 09:07:32 debian-10 systemd[1]: /etc/systemd/system/blackbox_exporter.service:22: Unknown lvalue 'RestrictSUIDSGID' in section 'Service', ignoring",
    "Nov 30 09:07:33 debian-10 systemd[1]: /etc/systemd/system/blackbox_exporter.service:22: Unknown lvalue 'RestrictSUIDSGID' in section 'Service', ignoring",
    "Nov 30 09:07:33 debian-10 systemd[1]: Started Blackbox Exporter.",
    "-- Subject: A start job for unit blackbox_exporter.service has finished successfully",
    "-- Defined-By: systemd",
    "-- Support: https://www.debian.org/support",
    "-- ",
    "-- A start job for unit blackbox_exporter.service has finished successfully.",
    "-- ",
    "-- The job identifier is 98.",
    "Nov 30 09:07:33 debian-10 blackbox_exporter[1597]: blackbox_exporter: error: unexpected SyslogIdentifier=blackbox_exporter, try --help",
    "Nov 30 09:07:33 debian-10 systemd[1]: blackbox_exporter.service: Main process exited, code=exited, status=1/FAILURE",
    "-- Subject: Unit process exited",
    "-- Defined-By: systemd",
    "-- Support: https://www.debian.org/support",
    "-- ",
    "-- An ExecStart= process belonging to unit blackbox_exporter.service has exited.",
    "-- ",
    "-- The process' exit code is 'exited' and its exit status is 1.",
    "Nov 30 09:07:33 debian-10 systemd[1]: blackbox_exporter.service: Failed with result 'exit-code'.",
    "-- Subject: Unit failed",
    "-- Defined-By: systemd",
    "-- Support: https://www.debian.org/support",
    "-- ",
    "-- The unit blackbox_exporter.service has entered the 'failed' state with result 'exit-code'.",
    "Nov 30 09:07:33 debian-10 systemd[1]: blackbox_exporter.service: Consumed 27ms CPU time.",
    "-- Subject: Resources consumed by unit runtime",
    "-- Defined-By: systemd",
    "-- Support: https://www.debian.org/support",
    "-- ",
    "-- The unit blackbox_exporter.service completed and consumed the indicated resources."

I think the problem is this:

Nov 30 09:07:33 debian-10 blackbox_exporter[1597]: blackbox_exporter: error: unexpected SyslogIdentifier=blackbox_exporter, try --help

I think systemd on debian 10 is so old that it can't handle the trailing backslash the way newer versions do:

  --web.listen-address=127.0.1.1:9000 \

SyslogIdentifier=blackbox_exporter

It turns it into

ExecStart=/usr/local/bin/blackbox_exporter ... --web.listen-address=127.0.1.1:9000 SyslogIdentifier=blackbox_exporter

I'm gonna try nuking the trailing backslash with loop last hacks.
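
The idea, sketched in Python rather than in the Jinja2 template the role actually uses (where the equivalent would presumably rely on the loop's last-iteration check), is to put the continuation between lines instead of after every line. The function name `render_exec_start` is hypothetical:

```python
from typing import List


def render_exec_start(binary: str, args: List[str]) -> str:
    """Render a multi-line ExecStart= where only non-final lines end in a backslash."""
    lines = [f"ExecStart={binary}"] + [f"  {arg}" for arg in args]
    # Joining with " \\\n" places the continuation between lines, so the
    # last argument is never followed by a dangling backslash.
    return " \\\n".join(lines)


# Hypothetical usage with the flags from the unit file above.
print(
    render_exec_start(
        "/usr/local/bin/blackbox_exporter",
        [
            "--config.file=/etc/blackbox_exporter.yml",
            "--log.level=warn",
            "--web.listen-address=127.0.0.1:9000",
            "--web.listen-address=127.0.1.1:9000",
        ],
    )
)
```

With no backslash after the final argument, the next directive in the unit file cannot be swallowed into the command line, regardless of how a given systemd version treats a continuation followed by an empty line.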

…sses

Signed-off-by: Christian Krause <christian.krause@idiv.de>
…esses

Signed-off-by: Christian Krause <christian.krause@idiv.de>
…eflight

this is for consistency with the other roles' preflight asserts for when
prometheus itself finally supports multiple web listen addresses

Signed-off-by: Christian Krause <christian.krause@idiv.de>
…s in preflight

this is for consistency with the other roles' preflight asserts for when the
exporter itself finally supports multiple web listen addresses

Signed-off-by: Christian Krause <christian.krause@idiv.de>
… in preflight

this is for consistency with the other roles' preflight asserts for when the
exporter itself finally supports multiple web listen addresses

Signed-off-by: Christian Krause <christian.krause@idiv.de>
@wookietreiber wookietreiber force-pushed the alertmanager/multiple-web-listen-addresses branch from 38f0e7c to c8a340e Compare November 30, 2023 10:12
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Nov 30, 2023
@wookietreiber
Contributor Author

> I'm gonna try nuking the trailing backslash with loop last hacks.

That's done now, please re-run the tests. I also did that for smokeping, 🤞 that this fixes its tests too. If not, I have no idea what to do about smokeping and will need help with that.

@gardar
Member

gardar commented Nov 30, 2023

All tests passing now so I'm going to go ahead with the merge. Thanks again!

@gardar gardar merged commit 002e5b3 into prometheus-community:main Nov 30, 2023
242 checks passed
@SuperQ
Contributor

SuperQ commented Nov 30, 2023

Awesome!

@wookietreiber wookietreiber deleted the alertmanager/multiple-web-listen-addresses branch December 5, 2023 09:02
@wookietreiber
Contributor Author

Thanks for the merge and the guidance, this helped a lot.
