-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding prometheus host to listen on to config #976
Adding prometheus host to listen on to config #976
Conversation
Defaulting to "localhost" otherwise (prior behavior) Signed-off-by: Michael Weissbacher <mweissbacher@squareup.com>
Hey @mweissbacher thanks for sending this. This has come up a couple times recently. The reason we initially hardcoded this to localhost was that we were reluctant to allow the agent to be reachable over the network (i.e. bind a non-localhost port). The hesitance resulted mostly from a security assessment that was done on SPIRE last year... in that assessment, it became clear that one of the largest reasons that SPIRE could confidently deliver the security posture we originally designed for was that it wasn't possible for an attacker on a compromised node to move laterally by way of an agent vulnerability. The worry is that as soon as the agent can listen for remote network connections, then there's a pathway that could undermine the posture and result in the compromise of (functionally) all identities in the trust domain without needing to compromise the server or its signing key. That being said, some folks might be OK assuming this risk. The hard part IMO is communicating this risk effectively so that users are making an informed decision when configuring the agent to bind a port. Until now, I have avoided adding this functionality because I haven't been able to come up with a way to both enable this functionality as well as clearly communicate the risk associated with it. I know that only allowing a localhost listener for prometheus isn't exactly a normal or supported pattern. I also don't know enough about prometheus to suggest an alternative that still allows us to preserve the posture we intended. Curious to hear your thoughts on this - it is definitely worth a longer conversation. |
@evan2645 I'm new to the project, so excuse (a) me commenting in a diff and (b) any newb questions: how do we feel about moving metrics to a plugin model? There will be a performance overhead but would allow (a) easier development of metrics plugins off of SPIRE master and (b) potential security benefits by isolating plugins' memory. I'm happy to move this to a task. |
Interesting, if the security assumption relies on being inaccessible from the network this does require more discussion. The pattern I'm familiar with is allowing remote access to prometheus. A quick workaround would be port forwarding, but this would also circumvent this security property and seems equivalent to exposing the port without added benefit. Alternative suggestion: Since the relation spire to prometheus is write-only, do you think it would make sense to decouple logging, i.e. dump logs to a file, and run a dedicated prometheus service that exclusively serves that file and does nothing else? Would you consider this as equivalent? |
Re clear communication - would a switch of the kind |
IMO, restricting the listen address has the opposite effect on hardening than desired. If we force the stats collecting agent to be in the same network namespace (for shared localhost) as the agent, that makes other much more important hardening work more difficult. EG, we may want to only allow the spire-agent to talk to the AWS instance metadata on 169.254.169.254, or to have pod-specific rules on what's allowed to talk to the spire server. |
We've discussed this on/off in the past and the general consensus has been to take a wait-and-see approach. I've received a lot of feedback over the last year about SPIRE's complexity, and as a result am generally hesitant to introduce more plugin types unless there is a lot of momentum to do so. We have, however, introduced
Yes, the port forwarding thing was considered, and the thinking was that if the user is jumping through hoops like that then there is probably some implied consent, but maybe that is a false assumption.
Yes, I think something like this may be "enough". Particularly logging a warning. I'd like to have @JustinCappos weigh in here to see what he thinks.
Hmm, yes that's not ideal... When I wrote the telemetry export code, picking up prometheus support was basically free. I figured "some prometheus is better than no prometheus" and put the localhost restriction in due to above concerns. You're right, I wouldn't want to grant my stats agent any of the rights that I grant the SPIRE agent, but at the same time exposing an agent port is also not super palatable. I realize that this concern can be mitigated through L3/L4 access control etc, but I don't think it's a good idea to assume that users have put the appropriate controls in place. |
To check whether there is risk of remote code execution I looked for use of I think it's fair to say that there is no increased risk compared to using the healthcheck or other usage of |
Thank you for looking into this further @mweissbacher It doesn't sound like we're going to find a better solution any time soon, and I am hesitant to continue blocking on this. How would you feel about adding a warning? Something along the lines of |
Signed-off-by: Michael Weissbacher <mweissbacher@squareup.com>
runner.c.Host = "localhost" | ||
} | ||
|
||
if runner.c.Host != "localhost" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to put this into an else branch of the above check because it would warn on an explicit localhost
configuration. It will still warn on 127.0.0.0
etc. but checking for it would clutter the code unnecessarily, I think.
Defaulting to "localhost" otherwise (prior behavior)
Signed-off-by: Michael Weissbacher mweissbacher@squareup.com