Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding prometheus host to listen on to config #976

Merged

Conversation

mweissbacher
Copy link
Contributor

Defaulting to "localhost" otherwise (prior behavior)

Signed-off-by: Michael Weissbacher mweissbacher@squareup.com

Defaulting to "localhost" otherwise (prior behavior)

Signed-off-by: Michael Weissbacher <mweissbacher@squareup.com>
@evan2645
Copy link
Member

Hey @mweissbacher thanks for sending this.

This has come up a couple times recently. The reason we initially hardcoded this to localhost was that we were reluctant to allow the agent to be reachable over the network (i.e. bind a non-localhost port).

The hesitance resulted mostly from a security assessment that was done on SPIRE last year... in that assessment, it became clear that one of the largest reasons that SPIRE could confidently deliver the security posture we originally designed for was that it wasn't possible for an attacker on a compromised node to move laterally by way of an agent vulnerability. The worry is that as soon as the agent can listen for remote network connections, then there's a pathway that could undermine the posture and result in the compromise of (functionally) all identities in the trust domain without needing to compromise the server or its signing key.

That being said, some folks might be OK assuming this risk. The hard part IMO is communicating this risk effectively so that users are making an informed decision when configuring the agent to bind a port. Until now, I have avoided adding this functionality because I haven't been able to come up with a way to both enable this functionality as well as clearly communicate the risk associated with it.

I know that only allowing a localhost listener for prometheus isn't exactly a normal or supported pattern. I also don't know enough about prometheus to suggest an alternative that still allows us to preserve the posture we intended. Curious to hear your thoughts on this - it is definitely worth a longer conversation.

@willsalz
Copy link

@evan2645 I'm new to the project, so excuse (a) me commenting in a diff and (b) any newb questions: how do we feel about moving metrics to a plugin model? There will be a performance overhead but would allow (a) easier development of metrics plugins off of SPIRE master and (b) potential security benefits by isolating plugins' memory. I'm happy to move this to a task.

@mweissbacher
Copy link
Contributor Author

Interesting, if the security assumption relies on being inaccessible from the network this does require more discussion. The pattern I'm familiar with is allowing remote access to prometheus. A quick workaround would be port forwarding, but this would also circumvent this security property and seems equivalent to exposing the port without added benefit.

Alternative suggestion: Since the relation spire to prometheus is write-only, do you think it would make sense to decouple logging, i.e. dump logs to a file, and run a dedicated prometheus service that exclusively serves that file and does nothing else? Would you consider this as equivalent?

@mweissbacher
Copy link
Contributor Author

Re clear communication - would a switch of the kind -listenAllDegradedSecurity + print warning on startup with explanation be a more verbose way of communicating in the interim of a more definite solution?

@mcpherrinm
Copy link
Contributor

IMO, restricting the listen address has the opposite effect on hardening than desired.

If we force the stats collecting agent to be in the same network namespace (for shared localhost) as the agent, that makes other much more important hardening work more difficult. EG, we may want to only allow the spire-agent to talk to the AWS instance metadata on 169.254.169.254, or to have pod-specific rules on what's allowed to talk to the spire server.

@evan2645
Copy link
Member

how do we feel about moving metrics to a plugin model? There will be a performance overhead but would allow (a) easier development of metrics plugins off of SPIRE master and (b) potential security benefits by isolating plugins' memory

We've discussed this on/off in the past and the general consensus has been to take a wait-and-see approach. I've received a lot of feedback over the last year about SPIRE's complexity, and as a result am generally hesitant to introduce more plugin types unless there is a lot of momentum to do so. We have, however, introduced Notifier plugin which is primarily meant for allowing users to take actions on certain events, however it's a far cry from full blown pluggable metrics.

Interesting, if the security assumption relies on being inaccessible from the network this does require more discussion. The pattern I'm familiar with is allowing remote access to prometheus. A quick workaround would be port forwarding, but this would also circumvent this security property and seems equivalent to exposing the port without added benefit.

Yes, the port forwarding thing was considered, and the thinking was that if the user is jumping through hoops like that then there is probably some implied consent, but maybe that is a false assumption.

Re clear communication - would a switch of the kind -listenAllDegradedSecurity + print warning on startup with explanation be a more verbose way of communicating in the interim of a more definite solution?

Yes, I think something like this may be "enough". Particularly logging a warning. I'd like to have @JustinCappos weigh in here to see what he thinks.

If we force the stats collecting agent to be in the same network namespace (for shared localhost) as the agent, that makes other much more important hardening work more difficult. EG, we may want to only allow the spire-agent to talk to the AWS instance metadata on 169.254.169.254

Hmm, yes that's not ideal... When I wrote the telemetry export code, picking up prometheus support was basically free. I figured "some prometheus is better than no prometheus" and put the localhost restriction in due to above concerns. You're right, I wouldn't want to grant my stats agent any of the rights that I grant the SPIRE agent, but at the same time exposing an agent port is also not super palatable. I realize that this concern can be mitigated through L3/L4 access control etc, but I don't think it's a good idea to assume that users have put the appropriate controls in place.

@mweissbacher
Copy link
Contributor Author

To check whether there is risk of remote code execution I looked for use of import unsafe including the libraries used by telemetry/prometheus.go. It's used in one instance which was introduced here. This functionality does not take any outside input, i.e. there's no attacker controlled input going into that unsafe call. This is exposed via net/http, which is widely used including healthcheck.

I think it's fair to say that there is no increased risk compared to using the healthcheck or other usage of net/http.

@evan2645
Copy link
Member

Thank you for looking into this further @mweissbacher

It doesn't sound like we're going to find a better solution any time soon, and I am hesitant to continue blocking on this. How would you feel about adding a warning? Something along the lines of Agent is now configured to accept remote network connections for Prometheus stats collection. Please ensure access to this port is tightly controlled.

Signed-off-by: Michael Weissbacher <mweissbacher@squareup.com>
runner.c.Host = "localhost"
}

if runner.c.Host != "localhost" {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to put this into an else branch of the above check because it would warn on an explicit localhost configuration. It will still warn on 127.0.0.0 etc. but checking for it would clutter the code unnecessarily, I think.

@evan2645 evan2645 merged commit e6f7f70 into spiffe:master Jul 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants