Adding prometheus host to listen on to config #976

mweissbacher · 2019-06-21T23:10:51Z

Defaulting to "localhost" otherwise (prior behavior)

Signed-off-by: Michael Weissbacher mweissbacher@squareup.com

Defaulting to "localhost" otherwise (prior behavior) Signed-off-by: Michael Weissbacher <mweissbacher@squareup.com>

evan2645 · 2019-06-24T17:00:53Z

Hey @mweissbacher thanks for sending this.

This has come up a couple times recently. The reason we initially hardcoded this to localhost was that we were reluctant to allow the agent to be reachable over the network (i.e. bind a non-localhost port).

The hesitance resulted mostly from a security assessment that was done on SPIRE last year... in that assessment, it became clear that one of the largest reasons that SPIRE could confidently deliver the security posture we originally designed for was that it wasn't possible for an attacker on a compromised node to move laterally by way of an agent vulnerability. The worry is that as soon as the agent can listen for remote network connections, then there's a pathway that could undermine the posture and result in the compromise of (functionally) all identities in the trust domain without needing to compromise the server or its signing key.

That being said, some folks might be OK assuming this risk. The hard part IMO is communicating this risk effectively so that users are making an informed decision when configuring the agent to bind a port. Until now, I have avoided adding this functionality because I haven't been able to come up with a way to both enable this functionality as well as clearly communicate the risk associated with it.

I know that only allowing a localhost listener for prometheus isn't exactly a normal or supported pattern. I also don't know enough about prometheus to suggest an alternative that still allows us to preserve the posture we intended. Curious to hear your thoughts on this - it is definitely worth a longer conversation.

willsalz · 2019-06-24T17:40:20Z

@evan2645 I'm new to the project, so excuse (a) me commenting in a diff and (b) any newb questions: how do we feel about moving metrics to a plugin model? There will be a performance overhead but would allow (a) easier development of metrics plugins off of SPIRE master and (b) potential security benefits by isolating plugins' memory. I'm happy to move this to a task.

mweissbacher · 2019-06-24T17:43:19Z

Interesting, if the security assumption relies on being inaccessible from the network this does require more discussion. The pattern I'm familiar with is allowing remote access to prometheus. A quick workaround would be port forwarding, but this would also circumvent this security property and seems equivalent to exposing the port without added benefit.

Alternative suggestion: Since the relation spire to prometheus is write-only, do you think it would make sense to decouple logging, i.e. dump logs to a file, and run a dedicated prometheus service that exclusively serves that file and does nothing else? Would you consider this as equivalent?

mweissbacher · 2019-06-24T17:56:10Z

Re clear communication - would a switch of the kind -listenAllDegradedSecurity + print warning on startup with explanation be a more verbose way of communicating in the interim of a more definite solution?

mcpherrinm · 2019-06-24T18:28:45Z

IMO, restricting the listen address has the opposite effect on hardening than desired.

If we force the stats collecting agent to be in the same network namespace (for shared localhost) as the agent, that makes other much more important hardening work more difficult. EG, we may want to only allow the spire-agent to talk to the AWS instance metadata on 169.254.169.254, or to have pod-specific rules on what's allowed to talk to the spire server.

evan2645 · 2019-06-24T22:59:27Z

how do we feel about moving metrics to a plugin model? There will be a performance overhead but would allow (a) easier development of metrics plugins off of SPIRE master and (b) potential security benefits by isolating plugins' memory

We've discussed this on/off in the past and the general consensus has been to take a wait-and-see approach. I've received a lot of feedback over the last year about SPIRE's complexity, and as a result am generally hesitant to introduce more plugin types unless there is a lot of momentum to do so. We have, however, introduced Notifier plugin which is primarily meant for allowing users to take actions on certain events, however it's a far cry from full blown pluggable metrics.

Interesting, if the security assumption relies on being inaccessible from the network this does require more discussion. The pattern I'm familiar with is allowing remote access to prometheus. A quick workaround would be port forwarding, but this would also circumvent this security property and seems equivalent to exposing the port without added benefit.

Yes, the port forwarding thing was considered, and the thinking was that if the user is jumping through hoops like that then there is probably some implied consent, but maybe that is a false assumption.

Re clear communication - would a switch of the kind -listenAllDegradedSecurity + print warning on startup with explanation be a more verbose way of communicating in the interim of a more definite solution?

Yes, I think something like this may be "enough". Particularly logging a warning. I'd like to have @JustinCappos weigh in here to see what he thinks.

If we force the stats collecting agent to be in the same network namespace (for shared localhost) as the agent, that makes other much more important hardening work more difficult. EG, we may want to only allow the spire-agent to talk to the AWS instance metadata on 169.254.169.254

Hmm, yes that's not ideal... When I wrote the telemetry export code, picking up prometheus support was basically free. I figured "some prometheus is better than no prometheus" and put the localhost restriction in due to above concerns. You're right, I wouldn't want to grant my stats agent any of the rights that I grant the SPIRE agent, but at the same time exposing an agent port is also not super palatable. I realize that this concern can be mitigated through L3/L4 access control etc, but I don't think it's a good idea to assume that users have put the appropriate controls in place.

mweissbacher · 2019-07-12T16:01:44Z

To check whether there is risk of remote code execution I looked for use of import unsafe including the libraries used by telemetry/prometheus.go. It's used in one instance which was introduced here. This functionality does not take any outside input, i.e. there's no attacker controlled input going into that unsafe call. This is exposed via net/http, which is widely used including healthcheck.

I think it's fair to say that there is no increased risk compared to using the healthcheck or other usage of net/http.

evan2645 · 2019-07-18T19:53:00Z

Thank you for looking into this further @mweissbacher

It doesn't sound like we're going to find a better solution any time soon, and I am hesitant to continue blocking on this. How would you feel about adding a warning? Something along the lines of Agent is now configured to accept remote network connections for Prometheus stats collection. Please ensure access to this port is tightly controlled.

Signed-off-by: Michael Weissbacher <mweissbacher@squareup.com>

mweissbacher · 2019-07-19T13:46:03Z

pkg/common/telemetry/prometheus.go

+		runner.c.Host = "localhost"
+	}
+
+	if runner.c.Host != "localhost" {


I didn't want to put this into an else branch of the above check because it would warn on an explicit localhost configuration. It will still warn on 127.0.0.0 etc. but checking for it would clutter the code unnecessarily, I think.

Adding prometheus host to listen on to config

b31e452

Defaulting to "localhost" otherwise (prior behavior) Signed-off-by: Michael Weissbacher <mweissbacher@squareup.com>

mweissbacher requested review from amartinezfayo, azdagron, evan2645, MarcosDY, marcosy and martincapello as code owners June 21, 2019 23:10

Adding warning for external prometheus listener

f956239

Signed-off-by: Michael Weissbacher <mweissbacher@squareup.com>

mweissbacher commented Jul 19, 2019

View reviewed changes

evan2645 approved these changes Jul 23, 2019

View reviewed changes

Merge branch 'master' into prometheus-add-host-configuration

2a55e6b

evan2645 merged commit e6f7f70 into spiffe:master Jul 23, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding prometheus host to listen on to config #976

Adding prometheus host to listen on to config #976

mweissbacher commented Jun 21, 2019

evan2645 commented Jun 24, 2019

willsalz commented Jun 24, 2019

mweissbacher commented Jun 24, 2019

mweissbacher commented Jun 24, 2019

mcpherrinm commented Jun 24, 2019

evan2645 commented Jun 24, 2019

mweissbacher commented Jul 12, 2019

evan2645 commented Jul 18, 2019

mweissbacher Jul 19, 2019

Adding prometheus host to listen on to config #976

Adding prometheus host to listen on to config #976

Conversation

mweissbacher commented Jun 21, 2019

evan2645 commented Jun 24, 2019

willsalz commented Jun 24, 2019

mweissbacher commented Jun 24, 2019

mweissbacher commented Jun 24, 2019

mcpherrinm commented Jun 24, 2019

evan2645 commented Jun 24, 2019

mweissbacher commented Jul 12, 2019

evan2645 commented Jul 18, 2019

mweissbacher Jul 19, 2019

Choose a reason for hiding this comment