Add http node attestor #4909

Open
wants to merge 23 commits into base: main

Contributor

@kfox1111 kfox1111 commented Feb 23, 2024

Adds an http node attestor

Fixes: #4788

  • Commit conforms to CONTRIBUTING.md?
  • Proper tests/regressions included?
  • Documentation updated?

Member

@evan2645 evan2645 left a comment

Thank you for this @kfox1111 and for your patience. I left a handful of high-level comments/questions. I had many smaller comments that I held back; I think we're clear to move this out of draft and add tests etc. whenever you have a chance.

@@ -0,0 +1,47 @@
# Agent plugin: NodeAttestor "httppop"
Member

IMO we should name this plugin something around DNS instead of HTTP, since what we're really attesting is that a DNS entry points at a machine, and the fact that we're confirming it using HTTP is an implementation detail

NodeAttestor "dns" ?

Contributor Author

@kfox1111 kfox1111 Apr 10, 2024

That may be why they called it acme rather than anything more specific....

acme has dns mode and http mode. Both do pretty different things. This plugin is most akin to the acme http protocol and may be a little confusing to people calling it dns as it doesn't do proof of possession over dns txt records like acme dns.

It still may be a little bit clearer, I think, being httppop, as the proof of possession token is hosted out of an http server. Almost all http servers use dns, so that's a bit implied?

It would leave room for a dnspop plugin later that could function like acme dns mode, should that be desirable. (not sure it is)

naming things is hard. :/

Member

This plugin is most akin to the acme http protocol and may be a little confusing to people calling it dns as it doesn't do proof of possession over dns txt records like acme dns.

Hmm ... that is a good point. I feel it's more about reachability when a certain DNS record is used, proving that you can serve traffic for a record. I also see what you mean by ACME prior art: we are not proving control over DNS, but proving that we can serve a DNS name. I guess I'll spend some more time thinking about it.

Member

How about "http_challenge"

Contributor Author

Done

|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|-----------|
| `hostname` | Hostname to use for handshaking. If unset, it will be automatically detected. | |
| `agentname` | Name of this agent on the host. Useful if you have multilpe agents bound to different spire servers on the same host. | "default" |
| `port` | The port to listen on. | 80 |
Member

I think we should choose a random port here by default. Low port numbers require root which we don't always have. Defaulting to a static number can be error prone since the port might be in use.

Contributor Author

@kfox1111 kfox1111 Apr 10, 2024

acme chose to standardize on port 80 as it's most likely to make it across the internet unhurt. If we choose a random port, I think we're probably going to get a bunch of support requests asking why the plugin is broken? :/

But maybe I'm assuming something here. Do we think intranet usage will be the most common and spire-agent -> spire-server over the internet will be uncommon?

Member

acme chose to standardize on port 80 as it's most likely to make it across the internet unhurt

Well it's also the standard HTTP port for web facing services, and ACME traditionally fills requests for web facing services. This case feels different.

But maybe I'm assuming something here. Do we think intranet usage will be the most common and spire-agent -> spire-server over the internet will be uncommon?

I do think internet traversal of this traffic is uncommon. We use mTLS there, which frequently has trouble across the internet (e.g. corporate and ISP TLS interception). My bet is that cost/benefit will outweigh use of port 80 - cons: root required, may already be in use ... pros: less likely to be filtered by a firewall. Agent/server traffic currently defaults to port 8081. If I can't or don't want to use port 80, I have to choose some other random static port number, which also feels funny

Contributor Author

Well it's also the standard HTTP port for web facing services, and ACME traditionally fills requests for web facing services. This case feels different.

I have seen a fair amount of use of certbot in http mode for non-webservers. But it is much more common to use it for webservers.

But maybe I'm assuming something here. Do we think intranet usage will be the most common and spire-agent -> spire-server over the internet will be uncommon?

I do think internet traversal of this traffic is uncommon. We use mTLS there, which frequently has trouble across the internet (e.g. corporate and ISP TLS interception). My bet is that cost/benefit will outweigh use of port 80 - cons: root required, may already be in use ... pros: less likely to be filtered by a firewall. Agent/server traffic currently defaults to port 8081. If I can't or don't want to use port 80, I have to choose some other random static port number, which also feels funny

Where I think it may get used on the internet is for things like edge computing, where you have one spire server on the internet and a spire-agent at each of multiple different organizations. Likely in that scenario, spire-server would be set up on port 443, which would make it out of all the orgs that were deploying the agent, and would need port 80 allowed back into each organization. So, getting N firewall teams to let in port 80 to a particular host might be easier than a random port at N orgs.

Again, not sure how common this will be, but somehow supporting the mode of operation for those that may need to do this kind of thing seems useful.

Contributor Author

I'm kind of iffy on the con list too:

cons: root required, may already be in use

I kind of see those in some ways as extra security checks rather than cons. But I agree some may see it that way, and that's one of the reasons to have the allow_non_root_ports and allow_alternate_ports flags.
Maybe we should have an agent-side "use_random_port" flag too? Then it could be configured both ways.

Contributor Author

How about we change up the args...
On the agent, if no port is specified, pick a random one. This still allows port 80 when desired.

On the server, allow all non root and alternate ports by default but still keep them for those that want to lock down the system further?

Member

On the agent, if no port is specified, pick a random one. This still allows port 80 when desired.
On the server, allow all non root and alternate ports by default but still keep them for those that want to lock down the system further?

❤️ I like this much better
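
A minimal sketch of the agent-side default agreed on above, assuming that an unset `port` is represented as 0. The helper name `listenForChallenge` is hypothetical and is not taken from the PR; it only illustrates that asking the kernel for port 0 yields a free, unprivileged port whose number can then be read back and advertised.

```go
package main

import (
	"fmt"
	"net"
)

// listenForChallenge binds the challenge listener (hypothetical helper).
// A port of 0 asks the kernel for a free ephemeral port, so no root
// privileges are needed and the port cannot collide with one in use.
// An explicit port such as 80 is still honored when configured.
func listenForChallenge(port int) (net.Listener, int, error) {
	ln, err := net.Listen("tcp", fmt.Sprintf(":%d", port))
	if err != nil {
		return nil, 0, err
	}
	// Read back the port that was actually bound so it can be advertised.
	return ln, ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	ln, port, err := listenForChallenge(0) // 0 stands for "no port configured"
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	fmt.Println("listening on port", port)
}
```

With this shape, the agent would have to tell the server which port it ended up on, since the server can no longer assume a fixed number.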

Comment on lines 18 to 19
| `allow_alternate_ports` | Set to true to allow ports other then 80 to be specified by the agent and honored during the handshake. If false, ports other then 80 will be rejected. | false |
| `allow_non_root_ports` | Set to true to allow ports >= 1024 to be used by the agents with the advertised_port | false |
Member

Is there danger in enabling these? Should we just always allow it?

Contributor Author

Some situations I can see (not exhaustive)

For allow_alternate_ports, if you have an internet-facing spire-server, agents can make the server spend extra time/resources asking for callbacks to ports that are more likely to get blackholed, I think. It could take longer for the server to decide to give up. Forcing it to be just one port controls the issue somewhat. The same goes for unintentionally asking for an arbitrary port when an intermediate firewall blocks everything except ports 80/443, leaving users wondering why the plugin is broken.

allow_non_root_ports being false adds an extra bit of security, like NFS does. Say you have a shared Unix box where multiple untrusted users can log in, but only as their own users (no sudo root). They could http attest with a high port and get their own agent running on the node under their own user, when the system admin wants to use http attestation for the whole node with a root-owned agent. These types of nodes are common in HPC environments, amongst others.
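
The two server-side checks described in this comment could be sketched roughly as follows. The `portPolicy` type and its `check` method are hypothetical names used for illustration; only the option names and their meaning (reject ports other than 80, reject ports >= 1024) come from the documentation table under review.

```go
package main

import (
	"errors"
	"fmt"
)

// portPolicy mirrors the two server options discussed above.
// The struct itself is an assumption made for this sketch.
type portPolicy struct {
	AllowAlternatePorts bool // accept ports other than 80
	AllowNonRootPorts   bool // accept ports >= 1024
}

// check returns an error when the port advertised by an agent
// is not acceptable under the policy.
func (p portPolicy) check(port int) error {
	if port < 1 || port > 65535 {
		return errors.New("port out of range")
	}
	if !p.AllowAlternatePorts && port != 80 {
		return fmt.Errorf("port %d rejected: only port 80 is allowed", port)
	}
	if !p.AllowNonRootPorts && port >= 1024 {
		return fmt.Errorf("port %d rejected: unprivileged ports are not allowed", port)
	}
	return nil
}

func main() {
	strict := portPolicy{}
	fmt.Println(strict.check(80))   // accepted
	fmt.Println(strict.check(8080)) // rejected

	rootOnly := portPolicy{AllowAlternatePorts: true}
	fmt.Println(rootOnly.check(443))  // accepted: privileged port
	fmt.Println(rootOnly.check(8080)) // rejected: an unprivileged user could bind it
}
```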

Member

allow_non_root_ports being false adds an extra bit of security, like NFS does. Say you have a shared Unix box where multiple untrusted users can log in, but only as their own users (no sudo root). They could http attest with a high port and get their own agent running on the node under their own user, when the system admin wants to use http attestation for the whole node with a root-owned agent. These types of nodes are common in HPC environments, amongst others.

This is an important observation! It is the same problem as most CSP attestors, and there we work around it using TOFU. Should this attestor also have TOFU behavior for a given DNS name?

Contributor Author

TOFU sounds like a good option in addition to the allow_non_root_ports feature. It adds a different, and mostly complementary, feature I think.

For bare metal nodes, it's more common to need to reuse the same hostname when reinstalling the node than in the cloud, I think. Needing to clear the name out of the spire server so the node can be re-provisioned can be painful and run into issues with automation. Because of this, the TOFU option may be worth it to some users but not to others.

Also thinking forward: when spire has support for multiple attestors together, so that, say, a tpm plugin and an http_challenge plugin are both required, it would be desirable not to TOFU but to reattest, so that both the tpm and http challenge are validated with regular re-attestation. TOFU wouldn't work in that environment. Unless periodic reattestation could remove the TOFU registration when reattesting, I guess?

Member

On the hybrid direction, I think that's something we'll need to figure out when it's time to cross that bridge since other attestor types will probably be in the same boat

For here, it does seem like TOFU is needed ... but, if you're root and can bind a low port number, then perhaps we don't need TOFU? I think the multi-tenancy aspect that drives the TOFU requirement assumes those workloads don't have root. If they do, then they own the box anyways. So with that in mind, how about we have a use_privileged_port_number configurable or similar, where you statically configure a port below 1024? The server side attestor can detect the low port number and automatically flip is_reattestable in its response based on that ... ?
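
The rule proposed here is small enough to sketch directly. This is an illustration under stated assumptions, not the plugin's implementation: `canReattest` is a hypothetical helper, and it assumes the server has already confirmed the challenge on the port in question.

```go
package main

import "fmt"

// canReattest is a hypothetical helper for the proposal above: an agent that
// answered the challenge on a privileged port (< 1024) is assumed to run as
// root, so re-attestation is allowed. Any other port falls back to
// trust-on-first-use.
func canReattest(port int) bool {
	return port > 0 && port < 1024
}

func main() {
	for _, port := range []int{80, 443, 1024, 8080} {
		fmt.Printf("port %d: reattestable=%v\n", port, canReattest(port))
	}
}
```

One caveat worth keeping in mind: when a proxy sits between `advertised_port` and `port`, the port the server sees says nothing about the privileges of the agent process behind the proxy.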

return &Plugin{}
}

func (p *Plugin) ServeNonce(agentName string, nonce string) (err error) {
Member

I think these functions should be unexported?

Contributor Author

Updated.

return &Challenge{Nonce: nonce}, nil
}

func CalculateResponse(challenge *Challenge) (*Response, error) {
Member

What do you envision happening here? Since the challenge is fulfilled by out of band call back from the server, I don't think we need to send anything back on the stream?

Contributor Author

Mostly a copy/paste thing from other plugins that use it...

But there does need to be a message from the agent to the server, after the agent starts up the http server and hosts the token, to tell the server it can now try to call it. So there is a Response, even though it's blank.

Contributor Author

Should I remove the extra function or leave it for consistency with the other plugins?

@kfox1111
Contributor Author

@evan2645 Thanks for the review and the discussion! All good things to consider. :)

@Paul-Luciano-2003 Paul-Luciano-2003 left a comment

helllo, just saying hi.
i have to build a better domain, pitch in for team players.

Signed-off-by: Kevin Fox <Kevin.Fox@pnnl.gov>
@kfox1111 kfox1111 marked this pull request as ready for review May 16, 2024 22:42
@azdagron azdagron self-assigned this Jun 4, 2024
Member

@azdagron azdagron left a comment

Thanks, @kfox1111. I'm still thinking through the security of the challenge/response but here is some preliminary feedback.


If `advertised_port` != `port`, you will need to setup an http proxy between the two ports. This is useful if you already run a webserver on port 80.

A sample configuration:
Member

We have two sample configurations in this file....

Contributor Author

Fixed an issue with it, but the intention was for the second example to be specifically for the Proxies section... I can see how that could be confusing though. Maybe include "proxy" in the example string for it?

```

## Proxies

Member

spelling: multilple

| Configuration | Description | Default |
|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|-----------|
| `hostname` | Hostname to use for handshaking. If unset, it will be automatically detected. | |
| `agentname` | Name of this agent on the host. Useful if you have multilpe agents bound to different spire servers on the same host and sharing the same port. | "default" |
Member

spelling: multilpe


| Configuration | Description | Default |
|-------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|
| `dns_patterns` | A list of regular expressions to apply to the hostname being attested. If none match, attestation will fail. If unset, all hostnames are allowed. | |
Member

nit : allowed_dns_patterns

It also isn't clear what it means to "apply" the regex, i.e., we should be clear that the hostname must match at least one pattern.

| `dns_patterns` | A list of regular expressions to apply to the hostname being attested. If none match, attestation will fail. If unset, all hostnames are allowed. | |
| `required_port` | Set to a port number to require clients to listen only on that port. If unset, all port numbers are allowed | |
| `allow_non_root_ports` | Set to true to allow ports >= 1024 to be used by the agents with the advertised_port | true |
| `agent_path_template` | A URL path portion format of Agent's SPIFFE ID. Describe in text/template format. | "{{ .PluginName }}/{{ .HostName }}" |
Member

Do we need this? What parameters outside of HostName seem relevant? Is this mostly copy-paste from the other challenge-based attestors or is there a use-case in mind?

Contributor Author

azure_msi, gcp_iit, sshpop and x509pop all have it. Just copy/pasted from the plugin I started with, but seems pretty common.
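
For readers unfamiliar with the option, here is a minimal sketch of how the default `agent_path_template` value would render with Go's `text/template`. The `agentPathData` struct and `renderAgentPath` helper are hypothetical; only the template string and the two field names come from the documentation table, and the host name is a made-up example.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// agentPathData holds the fields referenced by the default template.
// The struct is an assumption made for this sketch.
type agentPathData struct {
	PluginName string
	HostName   string
}

// renderAgentPath parses the template and executes it against the data.
func renderAgentPath(tmpl string, data agentPathData) (string, error) {
	t, err := template.New("agent-path").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	path, err := renderAgentPath("{{ .PluginName }}/{{ .HostName }}", agentPathData{
		PluginName: "http_challenge",
		HostName:   "node1.example.org",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(path) // http_challenge/node1.example.org
}
```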

pkg/common/plugin/httpchallenge/httpchallenge.go (outdated)
return idutil.AgentID(td, agentPath)
}

func generateNonce() ([]byte, error) {
Member

I'd suggest the nonce either be represented as raw bytes, or as a hex encoded string. If the latter, this should return a string type.
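
A sketch of the second option suggested here, with the nonce returned as a hex-encoded string. The 32-byte length is an assumption for illustration; the PR's actual nonce size is not shown in this excerpt.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// nonceLen is the number of random bytes in a nonce (assumed value).
const nonceLen = 32

// generateNonce returns a hex-encoded random nonce. Returning a string makes
// it explicit that callers receive the encoded form, not raw bytes.
func generateNonce() (string, error) {
	b := make([]byte, nonceLen)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

func main() {
	nonce, err := generateNonce()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(nonce), nonce) // 64 hex characters for 32 bytes
}
```

Hex output contains only `[0-9a-f]`, so it can be placed in a URL path segment without further escaping.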

Comment on lines 103 to 114
notfound := false
for _, re := range config.dnsPatterns {
	notfound = true
	l := re.FindAllStringSubmatch(attestationData.HostName, -1)
	if len(l) > 0 {
		notfound = false
		break
	}
}
if notfound {
	return status.Errorf(codes.PermissionDenied, "the requested hostname is not allowed to connect")
}
Member

nit: I think it would be cleaner to extract this to a function. The notFound variable wouldn't be needed then (function could early-return if the hostname matches a pattern).
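
One way the suggested extraction might look, as a sketch. The function names `hostnameAllowed` and `compilePatterns` are hypothetical. The behavior follows the snippet under review: an empty pattern list allows every hostname, otherwise at least one pattern must match.

```go
package main

import (
	"fmt"
	"regexp"
)

// compilePatterns compiles the configured expressions (hypothetical helper).
func compilePatterns(exprs ...string) ([]*regexp.Regexp, error) {
	patterns := make([]*regexp.Regexp, 0, len(exprs))
	for _, expr := range exprs {
		re, err := regexp.Compile(expr)
		if err != nil {
			return nil, err
		}
		patterns = append(patterns, re)
	}
	return patterns, nil
}

// hostnameAllowed reports whether the hostname matches at least one pattern.
// With no patterns configured, all hostnames are allowed.
func hostnameAllowed(patterns []*regexp.Regexp, hostname string) bool {
	if len(patterns) == 0 {
		return true
	}
	for _, re := range patterns {
		if re.MatchString(hostname) {
			return true // early return replaces the notfound flag
		}
	}
	return false
}

func main() {
	patterns, err := compilePatterns(`^node[0-9]+\.example\.org$`)
	if err != nil {
		panic(err)
	}
	fmt.Println(hostnameAllowed(patterns, "node1.example.org")) // true
	fmt.Println(hostnameAllowed(patterns, "other.example.com")) // false
	fmt.Println(hostnameAllowed(nil, "anything.example.org"))   // true
}
```

The caller would then turn a `false` result into the `codes.PermissionDenied` error shown in the original hunk.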

Comment on lines 98 to 101
l := config.agentNamePattern.FindAllStringSubmatch(attestationData.AgentName, -1)
if len(l) != 1 || len(l[0]) == 0 || len(l[0]) > 32 {
	return status.Error(codes.InvalidArgument, "agent name is not valid")
}
Member

nit: can we extract this to a function, e.g. validateAgentName
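
A possible shape for `validateAgentName`, offered as a sketch only. The pattern below is an assumption, since the PR's `agentNamePattern` is not shown in this excerpt, and a plain `error` stands in for the gRPC status error to keep the example free of external dependencies. Note that in the hunk above `len(l[0])` counts submatch groups rather than characters, so this sketch checks the length of the name itself, which appears to be the intent.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// agentNamePattern is an assumed pattern; the real one lives in the plugin config.
var agentNamePattern = regexp.MustCompile(`^[a-zA-Z0-9]+$`)

// maxAgentNameLen follows the limit of 32 used in the hunk under review.
const maxAgentNameLen = 32

// validateAgentName rejects empty, overlong, or malformed agent names.
func validateAgentName(name string) error {
	if len(name) == 0 || len(name) > maxAgentNameLen {
		return errors.New("agent name is not valid")
	}
	if !agentNamePattern.MatchString(name) {
		return errors.New("agent name is not valid")
	}
	return nil
}

func main() {
	for _, name := range []string{"default", "", "a/b"} {
		fmt.Printf("%q: %v\n", name, validateAgentName(name))
	}
}
```

Restricting the name matters because it ends up in the URL path that the server requests during the challenge.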

kfox1111 and others added 4 commits June 5, 2024 09:47
Co-authored-by: Andrew Harding <azdagron@gmail.com>
Signed-off-by: kfox1111 <Kevin.Fox@pnnl.gov>
Signed-off-by: Kevin Fox <Kevin.Fox@pnnl.gov>
Path: fmt.Sprintf("/.well-known/spiffe/nodeattestor/http_challenge/%s/%s", attestationData.AgentName, challenge.Nonce),
}

resp, err := http.Get(turl.String())
Member

Oh, the code should use the context so that the request can be cancelled (e.g., http.NewRequest + req.WithContext + http.DefaultClient.Do(req)).
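
A sketch of the context-aware callback suggested here. It uses `http.NewRequestWithContext`, which is equivalent to `http.NewRequest` followed by `req.WithContext`. The names `fetchChallenge` and `demo`, the 5-second timeout, the 1 KiB read cap, and the response body are all assumptions for illustration; a local `httptest` server stands in for the agent.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchChallenge performs the server's callback with a cancellable request.
func fetchChallenge(ctx context.Context, url string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	// Cap the read so a misbehaving endpoint cannot stream unbounded data.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1024))
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// demo runs fetchChallenge against a throwaway local server. When
// cancelFirst is true, the context is cancelled before the request is sent.
func demo(cancelFirst bool) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "stand-in-nonce")
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if cancelFirst {
		cancel()
	}
	return fetchChallenge(ctx, srv.URL)
}

func main() {
	body, err := demo(false)
	fmt.Println(body, err)

	_, err = demo(true)
	fmt.Println("cancelled:", err != nil)
}
```

In the plugin, the context would presumably be the one passed into the attestation call, so that an aborted attestation also aborts the outbound request.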

Signed-off-by: Kevin Fox <Kevin.Fox@pnnl.gov>
Development

Successfully merging this pull request may close these issues.

DNS/HTTP Node Attestor
4 participants