-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new config_file_service_registration token #15828
Conversation
agent/local/state.go
Outdated
// | ||
// The fallback function will return the config file registration token if the | ||
// given service was sourced from a service definition in a config file. | ||
func (l *State) RegistrationTokenFallback(key structs.ServiceID) func() string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think this is used outside package state
so it could be private.
Alternatively, what do you think about merging this logic into the body of aclTokenForServiceSync
since it already does a lookup of l.services[key]
?
Passing around a lock-guarded map in a closure makes me a little cautious; in this PR the codepaths are synchronized but this could be accidentally misused in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is used outside package state so it could be private.
I agree. I had to export the method because the test package is different (local
vs local_test
). Which seems unusual to me. But, without exporting it I can't call it in unit tests.
Alternatively, what do you think about merging this logic into the body of aclTokenForServiceSync since it already does a lookup of l.services[key]?
I opted against this because aclTokenForServiceSync
is also used in deleteService.
So, do we want deleteService to also incorporate the config file registration token in its list of fallback tokens? Generally, it seems better to me if it does not.
The main concern to me is if the config_file_registration token has been deleted, then it would fail to deregister the service and we'd see errors in logs. Also, it should fallback to using the agent token anyway. (It is able to use the agent token for service deregistrations because the Catalog.Deregister RPC accepts a token with the relevant node:write permissions).
Because the agent token must have node:write permissions (or else it could not have registered it's node into the catalog) and because the agent token is probably less likely to have been deleted (because agent lifecycle is longer than service instance lifecycle), it seems like the agent could skip straight to using the agent token without considering the config_file_registration token for the deregistration, rather than incorporating the agent token in the list of "fallback" tokens. And that would be faster and would not generate a deceptive log message. There's some relevant discussion on this here: #8078
That's why I opted against inlining the config_file_registration token fallback into aclTokenForServiceSync. Although it does leave me a question: why does deleteService try using the service token and then fallback to agent token, instead of unilaterally using the agent token for deregistrations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that the agent
token typically only has node:write
on itself. The agent token generally wouldn't have service:write
needed to deregister a service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that the agent token typically only has node:write on itself. The agent token generally wouldn't have service:write needed to deregister a service.
Right, but the Consul servers will accept a deregistration if the token contains node:write for the node containing that service. See #5217 and
consul/agent/consul/catalog_endpoint.go
Lines 441 to 448 in 275a0b8
// Allow service deregistration if the token has write permission for the node. | |
// This accounts for cases where the agent no longer has a token with write permission | |
// on the service to deregister it. | |
nodeWriteErr := authz.ToAllowAuthorizer().NodeWriteAllowed(subj.Node, &authzContext) | |
if nodeWriteErr == nil { | |
return nil | |
} | |
And all services registered with an agent must be registered to that agent's node. (Or from a perms perspective, only to the nodes which the agent has permission to update)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, interesting!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although it does leave me a question: why does deleteService try using the service token and then fallback to agent token, instead of unilaterally using the agent token for deregistrations?
I've tested this.
The agent only falls back to the agent token if the service token is unset for that particular service (i.e. token
field is empty or absent from the service definition in any config files). It does not fallback to other tokens on failure to deregister (i.e. it will not try the deregistration with the service token and, on failure, then try the agent token).
If the original service token has been deleted from the servers, because the agent has stored that service token in its local state, it continues to use that original service token to deregister that service during each state sync - which will repeatedly fail each time the state sync is retried. This feels like a bit of a gotcha.
Basically, I'm weighing two options:
- The existing behavior is "good", so we should have deleteService include the config_file_registration token in its list of fallbacks. If set, the config_file_registration would be used instead of the agent token. And if the config_file_registration token was deleted, then the deregistration would fail forever.
- Or, the existing behavior isn't great, and it would be better for it to only use the agent token for service deregistrations because that will "just work" because of the node:write "bypass" for service deregistrations.
Thoughts @jkirschner-hashicorp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it accurate to say that the agent basically isn't functional if it lacks a token with node:write
on itself?
If so... Is there any downside to approach 2? If I understand correctly, approach 2 would always work assuming the node is still functional (has a token the node:write
). Why use approach 1 (which has some edge cases) if approach 2 must necessarily work?
Is there a separate case where a service is being deregistered directly from the server agents rather than from the node that owns the service, in cases where the node no longer exists but the service was never deregistered (but needs to be cleaned up)?
Happy to have a quick Zoom about this tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think this is used outside package state so it could be private.
I reworked these tests so the methods are unexported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor comments but LGTM. What do you think about renaming config_file_registration
to something shorter like registration
or static_registration
? It feels a little verbose but on the other hand maybe the detailed name makes it more clear.
In my experience, Consul agent token names are commonly misunderstood, such as |
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
159e516
to
afb3b9d
Compare
I've updated this, so I've requested a re-review @kisunji
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; left mostly docs suggestions
AgentRecovery *string `mapstructure:"agent_recovery"` | ||
Default *string `mapstructure:"default"` | ||
Agent *string `mapstructure:"agent"` | ||
ConfigFileRegistration *string `mapstructure:"config_file_service_registration"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the _service_
part omitted for brevity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. With Golang's general style preference for shorter variables, I thought ConfigFileRegistration
was already long enough but still clear enough.
agent/local/state.go
Outdated
Token: token, | ||
Service: service, | ||
Token: token, | ||
IsLocallyDefined: isLocal, | ||
}) | ||
return nil | ||
} | ||
|
||
// AddServiceWithChecks adds a service entry and its checks to the local state atomically | ||
// This entry is persistent and the agent will make a best effort to | ||
// ensure it is registered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update the godoc to describe what isLocal
should represent?
I can imagine someone confusing the "local" concept with peered services.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I updated this.
fwiw, I used "local" in order match ConfigSourceLocal
.
agent/local/state.go
Outdated
return tok | ||
} | ||
} | ||
return "" | ||
} | ||
|
||
// AddCheck is used to add a health check to the local state. | ||
// This entry is persistent and the agent will make a best effort to | ||
// ensure it is registered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here about updating godocs
agent/token/store.go
Outdated
// configFileRegistrationToken is used to register services defined | ||
// with a service definitions in a config file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// configFileRegistrationToken is used to register services defined | |
// with a service definitions in a config file. | |
// configFileRegistrationToken is used to register services and checks | |
// defined with a service/check definition in a config file. |
@@ -46,6 +46,13 @@ The token types are: | |||
operations. This token will need to be configured with read access to | |||
whatever data is being replicated. | |||
|
|||
- `config_file_service_registration` - This is the token that the agent uses to | |||
register services and checks defined in config files. This token needs to be | |||
configured with permission for the service or checks being registered. If not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
configured with permission for the service or checks being registered. If not | |
configured with write permissions for the service or checks being registered. If not |
[check definitions](/docs/discovery/checks) foudn in configuration files or in configuration | ||
strings passed to the agent using the `-hcl` flag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[check definitions](/docs/discovery/checks) foudn in configuration files or in configuration | |
strings passed to the agent using the `-hcl` flag. | |
[check definitions](/docs/discovery/checks) found in configuration files or in configuration | |
strings passed to the agent using the `-hcl` flag. |
Would this be more concise and still convey the same information? @jkirschner-hashicorp
[check definitions](/docs/discovery/checks) foudn in configuration files or in configuration | |
strings passed to the agent using the `-hcl` flag. | |
[check definitions](/docs/discovery/checks) on startup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does registration also happen on reload?
If so, we could do something like:
[check definitions](/docs/discovery/checks) foudn in configuration files or in configuration | |
strings passed to the agent using the `-hcl` flag. | |
[check definitions](/docs/discovery/checks) loaded by the agent on startup and reload. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: If we keep the -hcl
flag mention, would we also need to include -json
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell, -json
doesn't exist. The consul agent
command only has -hcl
for config fragments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -hcl
flag enables operators to specify agent configuration values on the CLI. There is currently no equivalent -json
flag for allowing agent configuration to be provided in JSON format. If we wanted to support that, it would be a new feature that requires additional development.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to suggest we support that. For some reason I thought I had seen an invocation of a Consul agent with that recently, but it seems like I misremembered (e.g., perhaps it was a config file being passed in with JSON format).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer to be more elaborate / specific here to help reduce confusion?
If we say "configuration passed at startup", I feel like "startup" leaves room for interpretation. Is sending an HTTP request passing configuration to the agent? Does that include if I send a service registration request while the agent is "starting up"?
I wanted to be clear about what services/checks the token is specifically used for (those services/checks in files or -hcl
config fragments).
If an inline token is defined in the service or check definition, then the inline token is | ||
used to register that service or check instead. If the `config_file_service_registration` token is not | ||
defined and if a service or check has no inline token, then the agent uses the | ||
[`default`](#acl_tokens_default) token to register the service or check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "inline" might be confusing to users
If an inline token is defined in the service or check definition, then the inline token is | |
used to register that service or check instead. If the `config_file_service_registration` token is not | |
defined and if a service or check has no inline token, then the agent uses the | |
[`default`](#acl_tokens_default) token to register the service or check. | |
If the `token` field is defined in the service or check definition, then that token is | |
used to register that service or check instead. If the `config_file_service_registration` token is not | |
defined and if a service or check has no defined `token` field, then the agent uses the | |
[`default`](#acl_tokens_default) token to register the service or check. |
`config_file_service_registration` token needs multiple `service:write` permissions in order for | ||
the agent to register those services. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "multiple" could be interpreted as needing N>1 perms.
`config_file_service_registration` token needs multiple `service:write` permissions in order for | |
the agent to register those services. | |
`config_file_service_registration` token needs `service:write` permissions for all services | |
in order for the agent to register them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On that note, what happens if a config_file_service_registration
token has permissions for a partial set of services? Does it fail to write all services or does it skip only the service with the missing perm?
Could make the behavior clear in the docs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote this using two named services "A" and "B" as an example. I also elaborated a bit more on the failure case (maybe too much?). Let me know what you think!
Description
This adds a new agent token:
config_file_service_registration
. This token is used to register services and checks that are defined in local config files (including when defined in flags, such as with-hcl
).This adds:
acl.tokens.config_file_service_registration
PUT /agent/token/config_file_service_registration
HTTP API requestconsul acl set-agent-token config_file_service_registration <token>
commandThe precedence of tokens when registering a service from a service definition or a check from a check definition is:
token
field in the service/check definition is used, if setTesting & Reproduction steps
Updated unit tests
Also, manually tested:
acl.tokens.config_file_service_registration
and seeing successful registration of a service definitionacl.tokens.config_file_service_registration
and seeing successful registration of a check definitionacl.tokens.config_file_service_registration
acl.tokens.config_file_service_registration
acl.tokens.config_file_service_registration
and settingacl.tokens.default
to check that checks and services fall back to the default tokenconfig_file_service_registration
token is only used for registering services sourced from config filesconsul acl set-agent-token config_file_service_registration
and checking that<data-dir>/acl-tokens.json
is updated when token persistence is enabled, and that the updated token is used for subsequent service registrationsLinks
#4478
PR Checklist