Add new config_file_service_registration token #15828

Merged
pglass merged 14 commits into main from pglass/NET-1768-config-file-registration-token
Jan 10, 2023

Conversation

pglass

@pglass pglass commented Dec 16, 2022

Description

This adds a new agent token: config_file_service_registration. This token is used to register services and checks that are defined in local config files (including when defined in flags, such as with -hcl).

This adds:

  • The config field acl.tokens.config_file_service_registration
  • The PUT /agent/token/config_file_service_registration HTTP API request
  • The consul acl set-agent-token config_file_service_registration <token> command
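
As a rough illustration of the new HTTP endpoint, the sketch below builds (but does not send) the update request. It assumes the new token type follows the same conventions as the other agent token endpoints: the `/v1` prefix, a JSON body with a `Token` field, and the `X-Consul-Token` header. The agent address and token values are placeholders, and `newSetTokenRequest` is a hypothetical helper, not part of this PR.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newSetTokenRequest builds the PUT request that would update the agent's
// config_file_service_registration token. It only constructs the request.
// agentAddr, aclToken, and newToken are placeholders supplied by the caller.
func newSetTokenRequest(agentAddr, aclToken, newToken string) (*http.Request, error) {
	// Assumption: same payload shape as the other agent token endpoints.
	body, err := json.Marshal(map[string]string{"Token": newToken})
	if err != nil {
		return nil, err
	}
	url := agentAddr + "/v1/agent/token/config_file_service_registration"
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// The caller's own ACL token authorizes the update.
	req.Header.Set("X-Consul-Token", aclToken)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newSetTokenRequest("http://127.0.0.1:8500", "<management-token>", "<registration-token>")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the request to `http.DefaultClient.Do` would send it to a running agent.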

The precedence of tokens when registering a service from a service definition or a check from a check definition is:

  1. Inline service token: The token from the token field in the service/check definition is used, if set
  2. Config File Service Registration token: otherwise, the config file registration token is used, if set
  3. Default token: otherwise, the default token is used, if set
  4. Anonymous token: otherwise, the anonymous token is used
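
The precedence above can be sketched as a small function. This is only an illustration of the ordering, not the agent's actual code; `registrationToken` and its parameters are hypothetical names, and an empty string stands for "not set".

```go
package main

import "fmt"

// registrationToken returns the token used to register a service or check
// that was defined in a config file. An empty return value means no token
// is set anywhere, so the anonymous token applies.
func registrationToken(inline, configFileRegistration, defaultToken string) string {
	// Candidates are listed in precedence order; the first non-empty one wins.
	for _, tok := range []string{inline, configFileRegistration, defaultToken} {
		if tok != "" {
			return tok
		}
	}
	return ""
}

func main() {
	// No inline token, so the config file registration token is chosen.
	fmt.Println(registrationToken("", "cfg-token", "default-token")) // prints "cfg-token"
}
```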

Testing & Reproduction steps

  • Updated unit tests

  • Also, manually tested:

    • Setting acl.tokens.config_file_service_registration and seeing successful registration of a service definition
    • Setting acl.tokens.config_file_service_registration and seeing successful registration of a check definition
    • Defining an inline token in the service definition and validating that token is used instead of the acl.tokens.config_file_service_registration
    • Defining an inline token in the check definition and validating that token is used instead of the acl.tokens.config_file_service_registration
    • Unsetting acl.tokens.config_file_service_registration and setting acl.tokens.default to check that checks and services fall back to the default token
    • Registering services via the HTTP API to ensure that the config_file_service_registration token is only used for registering services sourced from config files
    • Running consul acl set-agent-token config_file_service_registration and checking that <data-dir>/acl-tokens.json is updated when token persistence is enabled, and that the updated token is used for subsequent service registrations

Links

#4478

PR Checklist

  • updated test coverage
  • external facing docs updated
  • not a security concern

@pglass pglass requested review from a team, skpratt and kisunji and removed request for a team December 16, 2022 23:08
@github-actions github-actions bot added theme/api Relating to the HTTP API interface theme/cli Flags and documentation for the CLI interface theme/config Relating to Consul Agent configuration, including reloading labels Dec 16, 2022
//
// The fallback function will return the config file registration token if the
// given service was sourced from a service definition in a config file.
func (l *State) RegistrationTokenFallback(key structs.ServiceID) func() string {

Contributor

nit: I don't think this is used outside package state so it could be private.

Alternatively, what do you think about merging this logic into the body of aclTokenForServiceSync since it already does a lookup of l.services[key]?

Passing around a lock-guarded map in a closure makes me a little cautious; in this PR the codepaths are synchronized but this could be accidentally misused in the future.
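
To make the concern concrete, here is a minimal sketch of a fallback closure over a lock-guarded map. The `state` and `serviceState` types and their fields are hypothetical stand-ins, not the real types in this PR. In this variant the closure takes the read lock itself each time it is invoked, which is one possible way to keep the map access synchronized.

```go
package main

import (
	"fmt"
	"sync"
)

// serviceState is a hypothetical stand-in for the agent's per-service state.
type serviceState struct {
	isLocallyDefined bool // true if the service came from a config file
}

// state is a hypothetical stand-in for the agent's local state.
type state struct {
	mu                 sync.RWMutex
	services           map[string]*serviceState
	configFileRegToken string
}

// registrationTokenFallback returns a closure that yields the config file
// registration token only for services sourced from a config file.
// The closure acquires the read lock itself, so it must not be invoked
// while the caller already holds s.mu for writing.
func (s *state) registrationTokenFallback(id string) func() string {
	return func() string {
		s.mu.RLock()
		defer s.mu.RUnlock()
		if svc, ok := s.services[id]; ok && svc.isLocallyDefined {
			return s.configFileRegToken
		}
		return ""
	}
}

func main() {
	s := &state{
		services:           map[string]*serviceState{"web": {isLocallyDefined: true}},
		configFileRegToken: "cfg-token",
	}
	fmt.Println(s.registrationTokenFallback("web")())
}
```

The trade-off is the one raised above: a closure that locks internally would deadlock if called while the write lock is held, while a closure that assumes the caller holds the lock is unsafe if called without it.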

Author

I don't think this is used outside package state so it could be private.

I agree. I had to export the method because the test package is different (local vs local_test). Which seems unusual to me. But, without exporting it I can't call it in unit tests.

Alternatively, what do you think about merging this logic into the body of aclTokenForServiceSync since it already does a lookup of l.services[key]?

I opted against this because aclTokenForServiceSync is also used in deleteService.

So, do we want deleteService to also incorporate the config file registration token in its list of fallback tokens? Generally, it seems better to me if it does not.

My main concern is that if the config_file_registration token has been deleted, deregistering the service would fail and we'd see errors in the logs. Also, it should fall back to using the agent token anyway. (It is able to use the agent token for service deregistrations because the Catalog.Deregister RPC accepts a token with the relevant node:write permissions.)

Because the agent token must have node:write permissions (or else it could not have registered its node into the catalog) and because the agent token is probably less likely to have been deleted (because the agent lifecycle is longer than the service instance lifecycle), it seems like the agent could skip straight to using the agent token without considering the config_file_registration token for the deregistration, rather than incorporating the agent token in the list of "fallback" tokens. That would be faster and would not generate a deceptive log message. There's some relevant discussion on this here: #8078

That's why I opted against inlining the config_file_registration token fallback into aclTokenForServiceSync. Although it does leave me with a question: why does deleteService try using the service token and then fall back to the agent token, instead of unilaterally using the agent token for deregistrations?

Contributor

My understanding is that the agent token typically only has node:write on itself. The agent token generally wouldn't have service:write needed to deregister a service.

Author

@pglass pglass Jan 3, 2023

My understanding is that the agent token typically only has node:write on itself. The agent token generally wouldn't have service:write needed to deregister a service.

Right, but the Consul servers will accept a deregistration if the token contains node:write for the node containing that service. See #5217 and

// Allow service deregistration if the token has write permission for the node.
// This accounts for cases where the agent no longer has a token with write permission
// on the service to deregister it.
nodeWriteErr := authz.ToAllowAuthorizer().NodeWriteAllowed(subj.Node, &authzContext)
if nodeWriteErr == nil {
	return nil
}

And all services registered with an agent must be registered to that agent's node. (Or from a perms perspective, only to the nodes which the agent has permission to update)
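
The rule described here can be modeled with a toy permission check. This is a simplified sketch, not Consul's authorizer: `tokenPerms` and `canDeregisterService` are hypothetical, and real ACL evaluation involves policies, prefixes, and namespaces that are ignored here.

```go
package main

import "fmt"

// tokenPerms is a hypothetical, flattened view of what a token may write.
type tokenPerms struct {
	nodeWrite    map[string]bool // node name -> has node:write
	serviceWrite map[string]bool // service name -> has service:write
}

// canDeregisterService reports whether a token may deregister a service
// from a node. node:write on the owning node is sufficient on its own;
// otherwise service:write on the service is required.
func canDeregisterService(p tokenPerms, node, service string) bool {
	if p.nodeWrite[node] {
		return true
	}
	return p.serviceWrite[service]
}

func main() {
	// An agent token that typically has node:write on its own node only.
	agentToken := tokenPerms{nodeWrite: map[string]bool{"node-1": true}}
	fmt.Println(canDeregisterService(agentToken, "node-1", "web")) // prints "true"
}
```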

Contributor

Ah, interesting!

Author

Although it does leave me with a question: why does deleteService try using the service token and then fall back to the agent token, instead of unilaterally using the agent token for deregistrations?

I've tested this.

The agent only falls back to the agent token if the service token is unset for that particular service (i.e. the token field is empty or absent from the service definition in any config files). It does not fall back to other tokens on failure to deregister (i.e. it will not try the deregistration with the service token and, on failure, then try the agent token).

If the original service token has been deleted from the servers, because the agent has stored that service token in its local state, it continues to use that original service token to deregister that service during each state sync - which will repeatedly fail each time the state sync is retried. This feels like a bit of a gotcha.

Basically, I'm weighing two options:

  1. The existing behavior is "good", so we should have deleteService include the config_file_registration token in its list of fallbacks. If set, the config_file_registration would be used instead of the agent token. And if the config_file_registration token was deleted, then the deregistration would fail forever.
  2. Or, the existing behavior isn't great, and it would be better for it to only use the agent token for service deregistrations because that will "just work" because of the node:write "bypass" for service deregistrations.
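
The existing behavior and the two options can be compared side by side as token-selection functions. This is a sketch of the choices as described in this thread only; the function names are hypothetical, an empty string means "not set", and none of this is the agent's actual code.

```go
package main

import "fmt"

// existingDeregistrationToken models the tested behavior: the stored service
// token is used whenever it is set, and the agent token only when it is not.
// There is no retry with another token if the deregistration fails.
func existingDeregistrationToken(serviceToken, agentToken string) string {
	if serviceToken != "" {
		return serviceToken
	}
	return agentToken
}

// optionOneToken models option 1: the config file registration token is
// added to the fallbacks and, if set, is used instead of the agent token.
func optionOneToken(serviceToken, configFileToken, agentToken string) string {
	if serviceToken != "" {
		return serviceToken
	}
	if configFileToken != "" {
		return configFileToken
	}
	return agentToken
}

// optionTwoToken models option 2: always use the agent token, relying on
// node:write being sufficient for service deregistration.
func optionTwoToken(agentToken string) string {
	return agentToken
}

func main() {
	fmt.Println(existingDeregistrationToken("svc-token", "agent-token"))
	fmt.Println(optionOneToken("", "cfg-token", "agent-token"))
	fmt.Println(optionTwoToken("agent-token"))
}
```

Under the first two functions, a selected token that has since been deleted from the servers is still returned on every sync, which is the repeated-failure case described above.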

Thoughts @jkirschner-hashicorp?

Contributor

@jkirschner-hashicorp jkirschner-hashicorp Jan 4, 2023

Is it accurate to say that the agent basically isn't functional if it lacks a token with node:write on itself?

If so... Is there any downside to approach 2? If I understand correctly, approach 2 would always work assuming the node is still functional (has a token with node:write). Why use approach 1 (which has some edge cases) if approach 2 must necessarily work?

Is there a separate case where a service is being deregistered directly from the server agents rather than from the node that owns the service, in cases where the node no longer exists but the service was never deregistered (but needs to be cleaned up)?

Happy to have a quick Zoom about this tomorrow.

Author

@pglass pglass Jan 6, 2023

nit: I don't think this is used outside package state so it could be private.

I reworked these tests so the methods are unexported.

Contributor

@kisunji kisunji left a comment

Left some minor comments but LGTM. What do you think about renaming config_file_registration to something shorter like registration or static_registration? It feels a little verbose but on the other hand maybe the detailed name makes it more clear.

@jkirschner-hashicorp
Contributor

jkirschner-hashicorp commented Jan 3, 2023

What do you think about renaming config_file_registration to something shorter like registration or static_registration? It feels a little verbose but on the other hand maybe the detailed name makes it more clear.

In my experience, Consul agent token names are commonly misunderstood, such as agent and default. I personally prefer that we have an unambiguous name that is 3 words (config_file_registration) rather than a shorter name with ambiguity, especially since this isn't something that will be typed all the time (like partition in enterprise to refer to an administrative partition). I feel like registration is too ambiguous, as HTTP API calls are also methods to perform "registration", but wouldn't use this token. static_registration resolves that ambiguity, though because config files are the only way to perform static registration (AFAIK), using config_file_ rather than static_ seems best to me, as there's no interpretation required on the part of the user (to figure out what "static" refers to in this context).

@pglass pglass changed the title from "Add new config_file_registration token" to "Add new config_file_service_registration token" Jan 6, 2023
@pglass
Author

pglass commented Jan 6, 2023

I've updated this, so I've requested a re-review @kisunji

  • Rename config_file_registration to config_file_service_registration
  • Checks in config files also use the config_file_service_registration token. This is because checks can be inlined in service definitions and service-level checks do not need additional permissions, so it makes sense for service-level checks to be registered with this token as well.
  • Updated the external docs
  • Added changelog
  • Rebased onto main

Contributor

@kisunji kisunji left a comment

LGTM; left mostly docs suggestions

AgentRecovery *string `mapstructure:"agent_recovery"`
Default *string `mapstructure:"default"`
Agent *string `mapstructure:"agent"`
ConfigFileRegistration *string `mapstructure:"config_file_service_registration"`

Contributor

Is the _service_ part omitted for brevity?

Author

@pglass pglass Jan 6, 2023

Yes. With Golang's general style preference for shorter variables, I thought ConfigFileRegistration was already long enough but still clear enough.

	Service:          service,
	Token:            token,
	IsLocallyDefined: isLocal,
})
return nil
}

// AddServiceWithChecks adds a service entry and its checks to the local state atomically
// This entry is persistent and the agent will make a best effort to
// ensure it is registered

Contributor

Can we update the godoc to describe what isLocal should represent?
I can imagine someone confusing the "local" concept with peered services.

Author

Yep. I updated this.

fwiw, I used "local" in order to match ConfigSourceLocal.

return tok
}
}
return ""
}

// AddCheck is used to add a health check to the local state.
// This entry is persistent and the agent will make a best effort to
// ensure it is registered

Contributor

Same comment here about updating godocs

Comment on lines 78 to 79
// configFileRegistrationToken is used to register services defined
// with a service definitions in a config file.

Contributor

Suggested change
- // configFileRegistrationToken is used to register services defined
- // with a service definitions in a config file.
+ // configFileRegistrationToken is used to register services and checks
+ // defined with a service/check definition in a config file.

@@ -46,6 +46,13 @@ The token types are:
operations. This token will need to be configured with read access to
whatever data is being replicated.

- `config_file_service_registration` - This is the token that the agent uses to
register services and checks defined in config files. This token needs to be
configured with permission for the service or checks being registered. If not

Contributor

Suggested change
- configured with permission for the service or checks being registered. If not
+ configured with write permissions for the service or checks being registered. If not

Comment on lines 919 to 920
[check definitions](/docs/discovery/checks) foudn in configuration files or in configuration
strings passed to the agent using the `-hcl` flag.

Contributor

Suggested change
- [check definitions](/docs/discovery/checks) foudn in configuration files or in configuration
- strings passed to the agent using the `-hcl` flag.
+ [check definitions](/docs/discovery/checks) found in configuration files or in configuration
+ strings passed to the agent using the `-hcl` flag.

Would this be more concise and still convey the same information? @jkirschner-hashicorp

Suggested change
- [check definitions](/docs/discovery/checks) foudn in configuration files or in configuration
- strings passed to the agent using the `-hcl` flag.
+ [check definitions](/docs/discovery/checks) on startup.

Contributor

Does registration also happen on reload?

If so, we could do something like:

Suggested change
- [check definitions](/docs/discovery/checks) foudn in configuration files or in configuration
- strings passed to the agent using the `-hcl` flag.
+ [check definitions](/docs/discovery/checks) loaded by the agent on startup and reload.

Contributor

@jkirschner-hashicorp jkirschner-hashicorp Jan 6, 2023

nit: If we keep the -hcl flag mention, would we also need to include -json?

Author

From what I can tell, -json doesn't exist. The consul agent command only has -hcl for config fragments.

Contributor

The -hcl flag enables operators to specify agent configuration values on the CLI. There is currently no equivalent -json flag for allowing agent configuration to be provided in JSON format. If we wanted to support that, it would be a new feature that requires additional development.

Contributor

I don't want to suggest we support that. For some reason I thought I had seen an invocation of a Consul agent with that recently, but it seems like I misremembered (e.g., perhaps it was a config file being passed in with JSON format).

Author

I think I prefer to be more elaborate / specific here to help reduce confusion?

If we say "configuration passed at startup", I feel like "startup" leaves room for interpretation. Is sending an HTTP request passing configuration to the agent? Does that include if I send a service registration request while the agent is "starting up"?

I wanted to be clear about what services/checks the token is specifically used for (those services/checks in files or -hcl config fragments).

Comment on lines 922 to 925
If an inline token is defined in the service or check definition, then the inline token is
used to register that service or check instead. If the `config_file_service_registration` token is not
defined and if a service or check has no inline token, then the agent uses the
[`default`](#acl_tokens_default) token to register the service or check.

Contributor

I think "inline" might be confusing to users

Suggested change
- If an inline token is defined in the service or check definition, then the inline token is
- used to register that service or check instead. If the `config_file_service_registration` token is not
- defined and if a service or check has no inline token, then the agent uses the
- [`default`](#acl_tokens_default) token to register the service or check.
+ If the `token` field is defined in the service or check definition, then that token is
+ used to register that service or check instead. If the `config_file_service_registration` token is not
+ defined and if a service or check has no defined `token` field, then the agent uses the
+ [`default`](#acl_tokens_default) token to register the service or check.

Comment on lines 929 to 930
`config_file_service_registration` token needs multiple `service:write` permissions in order for
the agent to register those services.

Contributor

I think "multiple" could be interpreted as needing N>1 perms.

Suggested change
- `config_file_service_registration` token needs multiple `service:write` permissions in order for
- the agent to register those services.
+ `config_file_service_registration` token needs `service:write` permissions for all services
+ in order for the agent to register them.

Contributor

On that note, what happens if a config_file_service_registration token has permissions for a partial set of services? Does it fail to write all services or does it skip only the service with the missing perm?

Could make the behavior clear in the docs here.

Author

I rewrote this using two named services "A" and "B" as an example. I also elaborated a bit more on the failure case (maybe too much?). Let me know what you think!

@pglass pglass merged commit f5231b9 into main Jan 10, 2023
@pglass pglass deleted the pglass/NET-1768-config-file-registration-token branch January 10, 2023 16:24