
Feature Request: Support projected bound service account tokens when run in-cluster #2271

Closed
aiman-alsari opened this issue Jun 8, 2020 · 36 comments


@aiman-alsari

Kubernetes has a beta feature for increasing service account token security that is planned to reach General Availability and become the default at some point.

The general idea is that rather than using the automounted service account token at /var/run/secrets/kubernetes.io/serviceaccount/token, you can project a volume that contains a pod-scoped token that is valid for the lifetime of the pod using it. The token can also be auto-rotated based on some TTL. This means accidental token leakage doesn't require manual rotation of keys, etc.

See https://github.com/kubernetes/community/blob/master/contributors/design-proposals/auth/bound-service-account-tokens.md and https://github.com/mikedanese/community/blob/2bf41bd80a9a50b544731c74c7d956c041ec71eb/contributors/design-proposals/storage/svcacct-token-volume-source.md
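
For reference, the projection described in those proposals looks roughly like the pod spec fragment below. This is a hedged sketch: the volume name, mount path, audience, and one-hour expiry are illustrative values, not defaults.

```yaml
# Illustrative pod spec: a projected, bound service account token.
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  serviceAccountName: my-sa
  containers:
    - name: app
      image: example/app
      volumeMounts:
        - mountPath: /var/run/secrets/tokens
          name: bound-token
  volumes:
    - name: bound-token
      projected:
        sources:
          - serviceAccountToken:
              path: token            # token lands at /var/run/secrets/tokens/token
              expirationSeconds: 3600
              audience: my-audience
```

The kubelet rewrites the file at the mount path when the token is rotated, which is why a client that caches the token string once never sees the new value.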

It would be great if the kubernetes-client could handle the SA token being updated in the filesystem. Currently it stores the value of the token in the io.fabric8.kubernetes.client.Config object's oauthToken property. AFAIK this never gets reloaded.
Currently the downstream consumers of the client will need to reload their client objects to refresh the config when a token rotates. Whilst this is doable, it feels like it should be taken care of by the kubernetes-client.

It's also not possible (that I could see) to configure the SA token path, which could be projected to somewhere other than the default location of /var/run/secrets/kubernetes.io/serviceaccount/token.
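
To illustrate what a configurable token path could look like, here is a minimal sketch; the override parameter and class name are hypothetical, since the client currently only knows the default location:

```java
// Sketch: resolve a service account token path from an (assumed) override,
// falling back to the default in-cluster location, and read it from disk.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TokenPath {
    static final String DEFAULT_PATH =
            "/var/run/secrets/kubernetes.io/serviceaccount/token";

    /** Use the override when present, otherwise the default location. */
    public static Path resolve(String override) {
        return Path.of(override != null && !override.isEmpty() ? override : DEFAULT_PATH);
    }

    /** Re-read the token file; trim the trailing newline the kubelet writes. */
    public static String readToken(Path path) throws IOException {
        return Files.readString(path).trim();
    }
}
```

The resulting token string could then be passed to the client, e.g. via ConfigBuilder's withOauthToken, though as noted above it would still not be reloaded when the file rotates.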

@stale

stale bot commented Sep 20, 2020

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

@titisan

titisan commented May 12, 2021

The BoundServiceAccountTokenVolume feature has been promoted to beta and enabled by default in Kubernetes 1.21:
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md#changelog-since-v1200

I think this feature request is more relevant now that BoundServiceAccountTokenVolume is enabled by default in the latest Kubernetes.

@rohanKanojia
Member

@titisan: Sorry, it looks like I missed this while upgrading the kubernetes model to v1.21.0. Do you know in which package BoundServiceAccountTokenVolume exists? Would it be possible for you to contribute a PR for this? We'll be happy to provide code pointers.

@titisan

titisan commented May 18, 2021

I was checking the code, and I wonder if the interceptor for handling expired OIDC tokens (TokenRefreshInterceptor) will also refresh the service account token when it expires.

I think the "else" branch will be executed when authenticating with the service account token; autoconfigure() will then reload the token.

According to the Kubernetes 1.21 release notes: "Clients should reload the token from disk periodically (once per minute is recommended) to ensure they continue to use a valid token." With the current implementation, TokenRefreshInterceptor will only refresh (reload) the token when the API server rejects a request with HTTP status code 401.
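
The refresh-on-401 behaviour being discussed follows a general pattern, sketched below. This is an illustration of the pattern, not the actual TokenRefreshInterceptor code; the class and method names are made up.

```java
// Sketch of the refresh-on-401 pattern: on a 401, re-read the token from
// disk and retry the request once with the fresh credentials.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RefreshOn401 {
    // Default in-cluster token location, re-read on refresh.
    static final Path TOKEN_PATH =
            Path.of("/var/run/secrets/kubernetes.io/serviceaccount/token");

    /** Decide whether a response warrants reloading the token and retrying. */
    public static boolean shouldRefresh(int statusCode, boolean alreadyRetried) {
        return statusCode == 401 && !alreadyRetried;
    }

    /** Re-read the service account token from disk. */
    public static String reloadToken(Path tokenPath) throws IOException {
        return Files.readString(tokenPath).trim();
    }
}
```

The retry guard matters: without it, a token that is genuinely invalid (not merely rotated) would cause an endless reload loop.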

@manusa
Member

manusa commented May 18, 2021

I think #3105 took care of this.

With the current implementation, TokenRefreshInterceptor will only refresh (reload) the token when the API server rejects a request with HTTP status code 401.

IMHO it's cheaper to do this (and backwards compatible) than to have a periodic job that reloads the token from disk (i.e. reload only when needed instead of once per minute whether it's needed or not).

@titisan

titisan commented May 18, 2021

Thanks @manusa, I do agree it is a cheaper solution than reloading the token from disk periodically.

Looking forward to having #3105 in the next release.

@mikebell90

mikebell90 commented Feb 1, 2022

It is cheaper, but incorrect. If I read the docs right, you will not get the 401. Not for a year!

As part of the transition to time-limited tokens, the initial token is good for a year, but after a time D (between 1 and 3 hours) it will be refreshed. The old one remains valid, but the whole point is to generate Prometheus metrics to find out-of-date clients (such as this one), and this client (well, its users) will be flagged as "stale".

See https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/1205-bound-service-account-tokens

A 401 is therefore necessary but insufficient, and will make it impossible for consumers of this library to know if it is working properly (and in fact, in a year it will stop working).

@mikebell90

mikebell90 commented Feb 1, 2022

From the 1.21 readme:

The BoundServiceAccountTokenVolume feature has been promoted to beta, and enabled by default.

  • This changes the tokens provided to containers at /var/run/secrets/kubernetes.io/serviceaccount/token to be time-limited, auto-refreshed, and invalidated when the containing pod is deleted.
  • Clients should reload the token from disk periodically (once per minute is recommended) to ensure they continue to use a valid token. k8s.io/client-go version v11.0.0+ and v0.15.0+ reload tokens automatically.
  • By default, injected tokens are given an extended lifetime so they remain valid even after a new refreshed token is provided. The metric serviceaccount_stale_tokens_total can be used to monitor for workloads that are depending on the extended lifetime and are continuing to use tokens even after a refreshed token is provided to the container. If that metric indicates no existing workloads are depending on extended lifetimes, injected token lifetime can be shortened to 1 hour by starting kube-apiserver with --service-account-extend-token-expiration=false. (#95667, @zshihang) [SIG API Machinery, Auth, Cluster Lifecycle and Testing]

@mikebell90

Obviously a hack could be used (a fragile one) wherein everyone that gets a Client instance gets it from a Supplier class and "pinky-promises" to use it locally only (not to hold it per instance). Then the Supplier could reload the Client in a background thread and replace an AtomicReference. That seems hacky and kind of expensive.
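
The Supplier/AtomicReference hack described here could be sketched roughly as follows. A plain String stands in for the KubernetesClient instance; a real version would rebuild the client inside the reload task. The class name and period are illustrative.

```java
// Sketch of the "Supplier + background reload" workaround: a scheduled
// thread periodically re-reads the token file and swaps an AtomicReference.
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class ReloadingSupplier implements Supplier<String>, AutoCloseable {
    private final AtomicReference<String> current = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public ReloadingSupplier(Path tokenPath, long periodSeconds) {
        Runnable reload = () -> {
            try {
                current.set(Files.readString(tokenPath).trim());
            } catch (Exception e) {
                // keep the previous value if the read fails
            }
        };
        reload.run(); // initial synchronous load
        scheduler.scheduleAtFixedRate(reload, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    @Override public String get() { return current.get(); }
    @Override public void close() { scheduler.shutdownNow(); }
}
```

Callers must always go through get() rather than caching the result, which is exactly the "pinky-promise" weakness noted above.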

@mikebell90

@manusa I don't see any way for us to plug in alternative interceptors (conveniently, at least). And have you seen the above? It's an issue IMO.

@manusa
Member

manusa commented Mar 2, 2022

We could add a configuration option that would mark the token stale after a given period n.

Then a few options:

  • Scheduled thread that reloads the Token file
  • Check staleness for each operation and reload Token file if needed
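
The second option (check staleness on each operation) could be sketched like this; the class and method names are illustrative, not part of the client, and the current time is passed in for testability:

```java
// Sketch: treat the cached token as stale after a configurable period and
// re-read it lazily from disk on the next request that needs it.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

public class LazyTokenCache {
    private final Path tokenPath;
    private final Duration maxAge;
    private String token;
    private Instant loadedAt = Instant.MIN;

    public LazyTokenCache(Path tokenPath, Duration maxAge) {
        this.tokenPath = tokenPath;
        this.maxAge = maxAge;
    }

    /** Called on each request: reloads from disk only when the cache is stale. */
    public synchronized String get(Instant now) throws IOException {
        if (token == null || now.isAfter(loadedAt.plus(maxAge))) {
            token = Files.readString(tokenPath).trim();
            loadedAt = now;
        }
        return token;
    }
}
```

Compared with the scheduled-thread option, this needs no background thread and costs one clock comparison per request, at the price of the occasional disk read on the request path.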

@mikebell90

This also affects informers, which I believe are not as easy to hack as basic usage.

@victornoel

Hi,

As AWS is rolling out Kubernetes 1.21 these days, I was wondering: what is the status of support for this feature?

Has it been released already?

Cheers!

@balonik

balonik commented May 12, 2022

This needs more visibility, because as far as I understand, AWS EKS disabled the default 1-year extended period and configured a 90-day period instead.

@PettitWesley

PettitWesley commented May 31, 2022

I am working on implementing this

EDIT: Actually, I am not sure how, but somehow I was confused and thought this repo was related to the k8s Ruby plugin for fluentd: fabric8io/fluent-plugin-kubernetes_metadata_filter#337

@PettitWesley

Change is here: fabric8io/fluent-plugin-kubernetes_metadata_filter#337

It has already been released in plugin version 2.11.1.

Though I am actually still doing final testing... but as far as we can tell, it works. I will post an update if it turns out it doesn't.

@PettitWesley

(It works don't worry)

@vgaddavcg

vgaddavcg commented Jul 1, 2022

@PettitWesley @manusa Any update on this? Are your changes merged into kubernetes-client?

@PettitWesley

@vgaddavcg Changes are merged here: fabric8io/fluent-plugin-kubernetes_metadata_filter#337

And released in plugin version 2.11.1 from what I saw. Anything else beyond that is up to you folks to handle

@scholzj
Contributor

scholzj commented Jul 6, 2022

@PettitWesley I'm a bit confused how an issue in the Java-based Kubernetes client relates to a PR in a Ruby-based Fluentd plugin.

@PettitWesley

@scholzj @vgaddavcg Sorry yea it seems that I got confused and thought this repo was also related to Fluentd... apologies, I have not fixed anything in this repo.

@AbdelrhmanHamouda
Contributor

Hello,
Any update on this? Critical parts of our operations are impacted, and the 90-day grace period allowed by EKS to ensure compatibility is running out.
Thanks in advance,

@victornoel

@AbdelrhmanHamouda I believe it was fixed in #4264 and shipped with 6.1.0.

@AbdelrhmanHamouda
Contributor

thanks @victornoel, this is a great help.

@aiman-alsari
Author

Thanks Victor, I'll close the issue seeing as it has been fixed and merged.

@manusa manusa modified the milestones: 6.x, 6.1.0 Sep 22, 2022
@apiwoni

apiwoni commented Sep 22, 2022

I think the provided solution is more of a brute-force approach: refresh the service account token every minute. Could we test against the recommended expires_in before preemptively refreshing the token, or just let it fail, then refresh the access token and retry?

@andreaTP
Member

@apiwoni I haven't looked at the current implementation but your question does make sense to me. Would you be able to get a PR together for this improvement?

@mikebell90

I believe that will break the metrics. Please read the KEP. There may be extended tokens for a year, but not rereading and reapplying the token every minute violates the KEP.

@mikebell90

It is very similar to the original "let's wait for a 401" approach, already demonstrated to be incorrect.

@apiwoni

apiwoni commented Sep 22, 2022

@mikebell90 Does any of this really apply in cases where the client provides the access token without using KUBECONFIG, but rather obtains it via a custom API? I know the duration of my access tokens.
Microsoft Azure AD does not provide refresh tokens for the OAuth client credentials flow, so I need to generate a new token when it expires. It does not seem I can use OpenIDConnectionUtils#resolveOIDCTokenFromAuthConfig to generate a new access token when refresh is not supported and the token has expired.

@mikebell90

Only to service account tokens, AFAIK. What I would suggest is keeping the default at 1 minute but allowing configuration in a builder somewhere.

@apiwoni

apiwoni commented Sep 29, 2022

@mikebell90 Since the client credentials flow does not support refresh tokens and tokens from Azure AD expire after 60 minutes, I need a way to acquire a new token in an interceptor. I have my own custom Interceptor for an old version of the Kubernetes client, which I was able to wire up as follows:

      val httpClientBuilder = new DefaultKubernetesClient(config).getHttpClient.newBuilder()
      val httpClient = httpClientBuilder
        //.addInterceptor(new CustomTokenRefreshInterceptor(config, self, clientName))
        .addOrReplaceInterceptor(TokenRefreshInterceptor.NAME, new CustomTokenRefreshInterceptor(config, self, clientName))
        .build()
      new DefaultKubernetesClient(httpClient, config)

Direct usage of DefaultKubernetesClient is deprecated now, and the builder patterns are not that helpful for replacing TokenRefreshInterceptor with a custom one. I was hoping the shenanigans below would work, but they don't:

      val kubernetesClient = new KubernetesClientBuilder()
        .withConfig(config)
        .build()

      val httpClientFactory = kubernetesClient
        .getHttpClient
        .newBuilder()
        .addOrReplaceInterceptor(TokenRefreshInterceptor.NAME,new CustomTokenRefreshInterceptor(config,self,clientName))
        .build().getFactory

      new KubernetesClientBuilder()
        .withConfig(config)
        .withHttpClientFactory(httpClientFactory)
        .build()

Is there a way to replace TokenRefreshInterceptor without using a deprecated API or creating my own custom HttpClientFactory?

@mikebell90

It's a good question and I don't know. TBH I think this is a leaky abstraction design flaw of the client. It really shouldn't leak low level okhttp specific details.

@shawkins
Contributor

I was hoping the shenanigans below would work, but they don't:

Can you explain what doesn't work?

TBH I think this is a leaky abstraction design flaw of the client. It really shouldn't leak low level okhttp specific details.

Can you explain what you mean here? The api referenced above is abstracted from OkHttp. It is based upon io.fabric8.kubernetes.client.http.Interceptor

@apiwoni

apiwoni commented Sep 29, 2022

@shawkins First of all, I should not have to create a Kubernetes client just to get an http client builder to replace the interceptor, then get an http client factory, and finally create the Kubernetes client from the builder using that factory. This is a pretty messy builder-pattern implementation from an API point of view.

Now as far as what doesn't work.

kubernetesClient.getHttpClient.newBuilder().addOrReplaceInterceptor().build() builds an instance of HttpClient with the interceptor replaced, but when I call HttpClient.getFactory so I can build a Kubernetes client with that factory, only the default interceptors are included. Stepping through a debugging session, it looks like the factory from HttpClient.getFactory does not create the HttpClient with the interceptor I replaced, but with the interceptors from this chain:

io.fabric8.kubernetes.client.okhttp.OkHttpClientFactory#createHttpClient -> io.fabric8.kubernetes.client.utils.HttpClientUtils#applyCommonConfiguration ->
io.fabric8.kubernetes.client.utils.HttpClientUtils#createApplicableInterceptors

In short, the interceptor added via the http client builder ends up in the built HttpClient, but it does not carry over to the factory returned from getFactory on that built HttpClient.

Why not pass an HttpClientBuilder or HttpClientFactoryBuilder to KubernetesClientBuilder?

@shawkins
Contributor

First of all, I should not have to create a Kubernetes client just to get an http client builder to replace the interceptor, then get an http client factory, and finally create the Kubernetes client from the builder using that factory. This is a pretty messy builder-pattern implementation from an API point of view.

That's obviously not intended. There are really two issues here. The first is that there's currently no exposed way of getting the default HttpClient factory; a method for that can easily be extracted from HttpClientUtils.createHttpClient. The second is that the factory logic assumes customization via sub-classing (see the XXXClientFactory.additionalConfig methods). There wasn't a mechanism envisioned for modifying across any factory, such as modifying the interceptors in a common way.

Why not pass HttpClientBuilder or HttpClientFactoryBuilder to KubernetesClientBuilder?

Since we added HttpClient.getFactory, HttpClient.Builder is viable as a possible argument to KubernetesClientBuilder. To fully decompose, it's also possible to change HttpClient.Factory.createHttpClient(Config config) to HttpClient.Factory.processConfig(Config config, HttpClient.Builder builder).

That looks approximately like:

HttpClient.Factory factory = new OkHttpClientFactory(); // e.g. the default factory
HttpClient.Builder builder = factory.newBuilder();
// make modifications that could be overridden by processing the config
factory.processConfig(config, builder);
// make modifications that override the config
builder.addOrReplaceInterceptor(...);
// then supply the builder to the KubernetesClientBuilder

However, it's also possible to just associate a Consumer<HttpClient.Builder> with the KubernetesClientBuilder:

new KubernetesClientBuilder().withConfig(config).withAdditionalHttpClientConfiguration(b -> b.addOrReplaceInterceptor... ).build();

That eliminates a few of the steps.
