
Requests from outside the cluster may have stale tokens and fail with status code 401 #1946

Closed
AlexisZam opened this issue Nov 8, 2022 · 2 comments · Fixed by #1947
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

AlexisZam commented Nov 8, 2022

What happened

A request to Kubernetes from outside the cluster failed with status code 401, even though the configuration had been loaded and previous requests had been successful.

The traceback is

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/kubernetes/client/api/core_v1_api.py", line 14721, in list_namespace
    return self.list_namespace_with_http_info(**kwargs)  # noqa: E501
  File "/usr/lib/python3/dist-packages/kubernetes/client/api/core_v1_api.py", line 14842, in list_namespace_with_http_info
    collection_formats=collection_formats)
  File "/usr/lib/python3/dist-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/usr/lib/python3/dist-packages/kubernetes/client/api_client.py", line 184, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/lib/python3/dist-packages/kubernetes/client/api_client.py", line 377, in request
    headers=headers)
  File "/usr/lib/python3/dist-packages/kubernetes/client/rest.py", line 244, in GET
    query_params=query_params)
  File "/usr/lib/python3/dist-packages/kubernetes/client/rest.py", line 234, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Audit-Id': '7dd1383a-497d-4d3c-85cc-26a13dc9406f', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Tue, 08 Nov 2022 10:07:04 GMT', 'Content-Length': '129'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

What you expected to happen

Given the 401 Unauthorized response, we can safely assume that the problem lies with tokens, and more specifically with the way tokens are refreshed.

The client has had a token-related issue before, namely issue #741, which was closed and (should have been) fixed by PR kubernetes-client/python-base#250. Therefore no request should be made with a stale token and fail with 401 Unauthorized.

How to reproduce it

A minimal example is

import time

from kubernetes import config
from kubernetes.client import CoreV1Api

config.load_config()

api_1 = CoreV1Api()
api_2 = CoreV1Api()

_ = api_1.list_namespace()  # succeeds
_ = api_2.list_namespace()  # succeeds

time.sleep(15 * 60)

_ = api_1.list_namespace()  # succeeds
_ = api_2.list_namespace()  # fails

That is, we

  1. load the configuration,
  2. create two clients,
  3. make one request with each client,
  4. sleep for 15 minutes, and
  5. make one more request with each client.

The tokens we receive from EKS have a TTL of 14 minutes (see also https://github.com/aws/aws-cli/blob/2.8.9/awscli/customizations/eks/get_token.py#L62). Therefore we sleep for 15 minutes waiting for the token to expire.

Before the wait, both requests succeed; after the wait, the first request succeeds and the second one fails.

Note that what matters is which client makes the request first after the wait (not which client was created or made a request first before the wait). For example, if api_2 makes the request first after the wait, then it will succeed and api_1 will fail.

Also note that we receive 401 Unauthorized responses only when we make requests

  1. with multiple clients. If we create a single client, all the requests we make with it succeed. This suggests that different clients hold different tokens, and some of those tokens are stale.
  2. from outside the cluster. Requests from inside the cluster succeed. This means the problem lies in kube_config.

Anything else we need to know?

A closer look at the example shows that (under the hood) we

  1. a. instantiate a loader (KubeConfigLoader()),
    b. get a token and update the token (token) and expiration timestamp (expiry) of the loader, and
    c. update the token (api_key['authorization']) of the default configuration (Configuration._default),
  2. a. instantiate two clients (CoreV1Api()), and
    b. set copies of the default configuration as their configuration,
  3. make a request with each client,
  4. wait for the token to expire, and
  5. a. get a fresh token and update the token and expiration timestamp of the loader,
    b. update the token of the first client,
    c. make a request with the first client (with the updated token), and
    d. make a request with the second client (with the expired token).

Notice that we do not update the token of the second client.
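The steps above can be sketched as a toy model. The token values are hypothetical, and plain dicts stand in for the real KubeConfigLoader and Configuration classes; the point is only that each client works from an independent copy of the default configuration:

```python
import copy

# Toy model of steps 1-5; 'token-A'/'token-B' are hypothetical values,
# and the dicts stand in for KubeConfigLoader and Configuration.
loader = {'token': 'token-A'}                        # 1a-1b: loader gets a token
default_config = {'authorization': loader['token']}  # 1c: default config updated

config_1 = copy.deepcopy(default_config)  # 2a-2b: first client's config copy
config_2 = copy.deepcopy(default_config)  # 2a-2b: second client's config copy

# 3: both requests are made with 'token-A' and succeed.
# 4: 'token-A' expires.
loader['token'] = 'token-B'                  # 5a: loader refreshes its token
config_1['authorization'] = loader['token']  # 5b: only the first client is updated

print(config_1['authorization'])  # token-B -> request succeeds (5c)
print(config_2['authorization'])  # token-A -> request fails with 401 (5d)
```

Because the copies are independent, updating the loader (or one copy) never propagates to the other copy.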

When we make a request with a client, if the token of the loader is expired or expiring in 5 minutes or less, we

  1. get a fresh token and update the token and expiration timestamp of the loader, and
  2. update the token of the client.
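The "expired or expiring in 5 minutes or less" condition can be sketched as a simple predicate. This is a simplified version of the check; the real helper is _is_expired in kube_config.py, which subtracts a 5-minute skew-prevention delay from the expiry timestamp:

```python
import datetime

# Simplified sketch of the loader's expiry check; the real client's
# _is_expired in kube_config.py applies a 5-minute skew-prevention delay.
SKEW = datetime.timedelta(minutes=5)

def is_expired(expiry, now):
    """True if the token is expired or will expire within 5 minutes."""
    return expiry - SKEW <= now

now = datetime.datetime(2022, 11, 8, 10, 0, 0)
assert is_expired(now - datetime.timedelta(minutes=1), now)       # already expired
assert is_expired(now + datetime.timedelta(minutes=4), now)       # expiring soon
assert not is_expired(now + datetime.timedelta(minutes=14), now)  # fresh EKS token
```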

In the example, after the token expires,

  1. the first client makes a request which triggers the refresh hook. That is, the loader both refreshes its token and updates the token of the client.
  2. the second client makes a request which does not trigger the refresh hook. That is, the loader neither refreshes its token (as it should not) nor updates the token of the client (as it should).

To fix this issue, when a client makes a request, the loader should

  1. refresh its token, only if the token is expired or expiring, but
  2. update the token of the client every time.

The diff is

diff --git a/kubernetes/base/config/kube_config.py b/kubernetes/base/config/kube_config.py
index b95955448..f72890982 100644
--- a/kubernetes/base/config/kube_config.py
+++ b/kubernetes/base/config/kube_config.py
@@ -575,7 +575,7 @@ class KubeConfigLoader(object):
             def _refresh_api_key(client_configuration):
                 if ('expiry' in self.__dict__ and _is_expired(self.expiry)):
                     self._load_authentication()
-                    self._set_config(client_configuration)
+                self._set_config(client_configuration)
             client_configuration.refresh_api_key_hook = _refresh_api_key
         # copy these keys directly from self to configuration object
         keys = ['host', 'ssl_ca_cert', 'cert_file', 'key_file', 'verify_ssl']
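The effect of the one-line dedent can be simulated with a toy version of the hook (hypothetical names and token values; the real hook is KubeConfigLoader._refresh_api_key):

```python
def refresh_api_key(loader, client_config, fixed):
    # Simplified mirror of KubeConfigLoader._refresh_api_key.
    if loader['expired']:
        loader['token'] = 'fresh'  # stand-in for _load_authentication()
        loader['expired'] = False
        if not fixed:
            # Before the fix: the client is updated only when this
            # very call performed the refresh.
            client_config['token'] = loader['token']
    if fixed:
        # After the fix: the loader's token is copied to the client
        # on every request.
        client_config['token'] = loader['token']

for fixed in (False, True):
    loader = {'token': 'stale', 'expired': True}
    client_1 = {'token': 'stale'}
    client_2 = {'token': 'stale'}
    refresh_api_key(loader, client_1, fixed)  # triggers the refresh
    refresh_api_key(loader, client_2, fixed)  # loader already refreshed
    print(fixed, client_1['token'], client_2['token'])
# False fresh stale  -- the second client keeps the stale token
# True fresh fresh   -- both clients receive the fresh token
```

With the dedent, the second client's request no longer refreshes the loader (it does not need to), but it still picks up the loader's current token.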

Environment

  • Kubernetes version: v1.25.3
  • OS: Ubuntu 22.04.1 LTS
  • Python version: Python 3.10.6
  • Python client version: 25.3.0
@AlexisZam AlexisZam added the kind/bug Categorizes issue or PR as related to a bug. label Nov 8, 2022
AlexisZam added a commit to arrikto/kubernetes-client-python that referenced this issue Nov 8, 2022
Requests from outside the cluster may have stale tokens and fail with
status code `401`.

Closes kubernetes-client#1946

Signed-off-by: Alexis Zamanis <alexiszam@arrikto.com>
AlexisZam added a commit to arrikto/kubernetes-client-python that referenced this issue Nov 8, 2022
Requests from outside the cluster may have stale tokens and fail with
status code `401`.

Fixes kubernetes-client#1946

Signed-off-by: Alexis Zamanis <alexiszam@arrikto.com>
roycaihw commented Dec 5, 2022

cc @yliaog

roycaihw commented Dec 5, 2022

/assign @AlexisZam
