Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Authorization failed - ReadTimeout #1202

Closed
AndreaGiardini opened this issue Dec 13, 2022 · 5 comments
Closed

Authorization failed - ReadTimeout #1202

AndreaGiardini opened this issue Dec 13, 2022 · 5 comments

Comments

@AndreaGiardini
Copy link

Environment details

  • Python version: 3.10
  • google-auth version: 2.14.1

Steps to reproduce

We use the google-storage-python library to upload some files to GCS and, occasionally, we get errors from the google-auth library. Sometimes the run would work, sometimes a single upload would fail, sometimes multiple uploads would fail.

I've spent some time looking into the code and, while it looks like retries have been implemented in this PR ( #1113 ), I am not sure if the behavior also applies here ( https://github.com/googleapis/google-auth-library-python/blob/main/google/auth/transport/requests.py#L414-L418 ).

Why not use the default Retry object defined in the core-api? https://github.com/googleapis/python-storage/blob/3e4ea84163a6160703d219bea99e46b1e3965722/google/cloud/storage/retry.py#L57

2022-12-13T13:26:11.8192323Z 2022-12-13 13:25:53,991 - distributed.worker - WARNING - Compute Failed
2022-12-13T13:26:11.8192894Z Key:       upload_directory_to_gcs-52fd83e3-65a1-4bb8-890a-09f0b7d57d1d
2022-12-13T13:26:11.8193294Z Function:  execute_task
2022-12-13T13:26:11.8198326Z args:      ((<function apply at 0x7fc44c6df520>, <function upload_directory_to_gcs at 0x7fc41a282a70>, ['/vsigs/overstory-customer-test/risk_score_tests/dev_dagster_risk_score/no_chm/gd0a49e1c-1dec-4993-9193-8982afc7adce/', '/tmp/tmpzwg8ey_m/risk_score_span_38424'], (<class 'dict'>, [['delete_dir_after_cp', True]])))
2022-12-13T13:26:11.8199004Z kwargs:    {}
2022-12-13T13:26:11.8199650Z Exception: 'ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host=\'storage.googleapis.com\', port=443): Read timed out. (read timeout=60)"))'
2022-12-13T13:26:11.8200071Z 
2022-12-13T13:26:11.8230753Z 2022-12-13 13:25:54 +0000 - dagster - ERROR - encroachment_risk_score_graph - ff97359e-2a45-43e5-b2ed-221a90a7e208 - 4534 - risk_score_op - STEP_FAILURE - Execution of step "risk_score_op" failed.
2022-12-13T13:26:11.8231250Z 
2022-12-13T13:26:11.8235148Z dagster._core.errors.DagsterExecutionStepExecutionError: Error occurred while executing op "risk_score_op"::
2022-12-13T13:26:11.8235581Z 
2022-12-13T13:26:11.8235813Z requests.exceptions.ReadTimeout: None: None
2022-12-13T13:26:11.8236024Z 
2022-12-13T13:26:11.8236128Z Stack Trace:
2022-12-13T13:26:11.8244030Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/dagster/_core/execution/plan/utils.py", line 47, in solid_execution_error_boundary
2022-12-13T13:26:11.8244686Z     yield
2022-12-13T13:26:11.8245317Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/dagster/_utils/__init__.py", line 421, in iterate_with_context
2022-12-13T13:26:11.8245831Z     next_output = next(iterator)
2022-12-13T13:26:11.8275759Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/dagster/_core/execution/plan/compute_generator.py", line 73, in _coerce_solid_compute_fn_to_iterator
2022-12-13T13:26:11.8277774Z     result = fn(context, **kwargs) if context_arg_provided else fn(**kwargs)
2022-12-13T13:26:11.8278419Z   File "/home/runner/work/risk_score/risk_score/risk_score/dagster/ops/span_level_encroachment.py", line 208, in risk_score_op
2022-12-13T13:26:11.8278917Z     return _run_dask_span_level_encroachment(
2022-12-13T13:26:11.8279482Z   File "/home/runner/work/risk_score/risk_score/risk_score/dask_runner.py", line 582, in _run_dask_span_level_encroachment
2022-12-13T13:26:11.8279947Z     run_dask_risk_score_by_span(
2022-12-13T13:26:11.8280461Z   File "/home/runner/work/risk_score/risk_score/risk_score/dask_runner.py", line 444, in run_dask_risk_score_by_span
2022-12-13T13:26:11.8280927Z     dask_client.gather(futures)
2022-12-13T13:26:11.8282513Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/distributed/client.py", line 2291, in gather
2022-12-13T13:26:11.8283044Z     return self.sync(
2022-12-13T13:26:11.8283619Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/distributed/utils.py", line 339, in sync
2022-12-13T13:26:11.8283986Z     return sync(
2022-12-13T13:26:11.8284549Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/distributed/utils.py", line 406, in sync
2022-12-13T13:26:11.8285009Z     raise exc.with_traceback(tb)
2022-12-13T13:26:11.8285586Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/distributed/utils.py", line 379, in f
2022-12-13T13:26:11.8286199Z     result = yield future
2022-12-13T13:26:11.8286824Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/tornado/gen.py", line 762, in run
2022-12-13T13:26:11.8287262Z     value = future.result()
2022-12-13T13:26:11.8289132Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/distributed/client.py", line 2154, in _gather
2022-12-13T13:26:11.8289705Z     raise exception.with_traceback(traceback)
2022-12-13T13:26:11.8290317Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/dask/utils.py", line 71, in apply
2022-12-13T13:26:11.8291635Z     return func(*args, **kwargs)
2022-12-13T13:26:11.8292200Z   File "/home/runner/work/risk_score/risk_score/risk_score/risk_score_runner.py", line 50, in upload_directory_to_gcs
2022-12-13T13:26:11.8292784Z     upload_to_gcs(
2022-12-13T13:26:11.8293380Z   File "/home/runner/work/risk_score/risk_score/risk_score/risk_score_runner.py", line 102, in upload_to_gcs
2022-12-13T13:26:11.8294024Z     blob.upload_from_filename(local_path)
2022-12-13T13:26:11.8294850Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/google/cloud/storage/blob.py", line 2683, in upload_from_filename
2022-12-13T13:26:11.8295409Z     self.upload_from_file(
2022-12-13T13:26:11.8296187Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/google/cloud/storage/blob.py", line 2539, in upload_from_file
2022-12-13T13:26:11.8296801Z     created_json = self._do_upload(
2022-12-13T13:26:11.8297494Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/google/cloud/storage/blob.py", line 2354, in _do_upload
2022-12-13T13:26:11.8298136Z     response = self._do_multipart_upload(
2022-12-13T13:26:11.8298886Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/google/cloud/storage/blob.py", line 1889, in _do_multipart_upload
2022-12-13T13:26:11.8299493Z     response = upload.transmit(
2022-12-13T13:26:11.8300240Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/google/resumable_media/requests/upload.py", line 153, in transmit
2022-12-13T13:26:11.8300884Z     return _request_helpers.wait_and_retry(
2022-12-13T13:26:11.8301691Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/google/resumable_media/requests/_request_helpers.py", line 171, in wait_and_retry
2022-12-13T13:26:11.8312341Z     raise error
2022-12-13T13:26:11.8313401Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/google/resumable_media/requests/_request_helpers.py", line 148, in wait_and_retry
2022-12-13T13:26:11.8366773Z     response = func()
2022-12-13T13:26:11.8367844Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/google/resumable_media/requests/upload.py", line 145, in retriable_request
2022-12-13T13:26:11.8368516Z     result = transport.request(
2022-12-13T13:26:11.8369254Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/google/auth/transport/requests.py", line 549, in request
2022-12-13T13:26:11.8369946Z     response = super(AuthorizedSession, self).request(
2022-12-13T13:26:11.8370703Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
2022-12-13T13:26:11.8372332Z     resp = self.send(prep, **send_kwargs)
2022-12-13T13:26:11.8373186Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
2022-12-13T13:26:11.8374620Z     r = adapter.send(request, **kwargs)
2022-12-13T13:26:11.8376155Z   File "/usr/share/miniconda/envs/test/lib/python3.10/site-packages/requests/adapters.py", line 578, in send
2022-12-13T13:26:11.8470574Z     raise ReadTimeout(e, request=request)

Any suggestion is welcome

@clundin25
Copy link
Contributor

Thank you for opening this issue @AndreaGiardini.

#1113 Adjusts the retry behavior for the OAUTH2.0 logic. This stack trace doesn't seem to be related to token refresh, but a timeout when communicating with GCS.

Based on this data, the problem does not seem related to google-auth-library-python, but I am happy to be corrected.

A possible solution could be to increase the read time out https://github.com/googleapis/google-resumable-media-python/blob/main/google/resumable_media/requests/upload.py#L48?

At a glance wait_and_retry from the stacktrace doesn't seem to be retrying on a ReadTimeout exception. Disclaimer, I am unfamiliar with this code and I did not spend much time looking.

I'll try to take a deeper look later but I hope this helps move along the investigation.

This issue may be more appropriate at https://github.com/googleapis/python-storage but I will wait to hear from you.

@clundin25
Copy link
Contributor

Feel free to re-open this @AndreaGiardini ! Closing this as it has been 3 weeks with no response.

@AndreaGiardini
Copy link
Author

Hello @clundin25

Sorry for the late answer... Christmas holidays :)

I think the wait_and_retry function correctly retries the call here:

https://github.com/googleapis/google-resumable-media-python/blob/3dce90da83b9462b53450b632a41974c182edee7/google/resumable_media/requests/_request_helpers.py#L149-L150

ReadTimeout is a subclass of Timeout, so it should be retried:

https://github.com/googleapis/google-resumable-media-python/blob/3dce90da83b9462b53450b632a41974c182edee7/google/resumable_media/requests/_request_helpers.py#L37-L43

I've also modified the tests to use ReadTimeout rather than Timeout, and they still all pass.

What do you think?

@clundin25 clundin25 reopened this Jan 4, 2023
@clundin25
Copy link
Contributor

@AndreaGiardini Hope you had a great holiday!

Is it that the request is running out of retries in this scenario? Can you provide some more data as to why you believe this is an issue with the auth library?

Thank you!

@clundin25
Copy link
Contributor

Hi @AndreaGiardini,

Feel free to re-open this issue if you are still facing this problem.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants