Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

30s+ Hang When Using Manual Token Authentication #2224

Closed
Cooksauce opened this issue May 16, 2024 · 7 comments
Closed

30s+ Hang When Using Manual Token Authentication #2224

Cooksauce opened this issue May 16, 2024 · 7 comments
Assignees
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@Cooksauce
Copy link

Cooksauce commented May 16, 2024

Bug Description

When using token auth with IAM (e.g. --auto-iam-authn --token --login-token), the proxy takes an unreasonably long time to do the IAM postgres authentication. This can be seen in the logs attached when using --run-connection-test.

This long delay also happens each time a postgres client tries to access the db thru the proxy - including subsequent accesses.
E.g. psql ... takes 30s+ to get into the postgres> repl

The same behavior is not observed when authenticating via Application Default Credentials.

Example code (or command)

No response

Stacktrace

No response

Steps to reproduce?

  1. Login via gcloud:gcloud auth login
  2. Run proxy

Environment

  1. OS type and version: MacOS 14.4.1
  2. Cloud SQL Proxy version (./cloud-sql-proxy --version): 2.10.1
  3. Proxy invocation command (for example, ./cloud-sql-proxy --port 5432 INSTANCE_CONNECTION_NAME):
   ./cloud-sql-proxy-v2.10.1 --auto-iam-authn --run-connection-test \
      --token "$(gcloud auth print-access-token)" \
      --login-token "$(gcloud sql generate-login-token)" \
      --debug-logs \
      my-project:us-central1:my-instance

Additional Details

Debug Logs:

Using ADC:

./cloud-sql-proxy-v2.10.1 --auto-iam-authn --run-connection-test --debug-logs \
my-project:us-central1:my-instance
2024/05/16 16:26:30 Authorizing with Application Default Credentials
2024/05/16 16:26:30 [my-project:us-central1:my-instance] Connection info added to cache
2024/05/16 16:26:30 [my-project:us-central1:my-instance] Connection info refresh operation started
2024/05/16 16:26:30 [my-project:us-central1:my-instance] Connection info refresh operation complete
2024/05/16 16:26:30 [my-project:us-central1:my-instance] Current certificate expiration = 2024-05-16T22:26:30Z
2024/05/16 16:26:30 [my-project:us-central1:my-instance] Connection info refresh operation scheduled at 2024-05-16T22:22:30Z (now + 56m0s)
2024/05/16 16:26:30 [my-project:us-central1:my-instance] Listening on 127.0.0.1:5432
2024/05/16 16:26:30 The proxy has started successfully and is ready for new connections!
2024/05/16 16:26:30 Connection test started
2024/05/16 16:26:30 [my-project:us-central1:my-instance] Connection info refresh operation started
2024/05/16 16:26:31 [my-project:us-central1:my-instance] Connection info refresh operation complete
2024/05/16 16:26:31 [my-project:us-central1:my-instance] Current certificate expiration = 2024-05-16T22:26:29Z
2024/05/16 16:26:31 [my-project:us-central1:my-instance] Connection info refresh operation scheduled at 2024-05-16T22:22:29Z (now + 56m0s)
2024/05/16 16:26:31 [my-project:us-central1:my-instance] Now = 2024-05-16T21:26:31Z, Current cert expiration = 2024-05-16T22:26:29Z
2024/05/16 16:26:31 [my-project:us-central1:my-instance] Cert is valid = true
2024/05/16 16:26:31 [my-project:us-central1:my-instance] Dialing xx.xx.xx.xx:3307
2024/05/16 16:26:31 Connection test passed
^C2024/05/16 16:26:33 SIGINT signal received. Shutting down...

Notice the connection test passes immediately


Using Manual Access Tokens

./cloud-sql-proxy-v2.10.1 --auto-iam-authn --run-connection-test \
   --token "$(gcloud auth print-access-token)" \
   --login-token "$(gcloud sql generate-login-token)" \
   --debug-logs \
   my-project:us-central1:my-instance
2024/05/16 16:27:01 Authorizing with OAuth2 token
2024/05/16 16:27:01 [my-project:us-central1:my-instance] Connection info added to cache
2024/05/16 16:27:01 [my-project:us-central1:my-instance] Connection info refresh operation started
2024/05/16 16:27:02 [my-project:us-central1:my-instance] Connection info refresh operation complete
2024/05/16 16:27:02 [my-project:us-central1:my-instance] Current certificate expiration = 2024-05-16T22:27:02Z
2024/05/16 16:27:02 [my-project:us-central1:my-instance] Connection info refresh operation scheduled at 2024-05-16T22:23:02Z (now + 56m0s)
2024/05/16 16:27:02 [my-project:us-central1:my-instance] Listening on 127.0.0.1:5432
2024/05/16 16:27:02 The proxy has started successfully and is ready for new connections!
2024/05/16 16:27:02 Connection test started
2024/05/16 16:27:02 [my-project:us-central1:my-instance] Connection info refresh operation started
2024/05/16 16:27:02 [my-project:us-central1:my-instance] Connection info refresh operation complete
2024/05/16 16:27:02 [my-project:us-central1:my-instance] Current certificate expiration = 0001-01-01T00:00:00Z
2024/05/16 16:27:02 [my-project:us-central1:my-instance] Connection info refresh operation scheduled at 2024-05-16T21:27:02Z (now + 0s)
2024/05/16 16:27:02 [my-project:us-central1:my-instance] Connection info refresh operation started
2024/05/16 16:27:02 [my-project:us-central1:my-instance] Now = 2024-05-16T21:27:02Z, Current cert expiration = 0001-01-01T00:00:00Z
2024/05/16 16:27:02 [my-project:us-central1:my-instance] Cert is valid = false
2024/05/16 16:27:02 [my-project:us-central1:my-instance] Refreshing certificate now
2024/05/16 16:27:32 [my-project:us-central1:my-instance] Connection info refresh operation complete
2024/05/16 16:27:32 [my-project:us-central1:my-instance] Current certificate expiration = 0001-01-01T00:00:00Z
2024/05/16 16:27:32 [my-project:us-central1:my-instance] Connection info refresh operation scheduled at 2024-05-16T21:27:32Z (now + 0s)
2024/05/16 16:27:32 [my-project:us-central1:my-instance] Connection info refresh operation started
2024/05/16 16:27:32 [my-project:us-central1:my-instance] Dialing xx.xx.xx.xx:3307
2024/05/16 16:27:32 Connection test passed
^C2024/05/16 16:27:42 SIGINT signal received. Shutting down...
2024/05/16 16:27:42 [my-project:us-central1:my-instance] Connection info refresh operation failed, err = Dial error: context was canceled or expired before refresh completed (connection name = "my-project:us-central1:my-instance")
2024/05/16 16:27:42 [my-project:us-central1:my-instance] Connection info refresh operation scheduled immediately
2024/05/16 16:27:42 [my-project:us-central1:my-instance] Instance is closed, stopping refresh operations

Notice the connection test takes 30s to pass

@Cooksauce Cooksauce added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label May 16, 2024
@enocom
Copy link
Member

enocom commented May 16, 2024

This looks like a user-facing issue due to GoogleCloudPlatform/cloud-sql-go-connector#771.

@enocom enocom added the priority: p2 Moderately-important priority. Fix may not be included in next release. label May 16, 2024
@Cooksauce
Copy link
Author

If I understand that linked issue correctly, it's a caching issue?

Does that still explain the 30s hang happening every time a postgres client connects to the proxy? Even subsequent connects of the same client as the same user to the same running cloud-sql-proxy instance hangs for another 30s.

@enocom
Copy link
Member

enocom commented May 17, 2024

Yes, it has to do with how the internal rate limiter works -- or in this case, not working.

@enocom enocom assigned enocom and unassigned hessjcg May 17, 2024
@enocom
Copy link
Member

enocom commented May 17, 2024

We're seeing this in a separate place too now. I'll get this fixed.

@enocom
Copy link
Member

enocom commented May 20, 2024

This is definitely caused by GoogleCloudPlatform/cloud-sql-go-connector#771. Closing it as a duplicate. I have a fix that I'll put up today.

@enocom enocom closed this as completed May 20, 2024
@enocom
Copy link
Member

enocom commented May 20, 2024

Since we'll need to upgrade cloudsqlconn here, let's use #2224 as the tracking issue.

@Cooksauce
Copy link
Author

Confirmed v2.11.3 fixes this issue

ref: #2212 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

3 participants