Postgres broken pipe regression #2312

Closed
jfyne opened this issue Jan 19, 2021 · 11 comments

@jfyne

jfyne commented Jan 19, 2021

Describe the bug

I think the bug described in #1599, which was fixed by moving to https://github.com/jackc/pgx, is back. After upgrading to v1.9 we started seeing a lot of broken pipe errors. Could this be due to the move to https://github.com/gobuffalo/pop?

Reproducing the bug

Steps to reproduce the behavior:

  1. Wait some period of time for connections to go stale
  2. Make a request to /oauth2/auth/requests/login with the login challenge
  3. Request fails with response: The error is unrecognizable StatusCode:0, underlying error is write: broken pipe

Server logs

level=error msg=An error occurred while handling a request audience=application error=map[message:couldn't start a new transaction: could not create new transaction: write failed: write tcp x.x.x.x:35838->x.x.x.x:5432: write: broken pipe] http_request=map[headers:map[accept:application/json accept-encoding:gzip user-agent:Go-http-client/1.1 x-forwarded-proto:http x-request-id:3e43ac9a-24a2-490b-92ff-20629939a8dc] host:prod-hydra:4445 method:GET path:/oauth2/auth/requests/login query:Value is sensitive and has been redacted. To see the value set config key "log.leak_sensitive_values = true" or environment variable "LOG_LEAK_SENSITIVE_VALUES=true". remote:127.0.0.1:54926 scheme:http] http_response=map[status_code:500] service_name=ORY Hydra service_version=v1.9.0

Server configuration

  - name: hydra-config
    literals:
      - URLS_SELF_ISSUER=https://xxx/
      - URLS_SELF_PUBLIC=https://xxx/
      - URLS_LOGIN=https://xxxlogin/
      - URLS_CONSENT=https://xxx/login/consent
      - URLS_LOGOUT=https://xxx/login/logout
      - URLS_ERROR=http://xxx/login/error
      - SECRETS_SYSTEM=xxx
      - SECRETS_COOKIE=xxx
      - SERVE_PUBLIC_CORS_ENABLED=true
      - SERVE_PUBLIC_CORS_ALLOWED_ORIGINS=https://*.xxx,https://*.xxx
      - SERVE_TLS_ALLOW_TERMINATION_FROM=127.0.0.1/32
      - HYDRA_ADMIN_URL="http://localhost:4445"
      - LOG_LEVEL=warn
      - DSN=postgres://xxx:xxx@xxx/xxx?sslmode=disable

Expected behavior

In v1.8 we did not see any broken pipe errors. Before that, prior to the move to jackc/pgx, we did experience them.

Environment

  • Version: v1.9.0, git sha hash 7120b4f
  • Environment: oryd/hydra:v1.9.0, on GKE
@jfyne
Author

jfyne commented Jan 19, 2021

Is it my DSN?

@jfyne
Author

jfyne commented Jan 19, 2021

I have tried adding &max_conn_lifetime=20s to my DSN to see if that alleviates the issue.

@zepatrik
Member

I checked: since moving to pop we are still using pgx, but a newer version. I'm not sure whether that makes the difference, though.

@jfyne
Author

jfyne commented Jan 20, 2021

Just to follow up here, adding &max_conn_lifetime=20s fixed our issues.

It is interesting that this started happening again even though pop uses pgx. Anyway, I'll leave the fix here and close the issue.
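
For readers applying the same workaround, here is a minimal Go sketch of adding the option to an existing DSN. The helper name `withMaxConnLifetime` and the example host are hypothetical and not part of Hydra. The sketch also assumes the value is a Go duration string, as the `20s` above suggests.

```go
package main

import (
	"fmt"
	"net/url"
	"time"
)

// withMaxConnLifetime is a hypothetical helper (not part of Hydra) that
// returns dsn with the max_conn_lifetime query option set to lifetime.
func withMaxConnLifetime(dsn, lifetime string) (string, error) {
	// Assumption: the option takes a Go duration string such as "20s",
	// so values without a unit (for example "20") are rejected here.
	if _, err := time.ParseDuration(lifetime); err != nil {
		return "", fmt.Errorf("invalid lifetime %q: %w", lifetime, err)
	}
	u, err := url.Parse(dsn)
	if err != nil {
		return "", fmt.Errorf("invalid DSN: %w", err)
	}
	q := u.Query()
	q.Set("max_conn_lifetime", lifetime) // overwrites any existing value
	u.RawQuery = q.Encode()              // Encode sorts the options by key
	return u.String(), nil
}

func main() {
	out, err := withMaxConnLifetime("postgres://user:secret@db.example/hydra?sslmode=disable", "20s")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Appending `&max_conn_lifetime=20s` to the DSN by hand, as done above, has the same effect. The helper only guards against a malformed value or a duplicated option.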

@jfyne jfyne closed this as completed Jan 20, 2021
@aeneasr
Member

aeneasr commented Jan 21, 2021

@vinckr can we document this in ory/docs?

@vinckr
Member

vinckr commented Jan 21, 2021

> @vinckr can we document this in ory/docs?

Sure! You mean in the Hydra docs, right?

@aeneasr
Member

aeneasr commented Jan 21, 2021

No, somewhere here: https://www.ory.sh/docs/ecosystem/deployment

Maybe under best practices?

@haslersn

haslersn commented Feb 1, 2021

I think this should be reopened until the issue is fixed by default. #1599 (comment) argued against adding max_conn_lifetime by default. If that's still the prevailing opinion, then the issue needs to be fixed in another way.

@aeneasr
Member

aeneasr commented Feb 1, 2021

The issue is fixed by default. If your network is flaky, or PostgreSQL is configured to close connections (see the linked comment), then you need to adjust your connection string. Setting an arbitrary default value here will probably not solve network issues, because you need to choose the correct values for your environment.

We assume a stable connection by default. If your connection is not stable, you need to adjust your settings to improve connection reliability, but in a way that reflects your network topology. We cannot guess that for you!

@haslersn

haslersn commented Feb 1, 2021

Today I had the same problem with broken pipes. I run Hydra v1.9 with PostgreSQL 12 (deployed using the Zalando postgres-operator), both on the same bare-metal Kubernetes cluster. Our network is nothing special: 3 nodes, connected on L2, no overlay network.

@jonathan-neufeld-asurion

What is the rationale behind the connection timeout of 20s? Is it arbitrary?
