Postgres broken pipe regression #2312

jfyne · 2021-01-19T09:58:43Z

Describe the bug

I think the bug described in #1599, which was fixed by moving to https://github.com/jackc/pgx, is back. After upgrading to v1.9 we have started seeing a lot of broken pipe errors. Could this be due to the move to https://github.com/gobuffalo/pop?

Reproducing the bug

Steps to reproduce the behavior:

1. Wait some period of time for connections to go stale
2. Make a request to /oauth2/auth/requests/login with the login challenge
3. Request fails with response: The error is unrecognizable StatusCode:0, underlying error is write: broken pipe

Server logs

level=error msg=An error occurred while handling a request audience=application error=map[message:couldn't start a new transaction: could not create new transaction: write failed: write tcp x.x.x.x:35838->x.x.x.x:5432: write: broken pipe] http_request=map[headers:map[accept:application/json accept-encoding:gzip user-agent:Go-http-client/1.1 x-forwarded-proto:http x-request-id:3e43ac9a-24a2-490b-92ff-20629939a8dc] host:prod-hydra:4445 method:GET path:/oauth2/auth/requests/login query:Value is sensitive and has been redacted. To see the value set config key "log.leak_sensitive_values = true" or environment variable "LOG_LEAK_SENSITIVE_VALUES=true". remote:127.0.0.1:54926 scheme:http] http_response=map[status_code:500] service_name=ORY Hydra service_version=v1.9.0

Server configuration

  - name: hydra-config
    literals:
      - URLS_SELF_ISSUER=https://xxx/
      - URLS_SELF_PUBLIC=https://xxx/
      - URLS_LOGIN=https://xxxlogin/
      - URLS_CONSENT=https://xxx/login/consent
      - URLS_LOGOUT=https://xxx/login/logout
      - URLS_ERROR=http://xxx/login/error
      - SECRETS_SYSTEM=xxx
      - SECRETS_COOKIE=xxx
      - SERVE_PUBLIC_CORS_ENABLED=true
      - SERVE_PUBLIC_CORS_ALLOWED_ORIGINS=https://*.xxx,https://*.xxx
      - SERVE_TLS_ALLOW_TERMINATION_FROM=127.0.0.1/32
      - HYDRA_ADMIN_URL="http://localhost:4445"
      - LOG_LEVEL=warn
      - DSN=postgres://xxx:xxx@xxx/xxx?sslmode=disable

Expected behavior

In v1.8 we did not have any broken pipes. Before that, prior to the move to jackc/pgx, we experienced the broken pipe errors.

Environment

Version: v1.9.0, git sha hash 7120b4f
Environment: oryd/hydra:v1.9.0, on GKE


jfyne · 2021-01-19T10:17:07Z

Is it my DSN?

jfyne · 2021-01-19T10:54:02Z

I have tried adding &max_conn_lifetime=20s to my DSN to see if that alleviates the issue.

zepatrik · 2021-01-19T13:12:32Z

I checked: since switching to pop we are still using pgx, but a newer version. I'm not sure whether that makes the difference, though.

jfyne · 2021-01-20T10:23:28Z

Just to follow up here, adding &max_conn_lifetime=20s fixed our issues.

It is interesting that this started happening again even though pop uses pgx. Anyway, I'll leave the fix here and I'm going to close.

aeneasr · 2021-01-21T09:02:18Z

@vinckr can we document this in ory/docs?

vinckr · 2021-01-21T10:17:54Z

> @vinckr can we document this in ory/docs?

Sure! You mean in the Hydra docs right?

aeneasr · 2021-01-21T10:33:09Z

No, somewhere here: https://www.ory.sh/docs/ecosystem/deployment

Maybe under best practices?

haslersn · 2021-02-01T20:58:31Z

I think this should be reopened until the issue is fixed by default. #1599 (comment) argued not to add max_conn_lifetime by default. If that's still the prevailing opinion, then the issue needs to be fixed in another way.

aeneasr · 2021-02-01T22:43:17Z

The issue is fixed by default. If your network is flaky, or PostgreSQL is configured to close connections (see linked comment), then you need to adjust your connection string. Setting an arbitrary default value here will probably not solve network issues, as you need to choose the correct values for your environment.

We assume a stable connection by default. If your connection is not stable, you need to adjust your settings to improve connection reliability, but you need to do so in a way that reflects your network topology. We cannot guess that for you!

haslersn · 2021-02-01T23:43:31Z

Today I had the same problem with broken pipes. I run Hydra v1.9 with PostgreSQL 12 (deployed using Zalando postgres-operator), both running on the same bare-metal Kubernetes cluster. Our network is nothing special: 3 nodes, connected on L2, no overlay network.

jonathan-neufeld-asurion · 2021-03-03T23:10:20Z

What is the rationale behind the connection timeout of 20s? Is it arbitrary?

jfyne closed this as completed Jan 20, 2021
