Harbor-core failing to push images #15401

Closed
chrism417 opened this issue Aug 9, 2021 · 11 comments

chrism417 commented Aug 9, 2021

Hello,
After a few days, apps can no longer pull images from Harbor version 2.2.1. The apps see the following in the logs:

Failed to pull image rpc error: code = Unknown desc = failed to pull and unpack image failed to resolve reference failed to authorize: failed to fetch anonymous token: unexpected status: 500 Internal Server Error

In harbor-core, we see the following:

2021-08-09T13:32:30Z [ERROR] [/lib/orm/orm.go:72]: begin transaction failed: write tcp 100.96.28.147:59496->10.144.60.29:5432: write: connection reset by peer
2021-08-09T13:32:30Z [ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: deal with /service/token request in transaction failed: write tcp 100.96.28.147:59496-\u003e10.144.60.29:5432: write: connection reset by peer"}]}
2021-08-09T13:33:01Z [ERROR] [/lib/orm/orm.go:72]: begin transaction failed: write tcp 100.96.28.147:59496->10.144.60.29:5432: write: connection reset by peer
2021-08-09T13:33:01Z [ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: deal with /service/token request in transaction failed: write tcp 100.96.28.147:59496-\u003e10.144.60.29:5432: write: connection reset by peer"}]}
2021-08-09T13:33:23Z [ERROR] [/lib/orm/orm.go:72]: begin transaction failed: write tcp 100.96.28.147:59496->10.144.60.29:5432: write: connection reset by peer
2021-08-09T13:33:23Z [ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: deal with /service/token request in transaction failed: write tcp 100.96.28.147:59496-\u003e10.144.60.29:5432: write: connection reset by peer"}]}
2021-08-09T13:33:51Z [ERROR] [/lib/orm/orm.go:72]: begin transaction failed: write tcp 100.96.28.147:59496->10.144.60.29:5432: write: connection reset by peer
2021-08-09T13:33:51Z [ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: deal with /service/token request in transaction failed: write tcp 100.96.28.147:59496-\u003e10.144.60.29:5432: write: connection reset by peer"}]}
2021-08-09T13:34:03Z [ERROR] [/lib/orm/orm.go:72]: begin transaction failed: write tcp 100.96.28.147:59496->10.144.60.29:5432: write: connection reset by peer
2021-08-09T13:34:03Z [ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: deal with /service/token request in transaction failed: write tcp 100.96.28.147:59496-\u003e10.144.60.29:5432: write: connection reset by peer"}]}
2021-08-09T13:34:40Z [ERROR] [/lib/orm/orm.go:72]: begin transaction failed: write tcp 100.96.28.147:59496->10.144.60.29:5432: write: connection reset by peer
2021-08-09T13:34:40Z [ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: deal with /service/token request in transaction failed: write tcp 100.96.28.147:59496-\u003e10.144.60.29:5432: write: connection reset by peer"}]}

The only resolution is to restart harbor-core, but this causes outages in production when pods are scaled in.
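For reference, the failing anonymous-token fetch can be reproduced directly against Harbor's token service; the host, service name, and scope below are placeholders, so substitute the values from your registry's WWW-Authenticate challenge:

# reproduce the token request the container runtime makes before a pull
curl -sSi "https://harbor.example.com/service/token?service=harbor-registry&scope=repository:library/nginx:pull"

A 500 here, paired with the orm.go errors above, points at the token endpoint's DB transaction as the failing hop rather than the registry itself.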

wy65701436 (Contributor) commented Aug 10, 2021

Can you share the configuration of PostgreSQL? @heww, it seems that the DB connection count exceeds the limit.

chrism417 (Author) replied:

> Can you share the configuration of PostgreSQL? @heww, it seems that the DB connection count exceeds the limit.

It's running on Aurora Postgres and has a connection limit of 2400. Looking at the logs over the last 7 days, DB connections never exceeded 100.
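For reference, this is easy to confirm from the DB side, assuming psql access to the instance (endpoint and user below are placeholders):

# compare the configured ceiling with actual usage
psql -h <aurora-endpoint> -U harbor -c "SHOW max_connections;"
psql -h <aurora-endpoint> -U harbor -c "SELECT state, count(*) FROM pg_stat_activity GROUP BY state;"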

chrism417 (Author) commented Aug 10, 2021

Hello,
We're seeing this issue again. I was able to catch more logs:

2021-08-10T17:58:35.122774305Z stderr F 2021/08/10 17:58:35 http: proxy error: context canceled
2021-08-10T17:58:35.122217858Z stderr F 2021/08/10 17:58:35 http: proxy error: context canceled
2021-08-10T17:58:35.117721763Z stderr F 2021/08/10 17:58:35 http: proxy error: context canceled
2021-08-10T17:58:35.115851969Z stderr F 2021/08/10 17:58:35 http: proxy error: context canceled
2021-08-10T17:58:35.106725147Z stderr F 2021/08/10 17:58:35 http: proxy error: context canceled
2021-08-10T17:58:35.111729528Z stderr F 2021/08/10 17:58:35 http: proxy error: context canceled
2021-08-10T17:58:35.111555266Z stderr F 2021/08/10 17:58:35 http: proxy error: context canceled
2021-08-10T17:58:35.107397416Z stderr F 2021/08/10 17:58:35 http: proxy error: context canceled
2021-08-10T17:58:35.107234275Z stderr F 2021/08/10 17:58:35 http: proxy error: context canceled
2021-08-10T17:58:35.034803987Z stdout F 2021-08-10T17:58:35Z [WARNING] [/common/rbac/project/evaluator.go:80]: Failed to get info of project 34 for permission evaluator, error: write tcp 100.96.218.144:59212->10.152.69.94:5432: write: broken pipe
2021-08-10T17:58:33.386245223Z stdout F 2021-08-10T17:58:33Z [INFO] [/pkg/notifier/notifier.go:204]: Handle notification with Handler 'ArtifactWebhook' on topic 'PULL_ARTIFACT': ID-5140, Repository-phishlabs/crawler-core Tags-[] Digest-sha256:3c1335a2df557410bae40de16b30fe2b13800a52b9b30412caeadd0b407b4620 Operator- OccurAt-2021-08-10 17:58:33
2021-08-10T17:58:33.385676556Z stdout F 2021-08-10T17:58:33Z [INFO] [/pkg/notifier/notifier.go:204]: Handle notification with Handler 'AuditLog' on topic 'PULL_ARTIFACT': ID-5140, Repository-phishlabs/crawler-core Tags-[] Digest-sha256:3c1335a2df557410bae40de16b30fe2b13800a52b9b30412caeadd0b407b4620 Operator- OccurAt-2021-08-10 17:58:33
2021-08-10T17:58:33.382261534Z stdout F 2021-08-10T17:58:33Z [INFO] [/pkg/notifier/notifier.go:204]: Handle notification with Handler 'InternalArtifact' on topic 'PULL_ARTIFACT': ID-5140, Repository-phishlabs/crawler-core Tags-[] Digest-sha256:3c1335a2df557410bae40de16b30fe2b13800a52b9b30412caeadd0b407b4620 Operator- OccurAt-2021-08-10 17:58:33
2021-08-10T17:58:33.159188205Z stderr F 2021/08/10 17:58:33 http: proxy error: context canceled
2021-08-10T17:58:33.142703702Z stderr F 2021/08/10 17:58:33 http: proxy error: context canceled
2021-08-10T17:58:33.141811061Z stderr F 2021/08/10 17:58:33 http: proxy error: context canceled
2021-08-10T17:58:33.141450336Z stderr F 2021/08/10 17:58:33 http: proxy error: context canceled
2021-08-10T17:58:33.140840979Z stderr F 2021/08/10 17:58:33 http: proxy error: context canceled
2021-08-10T17:58:33.141955153Z stderr F 2021/08/10 17:58:33 http: proxy error: context canceled
2021-08-10T17:58:33.14167527Z stderr F 2021/08/10 17:58:33 http: proxy error: context canceled

heww (Contributor) commented Aug 11, 2021

PR #15211 may fix the write: broken pipe problem. Could you restart harbor-core and check whether the problem is gone? See #14856 for more info.
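One related knob while testing, in case it helps: the installer's harbor.yml bounds the pool of sockets core keeps open to Postgres (key names per the Harbor 2.x config template; verify against your version):

# inspect the DB pool limits in harbor.yml
grep -A4 '^database:' harbor.yml
# look for the keys:
#   max_idle_conns: ...
#   max_open_conns: ...

A large long-lived idle pool is exactly the kind of traffic a NAT or firewall tends to sever silently, which then surfaces as the broken pipe writes above.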

chrism417 (Author) replied:

Hi @heww and @wy65701436, we upgraded to 2.3.1 but we're still seeing the issue. Last night we got:

2021-08-27T01:04:38Z [ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: write tcp 100.101.1.126:60046-\u003e10.152.69.94:5432: write: connection timed out"}]}

2021-08-27T01:05:04Z [ERROR] [/lib/config/userconfig.go:43]: failed to load config, error write tcp 100.101.1.126:60114->10.152.69.94:5432: write: connection timed out

2021-08-27T01:05:04Z [WARNING] [/server/middleware/security/security.go:56][requestID="fd9bf241-ed72-4364-bdf3-d5c832abdcb6"]: failed to get auth mode: write tcp 100.101.1.126:60114->10.152.69.94:5432: write: connection timed out

chrism417 (Author) commented Aug 27, 2021

Hey all, we're actually seeing the following, which seems to be what starts the issue:

2021-08-27T01:06:34.278369907Z stderr F 	/usr/local/go/src/os/signal/signal.go:151 +0x45
2021-08-27T01:06:34.278358447Z stderr F created by os/signal.Notify.func1.1
2021-08-27T01:06:34.278242484Z stderr F 	/usr/local/go/src/os/signal/signal_unix.go:23 +0x45
2021-08-27T01:06:34.278234434Z stderr F os/signal.loop()
2021-08-27T01:06:34.278223434Z stderr F 	/usr/local/go/src/os/signal/signal.go:245 +0x1a9
2021-08-27T01:06:34.278066111Z stderr F os/signal.process(0x12289f0, 0x17d08f8)
2021-08-27T01:06:34.278058121Z stderr F goroutine 82 [running]:
2021-08-27T01:06:34.2780465Z stderr F
2021-08-27T01:06:34.277986379Z stderr F panic: send on closed channel
2021-08-27T01:06:34.270311425Z stderr F 	/build/cmd/vault-env/main.go:314 +0x122e
2021-08-27T01:06:34.270305465Z stderr F main.main()
2021-08-27T01:06:34.270297985Z stderr F 	/go/pkg/mod/emperror.dev/errors@v0.8.0/wrap_go1_13.go:26
2021-08-27T01:06:34.270289065Z stderr F emperror.dev/errors.As(...)
2021-08-27T01:06:34.270281285Z stderr F 	/usr/local/go/src/errors/wrap.go:87 +0x533
2021-08-27T01:06:34.270274104Z stderr F errors.As(0x12208c0, 0xc000648060, 0x1018800, 0xc0006480a0, 0xc000139c70)
2021-08-27T01:06:34.24059686Z stderr F 2021/08/27 01:06:34.240260 [INFO] (runner) received finish
2021-08-27T01:06:34.240558219Z stderr F 2021-08-27T01:06:34.240Z [INFO]  template.server: template server stopped
2021-08-27T01:06:34.145045272Z stdout F 100.101.1.223 - - [27/Aug/2021:01:06:34 +0000] "GET / HTTP/1.1" 200 0 "" "kube-probe/1.20"
2021-08-27T01:06:34.270265724Z stderr F goroutine 1 [running]:
2021-08-27T01:06:34.240052039Z stderr F 2021-08-27T01:06:34.239Z [INFO]  auth.handler: auth handler stopped
2021-08-27T01:06:34.270254314Z stderr F
2021-08-27T01:06:34.240030358Z stderr F 2021-08-27T01:06:34.239Z [INFO]  auth.handler: shutdown triggered, stopping lifetime watcher
2021-08-27T01:06:34.270193033Z stderr F panic: errors: *target must be interface or implement error
2021-08-27T01:06:34.262265083Z stderr F time="2021-08-27T01:06:34Z" level=info msg="received signal: terminated" app=vault-env
2021-08-27T01:06:34.240022958Z stderr F 2021/08/27 01:06:34.239921 [INFO] (runner) stopping
2021-08-27T01:06:34.240013078Z stderr F 2021-08-27T01:06:34.239Z [INFO]  sinks finished, exiting
2021-08-27T01:06:34.239990057Z stderr F 2021-08-27T01:06:34.239Z [INFO]  sink.server: sink server stopped
2021-08-27T01:06:34.239936796Z stdout F ==> Vault agent shutdown triggered


heww (Contributor) commented Sep 8, 2021

> Hi @heww and @wy65701436, we upgraded to 2.3.1 but we're still seeing the issue. Last night we got:
>
> 2021-08-27T01:04:38Z [ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: write tcp 100.101.1.126:60046-\u003e10.152.69.94:5432: write: connection timed out"}]}
> 2021-08-27T01:05:04Z [ERROR] [/lib/config/userconfig.go:43]: failed to load config, error write tcp 100.101.1.126:60114->10.152.69.94:5432: write: connection timed out
> 2021-08-27T01:05:04Z [WARNING] [/server/middleware/security/security.go:56][requestID="fd9bf241-ed72-4364-bdf3-d5c832abdcb6"]: failed to get auth mode: write tcp 100.101.1.126:60114->10.152.69.94:5432: write: connection timed out

According to the logs from Harbor 2.3.1, it looks like there is a network issue between harbor-core and the DB. Please check your network.
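A quick way to check this from inside the harbor-core pod, assuming the image (or a debug container alongside it) has the postgres client tools and net-tools; the DB IP is the one from your logs:

# is the DB reachable right now?
pg_isready -h 10.152.69.94 -p 5432
# list sockets toward the DB; connections piling up in non-ESTABLISHED
# states suggest something between core and Postgres is dropping idle flows
netstat -natl | grep 5432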

wy65701436 (Contributor) replied:

> Hey all, we're actually seeing the following, which seems to be what starts the issue:
>
> [vault-env panic log quoted in full above]
What log is this? Is it from the infrastructure that Harbor is running on?

Jeremy-boo commented Mar 17, 2022

This problem is likely caused by settings on the PostgreSQL side; it depends on whether your pg is a single instance or is fronted by other components such as pgpool. You can refer to this issue: #14943.

You can also exec into the pg container, or run the netstat -natl command on the host, to view the current pg connection queue, for example as shown below.
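A state histogram makes a backed-up queue obvious at a glance (netstat from net-tools; column 6 of its output is the TCP state):

# count connections toward Postgres by TCP state
netstat -natl | grep ':5432' | awk '{print $6}' | sort | uniq -c | sort -rn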


github-actions bot commented Jul 5, 2022

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions bot added the Stale label on Jul 5, 2022

github-actions bot commented Aug 5, 2022

This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.

github-actions bot closed this as completed on Aug 5, 2022