Archiving to S3 stalls on "use of closed network connection" #656
Comments
I've noticed the same S3 error in my
The root cause might be aws/aws-sdk-go#3406. In my case, it manifests for about 5% of our nightly backups. Here's the wal-g stack trace:
Regardless of the root cause, I strongly recommend setting a timeout for wal-g commands. It doesn't break wal-g or the database state, and it resolves any problems with stalling:
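A minimal sketch of such a timeout, assuming GNU coreutils `timeout` is available on the host. The helper name `run_with_timeout` and the 600-second limit in the usage note are illustrative assumptions, not values taken from this thread:

```shell
# Hypothetical helper: run any command under a hard time limit.
# Relies on GNU coreutils `timeout` (an assumption about the host).
run_with_timeout() {
    limit="$1"   # time limit, e.g. 600 (seconds) or 10m
    shift        # remaining arguments are the command to run
    # GNU timeout exits with status 124 when it had to kill the command;
    # otherwise it passes the command's own exit status through.
    timeout "$limit" "$@"
}
```

In a wrapper script used as the archive command, the call might look like `run_with_timeout 600 wal-g wal-push "$1"`. A non-zero exit status tells PostgreSQL that archiving failed, so it retries the same segment later instead of waiting forever on a hung upload.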
We've run into this a couple of times. Any signal on whether the aws-sdk-go version bump addresses this, and thoughts on when we might see a full release with that?
`wal-g wal-push` has a known bug with hanging after file upload to S3[1]; set a rather long timeout on the upload process, so that we don't simply stall forever when archiving WAL segments. [1] wal-g/wal-g#656
`wal-g wal-push` has a known bug with occasionally hanging after file upload to S3[1]; set a rather long timeout on the upload process, so that we don't simply stall forever when archiving WAL segments. [1] wal-g/wal-g#656
@alexmv yeah, I think it's time for the release. Anyway, the current behavior of S3 storage is much better now.
I can confirm our (I work with @jschaf) AWS S3 issues largely went away after we started using the pre-release which includes the aws-sdk-go version bump. Apologies for not reporting back earlier. @x4m I'd strongly recommend cutting a new release with this update - it dramatically reduced our overall error rate.
@bunsenmcdubbs and others let's compose release notes here
It looks like a release happened: https://github.com/wal-g/wal-g/releases/tag/v1.0 Does that mean this is fixed?
Yeah, thanks for the reminder! If anyone observes something similar, please reopen this issue or create a new one.
wal-g version 0.2.15
postgres version 12.3
Every few weeks, WAL archiving to S3 stalls after a failed archive attempt and does not resume on its own. Restarting the postgresql service resumes archiving. I'm unsure whether this is a wal-g issue or a postgres archiver issue. At the time, I was performing some large updates, so the rate of WAL file generation was high, with archiving occurring multiple times per second, as opposed to 1-2x/minute.
Using standard archive command:
wal-g wal-push $1
You'll notice in the logs that WAL archiving to S3 was working fine and then failed at 2020/05/21 17:22:46. Nothing further appeared in the logs until I restarted the service at 2020-05-22 07:29:52. At that point, the pg_wal directory held about 43GB of files ready to push to S3, which were pushed immediately after the restart. Thanks in advance!