"failed to copy: io: read/write on closed pipe" on ctr images push while pushing large images #7972

Closed
akhilerm opened this issue Jan 17, 2023 · 20 comments · Fixed by #7985

@akhilerm
Member

Description

While pushing a large container image to a registry using

ctr images push ghcr.io/akhilerm/private-testing:ci container-registry.oracle.com/database/express:21.3.0-xe

I hit the following error:

failed to copy: io: read/write on closed pipe

Notes:

Steps to reproduce the issue

  1. sudo ctr content fetch --all-platforms container-registry.oracle.com/database/express:21.3.0-xe
  2. ctr images push ghcr.io/akhilerm/testing-gha:io-failure container-registry.oracle.com/database/express:21.3.0-xe
$ ctr images ls
REF                                                      TYPE                                                      DIGEST                                                                  SIZE      PLATFORMS                                                                       LABELS
container-registry.oracle.com/database/express:21.3.0-xe application/vnd.docker.distribution.manifest.v2+json      sha256:016d1a2becd9c9b9bfb683eebf3aa092527fe1354ace5b23691e75759f301bed 3.3 GiB   linux/amd64                                                                     -

Reproduced using this image, which is 3.3 GiB in size.

I will add more information to this issue if it can be reproduced more easily; currently, testing requires uploading the 3.3 GiB image, which can take a long time.

Describe the results you received and expected

Expected the image to be pushed to the registry successfully. Instead, the push failed with the error above.

What version of containerd are you using?

412ca49

Any other relevant information

$ runc --version
runc version 1.1.4
commit: v1.1.4-0-g5fd4c4d
spec: 1.0.2-dev
go: go1.18.9
libseccomp: 2.5.3

$ uname -a
Linux ams-hz-ubu-055 5.15.0-52-generic #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Show configuration if it is related to CRI plugin.

$ cat /etc/containerd/config.toml
#   Copyright 2018-2022 Docker Inc.

#   Licensed under the Apache License, Version 2.0 (the "License");
#   you may not use this file except in compliance with the License.
#   You may obtain a copy of the License at

#       http://www.apache.org/licenses/LICENSE-2.0

#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.

#disabled_plugins = ["cri"]

#root = "/var/lib/containerd"
#state = "/run/containerd"
#subreaper = true
#oom_score = 0

#[grpc]
#  address = "/run/containerd/containerd.sock"
#  uid = 0
#  gid = 0

#[debug]
#  address = "/run/containerd/debug.sock"
#  uid = 0
#  gid = 0
#  level = "info"
@crazy-max

crazy-max commented Jan 17, 2023

Similar to docker/build-push-action#761. Looks like there are some issues with GitHub Registry.

@shawaj

shawaj commented Jan 17, 2023

I'm having the same issue.

Error: buildx failed with: ERROR: failed to solve: failed to push ghcr.io/nebraltd/hm-diag:f3bc9e2: failed to copy: io: read/write on closed pipe

@gpanagiotidis

gpanagiotidis commented Jan 18, 2023

I am having the same issue pushing to eu.gcr.io.

I noticed it happens when there are 2 or more tags, that is, when pushing to 2 or more registries at the same time.

I am using buildx, and this happens in our GitLab runners, which run on Kubernetes nodes that use containerd.

@crazy-max

@akhilerm Could this be linked to #6995?

@akhilerm
Member Author

@crazy-max I guess so. But the error "failed to copy" is used in only two places ([1], [2]), one of which was changed by #6995. I still can't identify why it would occur randomly for large images.

@SaTu07

SaTu07 commented Jan 19, 2023

Had this issue a couple of times today. Reran 3 times and it finally succeeded.

@corinz

corinz commented Jan 20, 2023

Same here. Any solutions?

@ahasna

ahasna commented Jan 21, 2023

Having the same issue as well. Rerunning solves the issue eventually, but it's not ideal, and it breaks the parts of our CI that rely on an image being pushed.

@gpanagiotidis

> Same here. Any solutions?

A workaround that seems to work for me is to use an older BuildKit image version.

@beriberikix

I'm using Docker Setup Buildx and having this issue too. But I don't think it's changed buildkit versions since October. Could it be something else?

@crazy-max

crazy-max commented Jan 21, 2023

> I'm using Docker Setup Buildx and having this issue too. But I don't think it's changed buildkit versions since October. Could it be something else?

No. BuildKit 0.10.6 uses containerd v1.6.3, while the #6995 change first appears in containerd v1.6.9. BuildKit 0.11 uses containerd v1.6.14, which contains this change.
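
To check whether a given containerd 1.6.x version already carries the #6995 change, the comparison above can be sketched in shell. This is only a rough illustration: the helper names `version_ge` and `has_6995_change` are made up for this example, it assumes a `sort` that supports `-V` (GNU coreutils or BusyBox), and it only compares against v1.6.9. It does not know which later release contains the fix.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is greater than or equal to B.
# Hypothetical helper; relies on `sort -V` for version ordering.
version_ge() {
    a=${1#v}   # strip an optional leading "v"
    b=${2#v}
    # If B sorts first (or the two are equal), then A >= B.
    [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$b" ]
}

# has_6995_change VERSION: succeeds when a containerd 1.6.x version is at
# or past v1.6.9, where the #6995 change first appears per the comment above.
has_6995_change() {
    version_ge "$1" 1.6.9
}

# Example:
#   has_6995_change v1.6.14 && echo "contains the #6995 change"
```

With the versions quoted above, v1.6.3 (BuildKit 0.10.6) fails the check and v1.6.14 (BuildKit 0.11) passes it.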

@kreimben

same here

@sameeraksc

same here. any solution ?

@snowskeleton

same issue

@jedevc
Contributor

jedevc commented Jan 24, 2023

A fix is currently in progress: #7985

As far as I can tell there is no easy workaround, except to temporarily downgrade containerd (or buildkit, if you're following from that issue).
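
For buildx users, one way to apply the temporary downgrade is to create a builder pinned to an older BuildKit image. The following is a minimal sketch, not an official recommendation: the function name `create_pinned_builder` and the builder name in the example are hypothetical, and the image tag `moby/buildkit:v0.10.6` is chosen only because BuildKit 0.10.6 is reported earlier in this thread to use containerd v1.6.3, which predates the #6995 change.

```shell
#!/bin/sh
# create_pinned_builder NAME IMAGE
# Creates a docker-container buildx builder that runs the given BuildKit
# image and switches to it. NAME and IMAGE are chosen by the caller.
create_pinned_builder() {
    builder_name=$1
    buildkit_image=$2
    docker buildx create \
        --name "$builder_name" \
        --driver docker-container \
        --driver-opt "image=$buildkit_image" \
        --use
}

# Example (assumed tag, see the note above):
#   create_pinned_builder pinned-buildkit moby/buildkit:v0.10.6
```

Once a release containing the fix is available, the pinned builder can be removed with `docker buildx rm` and the default image used again.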

@decipher27

decipher27 commented Jan 24, 2023

We are facing the same error too:

#22 ERROR: failed to push ghcr.io/atlanhq/atlas-master:latest: failed to copy: io: read/write on closed pipe
------
 > exporting to image:
------
ERROR: failed to solve: failed to push ghcr.io/atlanhq/atlas-master:latest: failed to copy: io: read/write on closed pipe
Error: buildx failed with: ERROR: failed to solve: failed to push ghcr.io/atlanhq/atlas-master:latest: failed to copy: io: read/write on closed pipe

Tried retrying the job, but we still get the same error.

@twistedpair

Same issue for our repos.

However, it's intermittent. Rerunning will eventually get through.
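
Since several reports here say that rerunning eventually succeeds, a stopgap for CI is to wrap the push in a bounded retry loop. This is a sketch only: the `retry` function and the `RETRY_DELAY` variable are hypothetical names introduced for this example, and retrying masks the failure rather than fixing it.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have been made.
# RETRY_DELAY (seconds between attempts, default 10) is an assumed knob.
retry() {
    max_attempts=$1
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${RETRY_DELAY:-10}s" >&2
        sleep "${RETRY_DELAY:-10}"
        attempt=$((attempt + 1))
    done
}

# Example, using the references from the original report:
#   retry 5 ctr images push ghcr.io/akhilerm/testing-gha:io-failure \
#       container-registry.oracle.com/database/express:21.3.0-xe
```

Each failed attempt may re-upload large layers, so a small attempt limit keeps the cost of a persistent failure bounded.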

@joshparallel

@dmcgowan @jedevc Could you please clarify whether #7985 will fix the issue entirely, or if it will just report better errors (but errors will still occur)? Also very curious to know what the timeline is for this fix making it to GHCR, if anyone reading this can provide any info on that... Thanks!

@jedevc
Contributor

jedevc commented Jan 25, 2023

#7985 should resolve the issues entirely.

The fix is client-side; no registry-side changes on GHCR will be necessary.

scito added a commit to scito/extract_otp_secrets that referenced this issue Jan 25, 2023
failed to copy: io: read/write on closed pipe

- use DockerHub instead of dhcr.io
- ref: containerd/containerd#7972
@AkihiroSuda
Member

The fix is now picked into BuildKit v0.11.2 / Buildx v0.10.1 🎉
https://github.com/moby/buildkit/releases/tag/v0.11.2
https://github.com/docker/buildx/releases/tag/v0.10.1
