docker build hangs indefinitely exporting layers. #213

Closed
mwfriedm opened this issue Mar 28, 2022 · 32 comments

@mwfriedm

mwfriedm commented Mar 28, 2022

Describe the bug
This is a duplicate of microsoft/hcsshim#696, but I was asked to open a new issue here.

To Reproduce
Dockerfile:

FROM mcr.microsoft.com/windows/servercore:ltsc2019
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
RUN Invoke-WebRequest -Uri "https://github.com/msys2/msys2-installer/releases/download/2021-07-25/msys2-base-x86_64-20210725.sfx.exe" -OutFile msys2.exe; \
  .\msys2.exe -y -oC:\ ; \
  Remove-Item msys2.exe; \
  C:\msys64\usr\bin\bash.exe -lc ' '; \
  C:\msys64\usr\bin\bash.exe -lc 'pacman --noconfirm -Syuu'; \
  C:\msys64\usr\bin\bash.exe -lc 'pacman --noconfirm -Syuu gcc git make zip'; \
  C:\msys64\usr\bin\bash.exe -lc 'pacman -Scc --noconfirm'; \
  Get-Date

docker build

... output ...
Database directory: /var/lib/pacman/
:: Do you want to remove unused repositories? [Y/n]
removing unused sync repositories...
Monday, March 28, 2022 10:02:58 AM

At this point, the RUN command has completed successfully. Docker engine takes a few minutes to write out the file system layers, and the child process (i.e. docker-windows-write-layer) exits.

dockerd.exe is now hung, and a stale hcsNNNNNNNNNN folder is left in C:\ProgramData\Docker\tmp (or wherever you have the dockerd data directory configured).

Expected behavior
The docker build command completes successfully.

Configuration:

  • Edition: Windows 10 Enterprise Version 1909 (OS Build 18363.2158)
  • Base Image being used: Windows Server Core ltsc 2019
  • Container engine: docker
  • Container engine version: 20.10.11

Additional context

docker.exe info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Build with BuildKit (Docker Inc., v0.6.1-docker)
  compose: Docker Compose (Docker Inc., v2.0.0-rc.2)
  scan: Docker Scan (Docker Inc., v0.8.0)

Server:
 Server Version: 20.10.11
 Storage Driver: windowsfilter
  Windows:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics internal l2bridge l2tunnel nat null overlay private transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: inactive
 Default Isolation: hyperv
 Kernel Version: 10.0 18363 (18362.1.amd64fre.19h1_release.190318-1202)
 Operating System: Windows 10 Enterprise Version 1909 (OS Build 18363.2158)
 OSType: windows
 Architecture: x86_64
@mwfriedm mwfriedm added the bug Something isn't working label Mar 28, 2022
@ghost ghost added the triage New and needs attention label Mar 28, 2022
@cwilhit
Contributor

cwilhit commented Apr 4, 2022

Opening MSFT internal 38827308 to track.

@cwilhit cwilhit added Fundamentals and removed triage New and needs attention labels Apr 4, 2022
@mwfriedm
Author

No updates, just commenting so the bot doesn't close this. I can confirm that this is still an issue.

@janschiefer

This also happens on a Linux VM with the newest Docker and BuildKit enabled. "Classic" docker-compose with BuildKit disabled works fine.


Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
compose: Docker Compose (Docker Inc., v2.5.0)
scan: Docker Scan (Docker Inc., v0.17.0)

Server:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 98
Server Version: 20.10.16
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 212e8b6fa2f44b9c21b2798135fc6fb7c53efc16
runc version: v1.1.1-0-g52de29d
init version: de40ad0
Security Options:
seccomp
Profile: default
userns
Kernel Version: 4.15.0
Operating System: Ubuntu 18.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 6
Total Memory: 12GiB
Name: h2830616.stratoserver.net
ID: EQQT:UVDC:2JCY:3QCL:53YM:ESV2:RKS3:XGE7:UWEY:CAW2:W6OY:TDNL
Docker Root Dir: /var/lib/docker/310000.310000
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled


@ghost

ghost commented Jun 18, 2022

This issue has been open for 30 days with no updates.
@mwfriedm, please provide an update or close this issue.

@mwfriedm
Author

No updates, just bumping so that the bot doesn't close this. I can confirm that this is still an issue.

@jheaff1

jheaff1 commented Jul 9, 2022

This is also an issue for me, again when installing MSYS2 in a Docker container.

@kiashok

kiashok commented Jul 21, 2022

Has anyone seen this issue happening on nanoserver base images? Or is it happening only on servercore?

@ghost

ghost commented Aug 21, 2022

This issue has been open for 30 days with no updates.
@mwfriedm, please provide an update or close this issue.

@mwfriedm
Author

No updates, just commenting so the bot doesn't close this. I can confirm that this is still an issue.

@schaveyt

schaveyt commented Sep 9, 2022

I am also a victim of this bug

@fady-azmy-msft
Contributor

No updates to share, we are still looking into this.

@mwfriedm
Author

Confirming fresh repro with slightly newer version numbers.

$> docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.9.1)
  compose: Docker Compose (Docker Inc., v2.10.2)
  extension: Manages Docker extensions (Docker Inc., v0.2.9)
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.0)
  scan: Docker Scan (Docker Inc., v0.19.0)

Server:
 Server Version: 20.10.17
 Storage Driver: windowsfilter
  Windows:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics internal l2bridge l2tunnel nat null overlay private transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: inactive
 Default Isolation: hyperv
 Kernel Version: 10.0 19042 (19041.1.amd64fre.vb_release.191206-1406)
 Operating System: Windows 10 Enterprise Version 2009 (OS Build 19042.2006)

@ghost

ghost commented Oct 24, 2022

This issue has been open for 30 days with no updates.
@mwfriedm, please provide an update or close this issue.

@mwfriedm
Author

No updates, just commenting so the bot doesn't close this. I can confirm that this is still an issue.

@fady-azmy-msft
Contributor

@kevpar has a PR out that should fix this issue: microsoft/go-winio#261

@ghost

ghost commented Dec 2, 2022

This issue has been open for 30 days with no updates.
@kevpar, @mwfriedm, please provide an update or close this issue.

@mwfriedm
Author

No updates, just bumping so that the bot doesn't close this. I can confirm that this is still an issue.

@JieGenius

JieGenius commented Mar 3, 2023

I ran into a similar problem and solved it. I hope these details help.
The Dockerfile that led to the indefinite hang is as follows:

FROM pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel
RUN apt install -y openssh-server
RUN sed -i "s/UsePAM yes/UsePAM no/" /etc/ssh/sshd_config

CMD ["/usr/sbin/sshd", "-D"]

This Dockerfile leads to a hang because /usr/sbin/sshd requires a password to be set for the root user first.

There are two ways to solve this problem. The first is to delete the CMD ["/usr/sbin/sshd", "-D"] line, and the second is to set a password for the root user, like this:

FROM pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel
RUN apt install -y openssh-server
RUN sed -i "s/UsePAM yes/UsePAM no/" /etc/ssh/sshd_config

RUN echo "root:123456" | chpasswd
CMD ["/usr/sbin/sshd", "-D"]

I previously understood that the CMD in a Dockerfile only runs when a container starts, but now I suspect that it may also be executed during image export.

I hope this information is helpful to you.

@JieGenius

JieGenius commented Mar 3, 2023

moby/moby#5419
This page has a similar problem with "Hangs indefinitely exporting layers."

@TBBle

TBBle commented Mar 21, 2023

Those last two comments are probably unrelated, as this issue appears to be happening during cleanup of a temporary folder used when exporting a WCOW layer, and they are both Linux containers.

@TBBle

TBBle commented Mar 26, 2023

For anyone tracking this issue here only: based on research in microsoft/hcsshim#696, this appears to be an msys/cygwin-specific bug, and it is most commonly triggered when pacman updates itself, e.g., during msys2 install.

A trivial workaround for

RUN choco install -y msys2

or similar RUN command that installs msys or explicitly runs pacman -Syuu is:

RUN choco install -y msys2 && del /f /s /q "C:\$Recycle.Bin"

This should not affect the resulting container, as C:\$Recycle.Bin is already excluded from layer export to avoid this same bug earlier in the process.
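
For Dockerfiles that use PowerShell as the shell, like the reproduction at the top of this issue, the same cleanup can probably be appended to the RUN step that runs pacman. The following is an untested sketch: invoking del through cmd /c is an assumption made here so that the flags match the cmd example above, and the single quotes keep PowerShell from expanding $Recycle as a variable.

```dockerfile
# Sketch only, not verified on a Windows host: the original reproduction
# Dockerfile with the recycle-bin cleanup added to the same RUN step.
# Assumption: cmd's del is invoked via "cmd /c" from PowerShell.
FROM mcr.microsoft.com/windows/servercore:ltsc2019
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
RUN Invoke-WebRequest -Uri "https://github.com/msys2/msys2-installer/releases/download/2021-07-25/msys2-base-x86_64-20210725.sfx.exe" -OutFile msys2.exe; \
  .\msys2.exe -y -oC:\ ; \
  Remove-Item msys2.exe; \
  C:\msys64\usr\bin\bash.exe -lc ' '; \
  C:\msys64\usr\bin\bash.exe -lc 'pacman --noconfirm -Syuu'; \
  C:\msys64\usr\bin\bash.exe -lc 'pacman --noconfirm -Syuu gcc git make zip'; \
  C:\msys64\usr\bin\bash.exe -lc 'pacman -Scc --noconfirm'; \
  cmd /c del /f /s /q 'C:\$Recycle.Bin'; \
  Get-Date
```

The cleanup presumably has to stay in the same RUN instruction as the pacman calls, since the hang is reported right after that step completes, while its layer is being written out.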

@mwfriedm
Author

A viable workaround! The nicest birthday present this 1 year old issue could have asked for :-)

I can confirm that the workaround works in my original production scenario.

@microsoft-github-policy-service
Contributor

This issue has been open for 30 days with no updates.
@kevpar, @mwfriedm, @Juarezhm, please provide an update or close this issue.

@TBBle

TBBle commented May 6, 2023

A proposed Go-level fix works for me in a build of dockerd.exe. Assuming this fix does land for Go 1.21, I don't know if it'll then migrate back to older Go versions, or if we will be waiting on Docker and containerd to migrate to Go 1.21.

Update: It landed in the Go master branch, so unless it's reverted, it'll be in Go 1.21. The initial feedback regarding backporting is that it's probably too intrusive, but perhaps changing the hang into a returned error would be doable instead.

@microsoft-github-policy-service
Contributor

This issue has been open for 30 days with no updates.
@kevpar, @mwfriedm, @Juarezhm, please provide an update or close this issue.

@microsoft-github-policy-service
Contributor

This issue has been open for 30 days with no updates.
@kevpar, @mwfriedm, @Juarezhm, @NAWhitehead, please provide an update or close this issue.

@TBBle

TBBle commented Aug 27, 2023

The fix has indeed landed in Go 1.21.0.

@fady-azmy-msft
Contributor

Thank you for sharing the update @TBBle! Closing this because the fix is now in Go 1.21.0.
