docker build hangs/crashes when useradd with large UID #5419

Open
mcieslik-mctp opened this issue Apr 26, 2014 · 35 comments
Labels
area/builder kind/enhancement Enhancements are not bugs or new features but can improve usability or performance.

Comments

@mcieslik-mctp

When I try to add a user during "docker build .", the process hangs for approximately 2-3 minutes and then fails with a "no space left on device" error:

$ docker build .
Uploading context  5.12 kB
Uploading context 
Step 0 : FROM ubuntu:14.04
 ---> 99ec81b80c55
Step 1 : RUN useradd -u 99900000 -g users mcieslik
 ---> Running in 3ba3c92673fd
2014/04/26 14:58:55 write /var/lib/docker/devicemapper/mnt/.../rootfs/var/log/lastlog: no space left on device

Dockerfile

FROM ubuntu:14.04 
RUN useradd -u 99900000 -g users mcieslik

If I change the above to

FROM ubuntu:14.04 
RUN useradd -u 1001 -g users mcieslik

or run

useradd -u 99900000 -g users mcieslik

inside a container started with

docker run -i -t ubuntu:14.04 /bin/bash

everything works fine.

@pnasrat
Contributor

pnasrat commented Apr 26, 2014

We're probably mishandling sparse files, which /var/log/lastlog is. It also looks like you are using the devicemapper storage backend.

Can you update this issue with the output of docker info?

I'd also be interested to know why you need such a large UID.

@mcieslik-mctp
Author

Thanks for your prompt answer. These user ids are given to each user (university wide) by our IT overlords. Setting a proper UID is needed to write to NAS mounts.

Containers: 58
Images: 30
Storage Driver: devicemapper
 Pool Name: docker-254:3-12587682-pool
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 14230.2 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 11.4 Mb
 Metadata Space Total: 2048.0 Mb
Execution Driver: native-0.1
Kernel Version: 3.14.1-1-ARCH
WARNING: No swap limit support

@pnasrat
Contributor

pnasrat commented Apr 26, 2014

As a workaround, can you try passing -l (--no-log-init) to useradd in the Dockerfile?

RUN useradd -l -u 99900000 -g users mcieslik

@mcieslik-mctp
Author

Thanks! This worked.

@unclejack
Contributor

The underlying issue is that a large sparse file is created (approximately 32 GB), but it's not exactly a Docker bug. Docker could make an attempt to handle large sparse files better, but that's a subject for the #docker-dev mailing list.

I'll close this issue now. Please feel free to comment.
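To see where a figure like "approximately 32 GB" can come from, the sketch below computes the apparent size of /var/log/lastlog and /var/log/faillog once useradd writes a record for a user at offset UID × record size. The record sizes (292 bytes for lastlog, 32 bytes for faillog) are assumptions matching common x86_64 Linux builds of glibc and shadow-utils; other platforms may differ, and the helper names are hypothetical.

```python
# ASSUMPTION: record sizes typical of x86_64 Linux (glibc struct lastlog,
# shadow-utils struct faillog); other architectures or libcs may differ.
LASTLOG_RECORD_BYTES = 292  # 4-byte ll_time + 32-byte ll_line + 256-byte ll_host
FAILLOG_RECORD_BYTES = 32   # 2 + 2 + 12 bytes, then two 8-byte time fields


def apparent_size(uid: int, record_bytes: int) -> int:
    """Size of a log file once a record for `uid` is written at offset uid * record_bytes."""
    return (uid + 1) * record_bytes


def report(uid: int) -> str:
    """Hypothetical helper: summarize the apparent log sizes for one UID, in GB (10^9 bytes)."""
    lastlog = apparent_size(uid, LASTLOG_RECORD_BYTES)
    faillog = apparent_size(uid, FAILLOG_RECORD_BYTES)
    return (f"uid={uid}: lastlog ~{lastlog / 1e9:.1f} GB, "
            f"faillog ~{faillog / 1e9:.1f} GB, total ~{(lastlog + faillog) / 1e9:.1f} GB")


if __name__ == "__main__":
    print(report(99900000))  # the UID from this issue
```

Under these assumptions, UID 99900000 gives about 29.2 GB of lastlog plus 3.2 GB of faillog, roughly 32.4 GB in total, while UID 1001 needs only about 293 KB of lastlog. useradd's -l / --no-log-init skips adding the user to both files, which is why the workaround avoids the problem.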

zzak pushed a commit to zzak/mruby_hello_world_cli that referenced this issue Jun 28, 2015
hone added a commit to hone/mruby_hello_world_cli that referenced this issue Jun 28, 2015
Use --no-log-init when creating a user to avoid moby/moby#5419
@rhvgoyal
Contributor

So is this an issue with "docker commit": it can't handle sparse files and bloats them to their full size during commit? If so, I think this is something that should be fixed in Docker.

I created a 1G sparse file in a container (with the overlay backend) and then ran docker commit, and the topmost layer's size was 1G. So Docker did inflate the file to 1G, which seems very inefficient.
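The experiment above can be reproduced in miniature outside Docker. This Python sketch mimics the seek-past-the-end-then-write pattern that produces a sparse log file (a simplified assumption about how useradd initializes lastlog, not its actual code) and compares the apparent size reported by st_size with the space actually allocated on disk. The file name, the record contents, and the UID of 100000 are illustrative.

```python
import os
import tempfile

# Hypothetical stand-in for one 292-byte lastlog record; contents don't matter here.
RECORD = b"x" * 292


def write_record_at(path: str, uid: int) -> os.stat_result:
    """Seek to uid * len(RECORD) and write one record, leaving a hole before it."""
    with open(path, "wb") as f:
        f.seek(uid * len(RECORD))  # seeking past EOF allocates no data blocks
        f.write(RECORD)
    return os.stat(path)


with tempfile.TemporaryDirectory() as d:
    st = write_record_at(os.path.join(d, "lastlog"), uid=100_000)
    apparent = st.st_size           # what ls -l and a naive copier see
    allocated = st.st_blocks * 512  # what du sees (st_blocks is in 512-byte units on Linux)
    print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

On filesystems that support holes, the allocated size is a few kilobytes while the apparent size is about 29 MB; any tool that copies the file byte by byte writes out the full apparent size.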

@rhvgoyal
Contributor

@unclejack Has this issue been discussed anywhere else since then? I tested this with the latest Docker and it has not been fixed yet.

@trntv

trntv commented Oct 26, 2015

usermod causes the same problem, and it doesn't have a --no-log-init option.
Storage Driver: aufs

@runcom
Member

runcom commented Oct 28, 2015

The same thing happens on overlay. I'm going to reopen this to track the bug and see whether Docker can handle this scenario better.

@runcom runcom reopened this Oct 28, 2015
@vbatts
Contributor

vbatts commented Oct 28, 2015

Go's archive/tar only supports extracting archives that contain sparse files, not creating them.
From my searches, only GNU tar supports creating archives with sparse files, and many implementations barely support extracting the archives GNU tar creates.
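Docker image layers are tar archives, so a tar writer without sparse support explains the inflation described in this thread. Python's tarfile module has the same limitation (its documentation describes GNU sparse support as read-only), so it can serve as a rough stand-in. This is an analogy under that assumption, not Docker's own code; the 16 MiB size and the file names are illustrative.

```python
import os
import tarfile
import tempfile

SIZE = 16 * 1024 * 1024  # illustrative 16 MiB apparent size


def make_sparse(path: str, size: int) -> None:
    """Create a file that is one big hole: apparent size `size`, almost no data blocks."""
    with open(path, "wb") as f:
        f.truncate(size)  # extending via truncate leaves a hole on most filesystems


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "lastlog")
    make_sparse(src, SIZE)
    archive = os.path.join(d, "layer.tar")
    # tarfile cannot write sparse members, so the hole is read back as zeros
    # and stored in the archive as ordinary data.
    with tarfile.open(archive, "w") as tar:
        tar.add(src, arcname="var/log/lastlog")
    archive_bytes = os.path.getsize(archive)
    print(f"apparent size: {SIZE} bytes, tar archive: {archive_bytes} bytes")
```

The archive ends up slightly larger than the file's apparent size, even though the file itself occupies almost no disk space.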

@wontonst

wontonst commented Mar 1, 2016

This issue still occurs on docker 1.10.2

Step 7 : RUN echo "start" && echo $(useradd -m -u $uid -g $gid rdsdb) && echo "done"
 ---> Running in d1d07c9f2316
start

done

Hangs after printing done.

Docker info:

Containers: 6
 Running: 0
 Paused: 0
 Stopped: 6
Images: 12
Server Version: 1.10.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 86
 Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 3.13.0-79-generic
Operating System: Ubuntu precise (12.04.5 LTS)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.58 GiB
Name: xxx
ID: xxx
WARNING: No swap limit support

@sdwolfz

sdwolfz commented Mar 8, 2016

I can also confirm this is still occurring with 1.10.2

I am trying to create the following image:

FROM node:5.7.1

ARG HOST_USER_UID=1000
ARG HOST_USER_GID=1000

RUN DEBIAN_FRONTEND=noninteractive                            && \
                                                                 \
    echo 'Creating notroot user and group from host'          && \
    addgroup --gid $HOST_USER_GID notroot                     && \
    adduser --uid $HOST_USER_UID --gid $HOST_USER_GID notroot && \
                                                                 \
    echo 'Installing testing tools'                           && \
    npm install -g mocha phantomjs testem

USER notroot

WORKDIR /work

EXPOSE 9876

CMD testem --host 0.0.0.0 --port 9876

And I am building with the following command:

docker build -t testem --build-arg HOST_USER_UID=`id -u` --build-arg HOST_USER_GID=`id -g` .

The build freezes after displaying all output from RUN and writes to /var/lib/docker/aufs/diff/a66b81c2371267f6e749ecc667ed9c39dfabdb6f8d767c88d80fe07525ac6ade/var/log/lastlog until the disk is full.

docker info output:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 9
Server Version: 1.10.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 13
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 3.19.0-51-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.679 GiB
Name: cgi
ID: E4J7:MZX3:WSEP:CEVQ:HGMD:GNQT:IW5L:NJYX:GTUV:6HRV:LB3L:36EK
WARNING: No swap limit support

@sdwolfz

sdwolfz commented Mar 8, 2016

I can also confirm that @pnasrat's workaround fixes this problem. I am now using:

groupadd -g $HOST_USER_GID notroot                           && \
useradd -l -u $HOST_USER_UID -g $HOST_USER_GID notroot       && \

instead of:

addgroup --gid $HOST_USER_GID notroot                     && \
adduser --uid $HOST_USER_UID --gid $HOST_USER_GID notroot

And the build succeeded.

My $HOST_USER_UID and $HOST_USER_GID contain 10-digit IDs.

@AkihiroSuda AkihiroSuda added the kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. label Nov 29, 2016
@AkihiroSuda
Member

linking golang/go#13548 to this issue

@gbraad

gbraad commented Nov 29, 2016

Reproducible as an automated build on Docker Hub: https://hub.docker.com/r/gbraad/issue-dockerfile/builds/bsdmngknf8dhfreh8sgfmyu/ :-s

While useradd -l can work around the issue, using something like Ansible inside the container to configure the environment, or any other use of su - or sudo, will cause the same issue.

@justincormack
Contributor

Presumably disk quotas will fix the issue with the host running out of space and just leave a runtime error?

pkarashchenko pushed a commit to pkarashchenko/sonic-buildimage that referenced this issue Feb 1, 2017
…user in docker

Note: related to moby/moby#5419

Signed-off-by: Petro Karashchenko <petro.karashchenko@caviumnetworks.com>
luca-digrazia pushed a commit to luca-digrazia/DatasetCommitsDiffSearch that referenced this issue Sep 4, 2022
    This adds the `--no-log-init` flag (`-l`) to the internal `useradd` command used to initialize the docker sandbox environment.

    Without this flag, AD/LDAP/SSSD users that have large UID/GID values will be added to `lastlog`/`faillog`, but since docker does not support sparse files, this will cause the docker daemon to attempt to create a `/var/lib/docker/overlay2` entry that may consume all available disk space.

    moby/moby#5419 (comment)

    For one example, my SSSD-assigned uid is `1553201121`, which makes the _sparse_ size of my `lastlog` file 423GB.  If this uid is used by bazel's docker-sandbox, the resulting container attempts to create the full 423GB file, which I confirmed the hard way.

    Closes #13506.

    PiperOrigin-RevId: 379966973
yutongzhang-microsoft added a commit to sonic-net/sonic-mgmt that referenced this issue Oct 26, 2022
Description of PR
In setup-container.sh, when building the docker container, the "User configuration" step hangs for several minutes, probably because of the bug [moby/moby#5419] in Docker, and the whole script takes about 20 minutes. Using useradd can work around this bug, but usermod can't. In this PR, we use useradd instead of usermod to work around the bug and save build time. The whole script now takes around 6 minutes.

How did you do it?
In step "User configuration", when getent passwd {{ USER_NAME }} returns True, first delete the user and use useradd to add the user.

How did you verify/test it?
Set up the container with the new script and ran the test cases.

Signed-off-by: Yutong Zhang <yutongzhang@microsoft.com>
yejianquan pushed a commit to sonic-net/sonic-mgmt that referenced this issue Oct 27, 2022
yejianquan pushed a commit to sonic-net/sonic-mgmt that referenced this issue Oct 27, 2022
allen-xf pushed a commit to allen-xf/sonic-mgmt that referenced this issue Oct 28, 2022
threema-danilo added a commit to threema-ch/webrtc-build-docker that referenced this issue Feb 6, 2023
Willymontaz pushed a commit to criteo-forks/spark that referenced this issue May 16, 2023
  * Spark jars should have a dedicated artifact id
  * Use python 3 instead of python 2. This requires to make /usr/bin/python point to python3
  * Add -l option to docker useradd command to prevent a bug using long uid moby/moby#5419
  * Remove \\\ in the maven settings.xml file in the container otherwise the variables criteo.repo.username|password are not correctly used and lead to 401 issues when uploading files to nexus
  * Fix hive dependency. Groupid has to be org.spark-project.hive and artifactid version 1.2.1.spark2
  * Use set -e in the build script to fail immediately when an error occurs in any command
  * Spark scala is a profile and cannot be activated with the -D java option, only with the -P maven option
  * Use --no-transfer-progress in maven commands to make output readable
  * All mvn commands were using \\ instead of \, leading to bad command interpretation by bash
  * Fix tar command used to build the 'jar-only' tgz
  * Fix mvn jar:jar deploy:deploy parameters which are not exactly the same as the ones of mvn deploy:deploy-file
  * Upgrade pip in the venv otherwise pyarrow cannot install
  * Fix altDeploymentRepository declaration
  * Set pypandoc version to 1.5 as some functions have been removed in newer versions
  * Set back version of build-helper-maven-plugin and change maven-shade-plugin (error during git apply patch)
  * Remove leftover 2.4.3-criteo versions in pom.xml
  * Fix compilation error in sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala
Willymontaz pushed a commit to criteo-forks/spark that referenced this issue May 16, 2023
Willymontaz pushed a commit to criteo-forks/spark that referenced this issue May 16, 2023
kshramt pushed a commit to kshramt/evidence_based_scheduling that referenced this issue Aug 3, 2023
CodeWithEmad added a commit to CodeWithEmad/tutor that referenced this issue Oct 10, 2023
There is an old bug in Docker where a large user ID causes the build to hang or crash. --no-log-init was added to the useradd command so that /var/log/faillog does not take up much space.
The upstream issue: moby/moby#5419
regisb pushed a commit to overhangio/tutor that referenced this issue Oct 11, 2023
On macOS, building the "openedx-dev" Docker image resulted in an image that required more than 600 GB of disk space. This was due to the `adduser` command which was called with a user ID of 2x10⁹ (on macOS only). This resulted in a very large /var/log/faillog file, hence the image size.

Related upstream discussion: moby/moby#5419
Close openedx-unsupported/wg-developer-experience#178
moonesque pushed a commit to edSPIRIT/tutor that referenced this issue Nov 20, 2023