Kaniko is stucking on copying root #960

Open
kvaps opened this issue Jan 8, 2020 · 10 comments · May be fixed by #2592
Labels
area/dockerfile-command For all bugs related to dockerfile file commands area/filesystems For all bugs related to kaniko container filesystems (mounting issues etc) categorized cmd/copy differs-from-docker issue/hang issue/rootfs kind/bug Something isn't working priority/p1 Basic need feature compatibility with docker build. we should be working on this next. work-around-available works-with-docker

Comments

@kvaps
Contributor

kvaps commented Jan 8, 2020

Actual behavior
Kaniko gets stuck forever when trying to save / (root) of the previous stage

INFO[0007] Taking snapshot of full filesystem...        
INFO[0007] No files were changed, appending empty layer to config. No layer added to image. 
INFO[0007] Saving file / for later use

Expected behavior
Kaniko will copy / to the specified directory.

To Reproduce

mkdir -p /tmp/kaniko-bug
cd /tmp/kaniko-bug/

cat > Dockerfile <<\EOT
FROM alpine:3.11 as rootfs
RUN echo 7777

FROM alpine:3.11
COPY --from=rootfs / /sysroot/
EOT

docker run -ti --rm -v $PWD:/workspace gcr.io/kaniko-project/executor:v0.15.0 --dockerfile=Dockerfile --no-push

Additional Information

  • Docker image

    provided above

  • Strace log

    strace: Process 53577 attached with 17 threads
    [pid 53624] futex(0xc0004699c8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53623] futex(0xc0004a52c8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53622] epoll_pwait(4,  
    [pid 53621] futex(0xc0004a4bc8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53620] futex(0xc0001164c8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53619] futex(0xc0004804c8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53618] futex(0xc0003b4148, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53617] futex(0xc0004a4148, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53616] futex(0xc000480148, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53615] futex(0x3119a80, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53614] futex(0xc0001364c8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53613] restart_syscall(<... resuming interrupted read ...> 
    [pid 53612] futex(0xc000116148, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53611] futex(0xc00006a848, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53610] restart_syscall(<... resuming interrupted read ...> 
    [pid 53609] restart_syscall(<... resuming interrupted read ...> 
    [pid 53577] futex(0x30fcc48, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53613] <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out)
    [pid 53613] futex(0x30fc190, FUTEX_WAKE_PRIVATE, 1) = 1
    [pid 53609] <... restart_syscall resumed>) = 0
    [pid 53613] futex(0xc0004a4bc8, FUTEX_WAKE_PRIVATE, 1 
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53621] <... futex resumed>)        = 0
    [pid 53613] <... futex resumed>)        = 1
    [pid 53621] nanosleep({tv_sec=0, tv_nsec=3000},  
    [pid 53613] futex(0x3100a60, FUTEX_WAIT_PRIVATE, 0, {tv_sec=9, tv_nsec=999426204} 
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53621] <... nanosleep resumed>NULL) = 0
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53621] futex(0xc0004a4bc8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53609] futex(0x30fc190, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} 
    [pid 53613] <... futex resumed>)        = -1 ETIMEDOUT (Connection timed out)
    [pid 53613] futex(0x30fc190, FUTEX_WAKE_PRIVATE, 1) = 1
    [pid 53609] <... futex resumed>)        = 0
    [pid 53613] futex(0xc0004a4bc8, FUTEX_WAKE_PRIVATE, 1 
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53621] <... futex resumed>)        = 0
    [pid 53613] <... futex resumed>)        = 1
    [pid 53621] nanosleep({tv_sec=0, tv_nsec=3000},  
    [pid 53613] futex(0x3100a60, FUTEX_WAIT_PRIVATE, 0, {tv_sec=9, tv_nsec=999530745} 
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53621] <... nanosleep resumed>NULL) = 0
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53621] futex(0xc0004a4bc8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53609] futex(0x30fc190, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} 
    [pid 53610] <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out)
    [pid 53610] futex(0x30fc190, FUTEX_WAKE_PRIVATE, 1) = 1
    [pid 53609] <... futex resumed>)        = 0
    [pid 53610] futex(0xc0004a4bc8, FUTEX_WAKE_PRIVATE, 1 
    [pid 53609] futex(0xc00006a848, FUTEX_WAKE_PRIVATE, 1 
    [pid 53610] <... futex resumed>)        = 1
    [pid 53621] <... futex resumed>)        = 0
    [pid 53611] <... futex resumed>)        = 0
    [pid 53609] <... futex resumed>)        = 1
    [pid 53621] futex(0xc0004a4bc8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53611] futex(0x30fc180, FUTEX_WAKE_PRIVATE, 1 
    [pid 53610] futex(0x30fc180, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=100000} 
    [pid 53611] <... futex resumed>)        = 0
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53611] futex(0xc00006a848, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53610] <... futex resumed>)        = -1 EAGAIN (Resource temporarily unavailable)
    [pid 53610] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53610] futex(0xc00006a848, FUTEX_WAKE_PRIVATE, 1 
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53611] <... futex resumed>)        = 0
    [pid 53610] <... futex resumed>)        = 1
    [pid 53611] futex(0xc0004a4bc8, FUTEX_WAKE_PRIVATE, 1 
    [pid 53621] <... futex resumed>)        = 0
    [pid 53611] <... futex resumed>)        = 1
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53621] futex(0xc0001364c8, FUTEX_WAKE_PRIVATE, 1 
    [pid 53611] futex(0xc0003b4148, FUTEX_WAKE_PRIVATE, 1 
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53621] <... futex resumed>)        = 1
    [pid 53618] <... futex resumed>)        = 0
    [pid 53614] <... futex resumed>)        = 0
    [pid 53611] <... futex resumed>)        = 1
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53614] futex(0x30fcc48, FUTEX_WAKE_PRIVATE, 1 
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53614] <... futex resumed>)        = 1
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53611] futex(0x31009e0, FUTEX_WAIT_PRIVATE, 0, {tv_sec=29, tv_nsec=999501958} 
    [pid 53577] <... futex resumed>)        = 0
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53577] futex(0xc000116148, FUTEX_WAKE_PRIVATE, 1 
    [pid 53612] <... futex resumed>)        = 0
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53577] <... futex resumed>)        = 1
    [pid 53612] futex(0xc0004a4148, FUTEX_WAKE_PRIVATE, 1 
    [pid 53617] <... futex resumed>)        = 0
    [pid 53612] <... futex resumed>)        = 1
    [pid 53617] futex(0xc0004699c8, FUTEX_WAKE_PRIVATE, 1) = 1
    [pid 53624] <... futex resumed>)        = 0
    [pid 53621] sched_yield( 
    [pid 53617] sched_yield()               = 0
    [pid 53624] futex(0xc0004699c8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53621] <... sched_yield resumed>)  = 0
    [pid 53618] futex(0xc0004699c8, FUTEX_WAKE_PRIVATE, 1 
    [pid 53617] futex(0x3102840, FUTEX_WAKE_PRIVATE, 1 
    [pid 53614] futex(0xc0001364c8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53612] futex(0xc000116148, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53610] futex(0xc00006a4c8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53624] <... futex resumed>)        = -1 EAGAIN (Resource temporarily unavailable)
    [pid 53621] futex(0x3102840, FUTEX_WAKE_PRIVATE, 1 
    [pid 53618] <... futex resumed>)        = 0
    [pid 53617] <... futex resumed>)        = 0
    [pid 53624] futex(0xc0001364c8, FUTEX_WAKE_PRIVATE, 1 
    [pid 53621] <... futex resumed>)        = 0
    [pid 53618] futex(0xc0003b4148, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53577] futex(0x30fcc48, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53624] <... futex resumed>)        = 1
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53624] futex(0x30fc1a8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=100000} 
    [pid 53621] futex(0xc0004a4bc8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53617] futex(0xc0004a4148, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53614] <... futex resumed>)        = 0
    [pid 53614] futex(0x30fc1a8, FUTEX_WAKE_PRIVATE, 1 
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53624] <... futex resumed>)        = 0
    [pid 53614] <... futex resumed>)        = 1
    [pid 53624] sched_yield( 
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53624] <... sched_yield resumed>)  = 0
    [pid 53614] futex(0xc0001364c8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53624] futex(0x30fc090, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 53624] epoll_pwait(4,  
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53624] <... epoll_pwait resumed>[], 128, 0, NULL, 824641757128) = 0
    [pid 53609] futex(0x30fc190, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} 
    [pid 53624] futex(0x30fc190, FUTEX_WAKE_PRIVATE, 1 
    [pid 53609] <... futex resumed>)        = -1 EAGAIN (Resource temporarily unavailable)
    [pid 53624] <... futex resumed>)        = 0
    [pid 53624] futex(0xc0001364c8, FUTEX_WAKE_PRIVATE, 1 
    [pid 53609] sched_yield( 
    [pid 53624] <... futex resumed>)        = 1
    [pid 53614] <... futex resumed>)        = 0
    [pid 53609] <... sched_yield resumed>)  = 0
    [pid 53624] futex(0x30fc1a8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=100000} 
    [pid 53614] futex(0x30fc1a8, FUTEX_WAKE_PRIVATE, 1 
    [pid 53624] <... futex resumed>)        = -1 EAGAIN (Resource temporarily unavailable)
    [pid 53609] futex(0x30fc090, FUTEX_WAIT_PRIVATE, 2, NULL 
    [pid 53624] sched_yield( 
    [pid 53614] <... futex resumed>)        = 0
    [pid 53624] <... sched_yield resumed>)  = 0
    [pid 53609] <... futex resumed>)        = -1 EAGAIN (Resource temporarily unavailable)
    [pid 53624] futex(0x30fc090, FUTEX_WAKE_PRIVATE, 1 
    [pid 53614] nanosleep({tv_sec=0, tv_nsec=3000},  
    [pid 53624] <... futex resumed>)        = 0
    [pid 53609] futex(0x30fc090, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53614] <... nanosleep resumed>NULL) = 0
    [pid 53614] futex(0xc0001364c8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53624] futex(0xc0004699c8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53609] futex(0x30fc190, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} 
    [pid 53613] <... futex resumed>)        = -1 ETIMEDOUT (Connection timed out)
    [pid 53613] futex(0x30fc190, FUTEX_WAKE_PRIVATE, 1) = 1
    [pid 53609] <... futex resumed>)        = 0
    [pid 53613] futex(0xc0004699c8, FUTEX_WAKE_PRIVATE, 1) = 1
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53624] <... futex resumed>)        = 0
    [pid 53613] futex(0xc0001364c8, FUTEX_WAKE_PRIVATE, 1 
    [pid 53624] futex(0xc0004a4148, FUTEX_WAKE_PRIVATE, 1 
    [pid 53613] <... futex resumed>)        = 1
    [pid 53614] <... futex resumed>)        = 0
    [pid 53613] futex(0x3100a60, FUTEX_WAIT_PRIVATE, 0, {tv_sec=9, tv_nsec=999325757} 
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53624] <... futex resumed>)        = 1
    [pid 53617] <... futex resumed>)        = 0
    [pid 53614] futex(0xc0001364c8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53617] futex(0xc0004a4148, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53609] nanosleep({tv_sec=0, tv_nsec=20000},  
    [pid 53624] futex(0xc0004699c8, FUTEX_WAIT_PRIVATE, 0, NULL 
    [pid 53609] <... nanosleep resumed>NULL) = 0
    [pid 53609] futex(0x30fc190, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0}^C
    
  • Kaniko Image (fully qualified with digest):

    gcr.io/kaniko-project/executor:v0.15.0
    gcr.io/kaniko-project/executor@sha256:630f263d9123266b9f5420d9bc130e6a79306dbc312e5a7d35d922df391192bb
    

Triage Notes for the Maintainers

Description Yes/No
Please check if this is a new feature you are proposing
Please check if the build works in docker but not in kaniko
Please check if this error is seen when you use --cache flag
Please check if your dockerfile is a multistage dockerfile
@kvaps
Contributor Author

kvaps commented Jan 9, 2020

This is my ugly workaround for this:

FROM alpine:3.11 as rootfs
RUN echo 7777

# Workaround https://github.com/GoogleContainerTools/kaniko/issues/960
RUN ROOTDIRS=$(find / -maxdepth 1 -mindepth 1 \( -type d -o -type l \)  ! -name builds ! -name busybox ! -name dev ! -name etc ! -name kaniko ! -name proc ! -name sys ! -name tmp ! -name var ! -name workspace) \
 && mkdir -p /rootfs/dev /rootfs/proc /rootfs/run /rootfs/sys /rootfs/tmp \
 && cp -ax /etc/ /var /rootfs \
 && rm -rf /rootfs/var/run \
 && ln -s ../run/ /rootfs/var/run \
 && mv $ROOTDIRS /rootfs/


FROM alpine:3.11
COPY --from=rootfs /rootfs/ /sysroot/

@kvaps
Contributor Author

kvaps commented Jan 9, 2020

/area multi-stage builds
/kind bug

@cvgw cvgw added area/filesystems For all bugs related to kaniko container filesystems (mounting issues etc) kind/bug Something isn't working labels Jan 10, 2020
@cvgw
Contributor

cvgw commented Jan 10, 2020

Unfortunately, I think this behavior is expected. There are directories at / (such as /kaniko) that are "special"/"reserved". I'm not sure there is a better workaround than the one you've suggested.

@cvgw cvgw added priority/p3 agreed that this would be good to have, but no one is available at the moment. work-around-available labels Jan 10, 2020
@kvaps
Contributor Author

kvaps commented Jan 10, 2020

But docker and buildkit are working fine with this

@cvgw
Contributor

cvgw commented Jan 10, 2020

But docker and buildkit are working fine with this

Right, this is specific to the way that kaniko is implemented.

@invokermain

For what it's worth, I've run into this when running a COPY command in my Dockerfile that uses environment variables that don't exist.

For example, COPY --from=builder $PYSETUP_PATH $PYSETUP_PATH hangs on "Saving file . for later use" if $PYSETUP_PATH is not defined and has no default. I guess it might be trying to do COPY --from=builder . . which obviously doesn't make sense.
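
One plausible explanation for the "." in that log line, not confirmed anywhere in this thread, is that an undefined variable expands to the empty string, and cleaning an empty path yields ".". The Go sketch below shows only that standard-library behavior; resolveCopyPath is a hypothetical helper, not kaniko's actual code path.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveCopyPath expands $VAR references using env and cleans the result.
// Hypothetical helper for illustration; not taken from kaniko.
func resolveCopyPath(arg string, env map[string]string) string {
	expanded := os.Expand(arg, func(name string) string {
		return env[name] // a missing key yields ""
	})
	return filepath.Clean(expanded) // Clean("") returns "."
}

func main() {
	env := map[string]string{} // PYSETUP_PATH intentionally unset
	fmt.Println(resolveCopyPath("$PYSETUP_PATH", env)) // prints "."

	env["PYSETUP_PATH"] = "/opt/pysetup" // example value, an assumption
	fmt.Println(resolveCopyPath("$PYSETUP_PATH", env)) // prints "/opt/pysetup" on Unix-like systems
}
```

If that is what happens, giving the variable a default with ARG or ENV in the Dockerfile should avoid the empty expansion.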

@pmhahn

pmhahn commented May 2, 2022

I'm using debootstrap to build a base image using the following Dockerfile:

FROM debian:bullseye-slim AS builder
RUN apt-get -qq update && apt-get -q install --assume-yes debootstrap findutils
RUN debootstrap --no-merged-usr --variant='minbase'  stable /work http://deb.debian.org/
FROM scratch
COPY --from=builder /work /

which stalls when kaniko copies the content of /dev/console instead of handling it as a special file. The same Dockerfile works fine with docker.
I have created otiai10/copy#78 to implement handling special files with otiai10/copy which is used by kaniko for copying.

@pmhahn

pmhahn commented Jun 4, 2022

Following the hint from otiai10/copy#78, the Skip option could be used to at least avoid copying the content, e.g. something like this:

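// Needs the imports "os" and copy "github.com/otiai10/copy".
// The Skip signature is the one used in this comment; other otiai10/copy releases may differ.
opt := copy.Options{
	Skip: func(src string) (bool, error) {
		// Lstat instead of Stat, so symlinks are not followed.
		stat, err := os.Lstat(src)
		if err != nil {
			return false, err
		}
		// Return true (skip) for devices, named pipes, and sockets.
		return stat.Mode()&(os.ModeDevice|os.ModeNamedPipe|os.ModeSocket) != 0, nil
	},
}
err := copy.Copy("your/directory", "your/directory.copy", opt)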

PS: I'm no Go programmer, so the syntax may be wrong.

@aaron-prindle aaron-prindle linked a pull request Jun 20, 2023 that will close this issue
@aaron-prindle aaron-prindle added issue/hang cmd/copy area/dockerfile-command For all bugs related to dockerfile file commands priority/p2 High impact feature/bug. Will get a lot of users happy and removed priority/p3 agreed that this would be good to have, but no one is available at the moment. labels Jun 20, 2023
@aaron-prindle aaron-prindle added works-with-docker differs-from-docker priority/p1 Basic need feature compatibility with docker build. we should be working on this next. issue/rootfs categorized and removed priority/p2 High impact feature/bug. Will get a lot of users happy labels Jun 20, 2023
@aaron-prindle
Collaborator

aaron-prindle commented Jul 12, 2023

It seems a fix related to this specific issue was added to otiai10/copy: its defaults were changed and functionality was added to handle special files (PR: otiai10/copy#84).

When I attempt the repro Dockerfile suggested above, though, I still see a Kaniko build failure despite using an otiai10/copy version that includes that fix:

INFO[0010] Pushing image to gcr.io/aprindle-test-cluster/kaniko-test/cache:900ada9315de8b51c19436ce83cf56ade4e49ffb0d88ad4385093856925b5423 
I: Target architecture can be executed
I: Retrieving InRelease 
I: Retrieving Release 
E: Failed getting release file http://deb.debian.org/dists/stable/Release
error building image: error building stage: failed to execute command: waiting for process to exit: exit status 1

Keeping this open for now.

@lc-guy

lc-guy commented Sep 15, 2023

Still hitting this issue.

Considering kaniko doesn't support the --squash command-line argument to reduce an image to a single layer, copying the entire rootfs is the only way to achieve that goal, and this bug makes it wholly impossible, sadly.

My use case is that I'm stripping down an existing very large image to remove cruft I don't need, but of course it'll just stack on more layers as you remove the files, so flattening the image is needed afterwards.

6 participants