running systemd inside docker arch container hangs or segfaults #3629

Closed
flokli opened this issue Jan 16, 2014 · 25 comments

Comments

@flokli

flokli commented Jan 16, 2014

I tried running the base/arch image in "system container" mode.

However, docker run -i -t base/arch /sbin/init doesn't work as it should.
I detach from it (Ctrl-p, Ctrl-q) and look at it with strace: /sbin/init is running but doesn't do anything, whereas it should normally spawn some other processes (like systemd-journald).

When I run docker run -i -t base/arch /bin/bash, and enter /sbin/init --system, I get the following output:

systemd 208 running in system mode. (+PAM -LIBWRAP -AUDIT -SELINUX -IMA -SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ)
Detected virtualization 'lxc'.
Failed to set hostname to <888e6c612435>: Operation not permitted
Failed to enable kbrequest handling: Operation not permitted
No control group support available, not creating root group.
Cannot add dependency job for unit display-manager.service, ignoring: Unit display-manager.service failed to load: No such file or directory.
Segmentation fault (core dumped)

In the same container (but with strace installed), running

# docker run -i -t 1cff36031b68 /bin/bash
[root@bef7f6801a3d /]# strace -fvv /sbin/init --system

segfaults, leaving the following output: https://gist.github.com/flokli/8456044

Do you have any idea what's wrong here? I'd really like to use docker in system container mode, and according to #223, this should already be possible...

Florian

@s0undt3ch

I'm suffering from the same issue...

@codekoala

Seeing similar behavior as well.

@mait

mait commented Jan 20, 2014

I've tried this on a digitalocean.com Arch 64-bit VM.

Docker host:
3.8.4-1-ARCH (updated, except kernel)

Docker client:
ubuntu 13.10

I've used socat with openssl for the remote API calls, but I get the same result with a local docker client over ssh.

http://jpetazzo.github.io/2013/10/20/secure-connection-docker-api/

➜ docker version
Client version: 0.7.6
Go version (client): go1.2
Git commit (client): bc3b2ec
Server version: 0.7.6
Git commit (server): bc3b2ec
Go version (server): go1.2
Last stable version: 0.7.6
➜ docker run -d  base/arch /sbin/init
8809f06d66ad288f84d60e077d811c923059a749ec404938a881e8dc0a083d1c
➜ docker attach  880
[nothing happens here]
➜ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
8809f06d66ad        base/arch:latest    /sbin/init          50 seconds ago      Up 48 seconds                           kickass_brown
➜ docker logs 880
Failed to verify GPT partition /dev/dm-2: Operation not permitted
➜ docker stop 880

➜ docker info
Containers: 1
Images: 6
Driver: devicemapper
 Pool Name: docker-254:0-529154-pool
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 1111.2 Mb
 Data Space Total: 102400.0 Mb
 Metadata Space Used: 1.4 Mb
 Metadata Space Total: 2048.0 Mb
WARNING: No swap limit support
Jan 20 16:12:31 do0 docker[153]: 2014/01/20 16:12:31 POST /v1.8/containers/create
Jan 20 16:12:31 do0 docker[153]: [/var/lib/docker|cbd477d1] +job create()
Jan 20 16:12:31 do0 docker[153]: [/var/lib/docker|cbd477d1] -job create() = OK (0)
Jan 20 16:12:32 do0 docker[153]: 2014/01/20 16:12:32 POST /v1.8/containers/8809f06d66ad288f84d60e077d811c923059a749ec404938a881e8dc0a083d1c/start
Jan 20 16:12:32 do0 docker[153]: [/var/lib/docker|cbd477d1] +job start(8809f06d66ad288f84d60e077d811c923059a749ec404938a881e8dc0a083d1c)
Jan 20 16:12:32 do0 docker[153]: [/var/lib/docker|cbd477d1] -job start(8809f06d66ad288f84d60e077d811c923059a749ec404938a881e8dc0a083d1c) = OK (0)
Jan 20 16:12:41 do0 docker[153]: 2014/01/20 16:12:41 GET /v1.8/containers/json
Jan 20 16:13:12 do0 docker[153]: 2014/01/20 16:13:12 GET /v1.8/containers/880/json
Jan 20 16:13:13 do0 docker[153]: 2014/01/20 16:13:13 POST /v1.8/containers/880/attach?stderr=1&stdout=1&stream=1
Jan 20 16:13:14 do0 docker[153]: 2014/01/20 16:13:14 GET /v1.8/containers/880/json
Jan 20 16:13:21 do0 docker[153]: 2014/01/20 16:13:21 GET /v1.8/containers/json
Jan 20 16:13:36 do0 docker[153]: 2014/01/20 16:13:36 GET /v1.8/containers/880/json
Jan 20 16:13:37 do0 docker[153]: 2014/01/20 16:13:37 POST /v1.8/containers/880/attach?logs=1&stderr=1&stdout=1
Jan 20 16:13:58 do0 docker[153]: 2014/01/20 16:13:58 POST /v1.8/containers/880/stop?t=10
Jan 20 16:13:58 do0 docker[153]: [/var/lib/docker|cbd477d1] +job stop(880)
Jan 20 16:14:00 do0 docker[153]: [error] container.go:468 attach: stderr: write unix @: broken pipe
Jan 20 16:14:00 do0 docker[153]: [error] container.go:499 attach: job 1 returned error write unix @: broken pipe, aborting all jobs
Jan 20 16:14:08 do0 docker[153]: 2014/01/20 16:14:08 Container 8809f06d66ad288f84d60e077d811c923059a749ec404938a881e8dc0a083d1c failed to exit within 10 second
Jan 20 16:14:08 do0 docker[153]: [/var/lib/docker|cbd477d1] -job stop(880) = OK (0)
Jan 20 16:14:12 do0 docker[153]: 2014/01/20 16:14:12 GET /v1.8/containers/json
Jan 20 16:22:11 do0 docker[153]: 2014/01/20 16:22:11 GET /v1.8/info
Jan 20 16:22:11 do0 docker[153]: [/var/lib/docker|cbd477d1] +job info()
Jan 20 16:22:12 do0 docker[153]: [/var/lib/docker|cbd477d1] -job info() = OK (0)

@adonm

adonm commented Jan 21, 2014

Same issue here with systemd 208-10

See https://bitbucket.org/dpaw/dpaw_docker/src/4785d502d806bc002bfc1644adb7d5bbcf7f68c3/arch-base/build.sh?at=default for the build script I've been testing (using archlinux's included lxc-create script). There is no GPT warning, but I still hit a segfault (will attach a strace when I get a chance).

@ku1ik

ku1ik commented Jan 28, 2014

Same problem here.

@chrisruffalo

Same problem on Fedora 20 with 208-9.

My initial impression is that it has something to do with the incomplete mount points in /sys/fs/cgroup and /sys/fs/selinux. When I run systemd under gdb, it fails somewhere in libselinux, around "../sysdeps/x86_64/strlen.S:106". Searching for that error mostly turns up results about missing files, so I'm willing to bet that it can't find some SELinux file it's looking for.

My proposed solution would be one of:

  1. Corrected mount points
  2. Fixed Systemd logic so it doesn't error out when it can't find the file
  3. Custom version of Systemd without +SELINUX

Thoughts?

Edit 01: Rebuilding with the --disable-selinux option leads to a segfault too, but at a different point. I had to remove the fsck and fstab related services to move on.

Edit 02: Hm, it looks like something cgroups related, here's the backtrace:

#0 strlen () at ../sysdeps/x86_64/strlen.S:106
#1 0x00007ffff72543fe in __GI___strdup (s=0x0) at strdup.c:41
#2 0x00000000004ca362 in unit_default_cgroup_path (u=0x5b04f0) at src/core/unit.c:2121
#3 0x00000000004583ad in unit_create_cgroups (u=0x5b04f0, mask=(unknown: 0)) at src/core/cgroup.c:392
#4 0x000000000045882e in unit_realize_cgroup_now (u=0x5b04f0) at src/core/cgroup.c:467
#5 0x0000000000458b86 in unit_realize_cgroup (u=0x5b04f0) at src/core/cgroup.c:567
#6 0x00000000004e687f in slice_start (u=0x5b04f0) at src/core/slice.c:200
#7 0x00000000004c7555 in unit_start (u=0x5b04f0) at src/core/unit.c:1253
#8 0x00000000004d0ede in job_run_and_invalidate (j=0x579b60) at src/core/job.c:497
#9 0x000000000040f704 in manager_dispatch_run_queue (source=0x56c930, userdata=0x56c3d0) at src/core/manager.c:1267
#10 0x00000000004bfaff in source_dispatch (s=0x56c930) at src/libsystemd/sd-event/sd-event.c:1825
#11 0x00000000004c0723 in sd_event_run (e=0x56c7e0, timeout=0) at src/libsystemd/sd-event/sd-event.c:2045
#12 0x0000000000411599 in manager_loop (m=0x56c3d0) at src/core/manager.c:1844
#13 0x000000000040aacd in main (argc=2, argv=0x7fffffffed48) at src/core/main.c:1653

And to save some time, here's the relevant code:

    if (unit_has_name(u, SPECIAL_ROOT_SLICE))
            return strdup(u->manager->cgroup_root);

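So it looks to me like either a null reference or just a null string: frame #1 of the backtrace shows strdup being called with s=0x0, which points at u->manager->cgroup_root being NULL. My C/C++ knowledge ends about right here. Maybe someone could take a go at it from here?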
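
For anyone trying to reproduce this, a quick check of whether any cgroup hierarchy is visible inside the container may help narrow things down. The sketch below is only one possible way to test that from a shell. `has_cgroup_mount` is a hypothetical helper (not part of docker or systemd), and it assumes the usual /proc/mounts layout with the filesystem type in the third field. Whether a missing cgroup mount is what leaves cgroup_root unset is a guess, though it would be consistent with the "No control group support available" message in the original report.

```bash
# has_cgroup_mount [MOUNTS_FILE]
# Hypothetical helper: succeeds (exit status 0) if MOUNTS_FILE, which is
# assumed to be in /proc/mounts format, lists at least one filesystem of
# type "cgroup" or "cgroup2". Defaults to /proc/mounts.
has_cgroup_mount() {
    local mounts_file="${1:-/proc/mounts}"
    # Field 3 of each /proc/mounts line is the filesystem type.
    awk '$3 == "cgroup" || $3 == "cgroup2" { found = 1 }
         END { exit !found }' "$mounts_file"
}
```

Run inside the container, something like `has_cgroup_mount && echo "cgroups visible" || echo "no cgroup mounts"` would show which case applies.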

Edit 03: I tried disabling other systemd compile options and nothing changed. So, back to figuring out how to mount cgroups in /sys/fs I guess...

Edit 04: Final edit, giving up.

I found a collection of items/ideas that led me to realize a couple of things. The first is that the default docker container does not have the capability (CAP_SYS_ADMIN) to mount or unmount things. In newer builds (0.6 and later) there is the "-privileged" option for "docker run" that gives the container more leeway.

From there I found the "lxc.mount.auto" option, which should have allowed me to auto-mount sys, proc, and cgroups into the contained operating system. Running the following command

docker run -t -i -privileged -lxc-conf="lxc.mount.auto = proc:rw sys:rw cgroup-full:mixed" fedora /bin/bash

didn't really do any good, as it just produces a bunch of errors:

docker run -t -i -privileged -lxc-conf="lxc.mount.auto = proc:rw sys:rw cgroup-full:mixed" fedora /bin/bash
lxc-start: No such file or directory - failed to use 'proc:rw sys:rw cgroup-full:mixed'
lxc-start: failed to setup the mounts for '14975cd2baa2a3d03004f260930cdd88bfd5f24e2c8d062d8ef17f8e64f9436e'
lxc-start: failed to setup the container
lxc-start: invalid sequence number 1. expected 2
lxc-start: failed to spawn '14975cd2baa2a3d03004f260930cdd88bfd5f24e2c8d062d8ef17f8e64f9436e'
lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/cpuset/lxc/14975cd2baa2a3d03004f260930cdd88bfd5f24e2c8d062d8ef17f8e64f9436e'
lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/cpu,cpuacct/lxc/14975cd2baa2a3d03004f260930cdd88bfd5f24e2c8d062d8ef17f8e64f9436e'
lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/memory/lxc/14975cd2baa2a3d03004f260930cdd88bfd5f24e2c8d062d8ef17f8e64f9436e'
lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/devices/lxc/14975cd2baa2a3d03004f260930cdd88bfd5f24e2c8d062d8ef17f8e64f9436e'
lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/freezer/lxc/14975cd2baa2a3d03004f260930cdd88bfd5f24e2c8d062d8ef17f8e64f9436e'
lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/net_cls/lxc/14975cd2baa2a3d03004f260930cdd88bfd5f24e2c8d062d8ef17f8e64f9436e'
lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/blkio/lxc/14975cd2baa2a3d03004f260930cdd88bfd5f24e2c8d062d8ef17f8e64f9436e'
lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/perf_event/lxc/14975cd2baa2a3d03004f260930cdd88bfd5f24e2c8d062d8ef17f8e64f9436e'
lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/hugetlb/lxc/14975cd2baa2a3d03004f260930cdd88bfd5f24e2c8d062d8ef17f8e64f9436e'
[error] commands.go:2458 Error getting size: bad file descriptor

So... I found some more stuff here: http://blog.docker.io/2013/09/docker-can-now-run-within-docker/

I copied the helper script that he used, or at least parts of it, and I got SELinux and CGROUPS mounted!

But nothing changed. The segfault still happens at the same place. Maybe someone else can figure out what the heck is going on here.

@nekinie

nekinie commented Feb 19, 2014

Confirmed same issue on Arch Linux

@hunger

hunger commented Mar 5, 2014

Maybe #4450 will help once it is applied.

Systemd runs fine inside a container when using systemd-nspawn, so my guess is that the one inside the docker container is not told that it is actually inside a container and thus tries to do things that do not make sense.

@hunger

hunger commented Mar 5, 2014

Yep, running /usr/lib/systemd/systemd-detect-virt reports "none" inside a docker container, so systemd tries to do the full start. Now... how can I make systemd detect docker?

@hunger

hunger commented Mar 6, 2014

Adding --env=container=docker (or lxc) will make systemd recognize that it is inside a container. That stops it from doing some stupid things, but it still core-dumps:-/
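
A rough way to see what a containerized PID 1 is being told, assuming systemd's container detection looks at the `container` variable in the environment of PID 1, as the comment above implies. `container_type` is a hypothetical helper name used only for illustration; it reads a NUL-separated environment file such as /proc/1/environ and prints the value of `container`, or nothing if the variable is not set.

```bash
# container_type [ENVIRON_FILE]
# Hypothetical helper: prints the value of the "container" variable found
# in a NUL-separated environment dump (default: /proc/1/environ).
# Prints nothing if the variable is absent.
container_type() {
    local environ_file="${1:-/proc/1/environ}"
    # Turn NUL separators into newlines, then keep only the first
    # "container=..." entry with the variable name stripped.
    tr '\0' '\n' < "$environ_file" | sed -n 's/^container=//p' | head -n 1
}
```

Reading /proc/1/environ normally requires root inside the container. An empty result would match the "none" reported by systemd-detect-virt earlier in the thread.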

@hunger

hunger commented May 8, 2014

There is a blog post from somebody that managed to run systemd in docker here: http://rhatdan.wordpress.com/2014/04/30/running-systemd-within-a-docker-container/

Apparently you need --privileged, have to mount cgroups, and then tweak the systemd configuration to stop it from bringing up a lot of unnecessary services.

http://lists.freedesktop.org/archives/systemd-devel/2014-May/018998.html

is the first mail in a thread discussing the blog post mentioned above, with hints from the systemd people on what the environment expected by systemd looks like. It would rock if docker could implement some of the things suggested there, especially mounting /sys RO (which will stop systemd from starting udev and is also sensible from a security point of view).

@rjnagal
Contributor

rjnagal commented May 8, 2014

#5445 mounts /sys as read-only.

@kfox1111

kfox1111 commented May 8, 2014

#5445 says it enables rw mounts when --privileged is enabled, but hunger's comment says you need --privileged so #5445 won't fix it.

@rjnagal
Contributor

rjnagal commented May 8, 2014

--privileged requires write access to sys and proc. We wouldn't want to do
ro mounts by default.

@hunger

hunger commented May 8, 2014

@kfox1111, @rjnagal: Apparently (according to the blog post) systemd needs CAP_SYS_ADMIN. That is dropped when running without --privileged. Maybe docker could leave that around?
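
To check from inside a container whether CAP_SYS_ADMIN actually survived, one could inspect the effective capability mask the kernel reports as the CapEff line in /proc/self/status. This is only a sketch: `cap_eff_of` and `has_cap_sys_admin` are hypothetical helper names, and the only hard fact relied on is that CAP_SYS_ADMIN is capability number 21 in the Linux capability list.

```bash
# cap_eff_of [STATUS_FILE]
# Hypothetical helper: prints the hex effective-capability mask from a
# /proc/<pid>/status style file (default: /proc/self/status).
cap_eff_of() {
    local status_file="${1:-/proc/self/status}"
    awk '$1 == "CapEff:" { print $2 }' "$status_file"
}

# has_cap_sys_admin HEXMASK
# Hypothetical helper: succeeds if bit 21 (CAP_SYS_ADMIN) is set in the
# given hex mask. An optional leading "0x" is accepted.
has_cap_sys_admin() {
    local mask="${1#0x}"
    (( (0x$mask >> 21) & 1 ))
}
```

For example, `has_cap_sys_admin "$(cap_eff_of)" && echo "CAP_SYS_ADMIN present"` could be run once with and once without --privileged to compare.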

@vmarmol
Contributor

vmarmol commented May 9, 2014

@hunger we should be careful about not dropping CAP_SYS_ADMIN, that brings with it a lot of things we probably don't want unprivileged containers to be able to do.

@rjnagal
Contributor

rjnagal commented May 9, 2014

If we don't drop CAP_SYS_ADMIN, we are almost a privileged container :)

I think this should be handled at the container option level to drop/add
capabilities. We should keep the defaults for unprivileged containers as
secure as possible.

@hunger

hunger commented May 11, 2014

@victor: You are right. But then it would be nice to be able to keep some
capabilities without getting the rest of the stuff that --privileged does.
I do admit that I am not 100% sure what that actually is, which makes me
all the more uneasy about running containers in privileged mode.

I do e.g. have one container that needs to be privileged because it needs
to initiate port-forwarding. I am pretty sure that one only needs a
capability or two and would be fine otherwise.

@vmarmol
Contributor

vmarmol commented May 13, 2014

I don't think we have a good answer today for something in between unprivileged and privileged. I'd hope we get something, since there are many use cases where you only want some privileges. I'm guessing the hard part is how to expose that in the API in a way that makes sense.

Given the prevalence of systemd, we should find a way to make it work though. I know @alexlarsson has been taking a look at that.

@alexlarsson
Contributor

Yeah, unprivileged systemd is being worked on in #5773.

@cpuguy83
Member

Closing as resolved in #6968 and #5773

@offlinehacker

This is not really resolved, because #5773 was reverted in c7d1cb227288fa2174bd601b7214d49955f387e3. I don't know what's going on; I just know that systemd can't be started in a container without cgroups and /run as tmpfs, but with these two it starts and works fine.

@cpuguy83 cpuguy83 reopened this Aug 23, 2014
@offlinehacker

And here is the pull request that breaks it: docker-archive/libcontainer#30

@offlinehacker

And this one is needed: docker-archive/libcontainer#16

@jessfraz
Contributor

Closing as a duplicate of #7395
