capset() might randomly fail with -EPERM #4556

Closed
farcaller opened this issue Mar 10, 2014 · 63 comments · Fixed by #6083

@farcaller
Contributor

Given how you don't like me opening bugs against docker running on ARM I test the stuff on x86_64 now 😄

Freshly built 0.9.0 sometimes fails to start a container with: "finalize namespace drop capabilities operation not permitted".

Containers: 4
Images: 64
Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 72
Debug mode (server): true
Debug mode (client): false
Fds: 26
Goroutines: 30
Execution Driver: native-0.1
EventsListeners: 0
Kernel Version: 3.13.6-1-VF
Init SHA1: cfb0f0d26cdabf83f312543e21f8a529253bd4e6
Init Path: /usr/lib/docker/dockerinit
WARNING: No swap limit support
@crosbymichael
Contributor

Ok, can you tell us more about this setup?

@farcaller
Contributor Author

x86_64 box is a dual-core kvm running archlinux with a custom kernel, root is btrfs with aufs docker driver on top (switched back to aufs a few days ago). Got this in 3 out of 100 runs.

arm box is a quad-core running archlinux-arm and same kernel (well, similar kernel, same base config though). root is btrfs as well. Got this problem in 14 out of 100 runs.

@farcaller
Contributor Author

Fails here: https://github.com/torvalds/linux/blob/master/kernel/capability.c#L250, pid==1, task_pid_vnr(current)==7.
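
For readers unfamiliar with that code path: assuming the linked line is the ownership check in the capset() syscall (the link follows master, so the line may have moved since), the kernel returns -EPERM when the pid in the capability header is neither 0 nor the caller's own id as returned by task_pid_vnr(current). The function below is only a shell model of that condition; capset_pid_check is a made-up name, not a kernel or docker interface.

```shell
# Model of the pid check assumed to sit at the linked kernel line:
# capset() is refused unless the header pid is 0 or equals the id
# of the calling task (task_pid_vnr(current)).
capset_pid_check() {
    header_pid=$1   # pid passed in the capability header
    caller_id=$2    # what task_pid_vnr(current) would return
    if [ "$header_pid" -ne 0 ] && [ "$header_pid" -ne "$caller_id" ]; then
        echo EPERM
        return 1
    fi
    echo ok
}
```

With the values reported above, capset_pid_check 1 7 prints EPERM, while capset_pid_check 0 7 and capset_pid_check 7 7 print ok. Since task_pid_vnr(current) is the id of the calling thread, a mismatch like this would be expected whenever the call is issued from a thread other than the main one; the report alone does not establish that this is what happens here.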

@crosbymichael
Contributor

ping @creack

@tianon
Member

tianon commented Mar 11, 2014

Am I reading you right that you're running aufs on top of btrfs? See #2961 and friends.

@farcaller
Contributor Author

Ah, I knew there was something wrong with aufs working on top of btrfs. I switched back to the btrfs driver, and the bug still reproduces.

@farcaller
Contributor Author

A quick update. This is still an issue on 3.14-rc6

@djmaze
Contributor

djmaze commented Mar 20, 2014

Similar problem here, also with 0.9.0 and pure btrfs (i.e. driver and filesystem) on Arch Linux. Docker has been freshly upgraded from 0.8.0. This is on an ARM machine with a custom 3.8.13 kernel though, so feel free to ignore this comment ;-)

For me the error occurs when building a container. It intermittently fails at different RUN lines (with the message mentioned in the first comment).

@alexbers

alexbers commented Apr 4, 2014

Same problem with vfs driver. Kernel version is:
Linux Cubian 3.4.79-sun7i+ #5 SMP PREEMPT Thu Apr 3 19:33:57 YEKT 2014 armv7l GNU/Linux

@nikmol

nikmol commented Apr 28, 2014

Did you manage to resolve this problem?
We see the same problem in our system (Freescale, i.MX6, running without AUFS).
Just upgraded to Docker 0.10

@farcaller
Contributor Author

I haven't managed to make a simple test case to ask around LKML, so no, no progress there. One of my test boards is also i.MX6, FWIW.

@nikmol

nikmol commented Apr 29, 2014

We tried forcing it to use LXC instead of libcontainer (docker -d -e lxc), and I think we got further that way.

@vieux
Contributor

vieux commented May 20, 2014

Do you know something about that @vmarmol ?

@nikmol

nikmol commented May 20, 2014

We managed to run docker containers in our i.MX6 environment now, both with LXC and libcontainer.

@vieux
Contributor

vieux commented May 20, 2014

Does that mean we can fix the issue?
Can you tell us what you did to get docker running?

@nikmol

nikmol commented May 20, 2014

To build it I followed the Arch Linux ARM directions at github.com/archlinuxarm/PKGBUILDs/tree/master/community/docker (I only have 0.11.0 right now).
There are also some things that need to be enabled in the kernel (CGROUP, NAMESPACES, NETFILTER, NAT etc.). I used docker/contrib/check-config.sh to see what I needed to enable.
For the cgroup hierarchy, I used the link provided in check-config.sh to create a script that sets up the hierarchy.
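
As a rough illustration of the kind of check described here (not a replacement for check-config.sh, which carries the authoritative list), the sketch below reports which options are not enabled in a kernel config file. The function name and the option names in the usage note are illustrative assumptions.

```shell
# Print every option from the argument list that is not set to
# =y or =m in the given kernel config file.
missing_kernel_options() {
    config_file=$1
    shift
    for opt in "$@"; do
        if ! grep -Eq "^${opt}=(y|m)$" "$config_file"; then
            echo "$opt"
        fi
    done
}
```

On a kernel that exposes its configuration at /proc/config.gz (not all do), one could decompress it to a file with zcat and then run, for example, missing_kernel_options that-file CONFIG_CGROUPS CONFIG_NAMESPACES CONFIG_NETFILTER. An empty result means none of the listed options is missing.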

@vieux
Contributor

vieux commented May 22, 2014

@nikmol is the issue fixed or not?

@nikmol

nikmol commented May 22, 2014

When we run the mount script from https://github.com/tianon/cgroupfs-mount, we don't see this problem.
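
For context, a script of this kind typically mounts one cgroup hierarchy per enabled subsystem. The sketch below covers only the first half, listing the subsystems the kernel reports as enabled, and assumes the usual four-column /proc/cgroups layout; it is not taken from the cgroupfs-mount script itself, and the function name is made up.

```shell
# Print the names of the cgroup subsystems marked as enabled in a
# /proc/cgroups-style table
# (columns: subsys_name, hierarchy, num_cgroups, enabled).
enabled_cgroup_subsystems() {
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}
```

Each name it prints could then be mounted as root with something like mount -t cgroup -o NAME cgroup /sys/fs/cgroup/NAME, after creating that directory.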

@farcaller
Contributor Author

Can this be reopened? My test machine was dead so I got to testing this only now and it's still broken:

[root@archdock ~]# docker run --rm cellofellow/rpi-arch /bin/true
2014/06/02 13:56:31 finalize namespace drop capabilities operation not permitted
[root@archdock ~]# uname -a
Linux archdock 3.14.4-1-ARCH #1 SMP PREEMPT Sun May 18 17:15:51 MDT 2014 armv7l GNU/Linux
[root@archdock ~]# docker info
Containers: 0
Images: 1
Storage Driver: btrfs
Execution Driver: native-0.2
Kernel Version: 3.14.4-1-ARCH
Debug mode (server): true
Debug mode (client): false
Fds: 11
Goroutines: 16
EventsListeners: 0
Init SHA1: 0c10788f1ec8bfd17e8c5feb7534210f26709045
Init Path: /usr/lib/docker/dockerinit

@farcaller
Contributor Author

It's actually much more reproducible now 😞

$ cnt=0; for i in $(seq 100); do docker run --rm cellofellow/rpi-arch /bin/true; cnt=$((cnt+$?)); done
...
$ echo $cnt
41
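
The loop above sums exit codes, which equals the number of failures only if every failing run exits with status 1. A slightly more defensive sketch that counts non-zero exits directly is shown below; count_failures is a hypothetical helper name.

```shell
# Run a command N times and print how many of the runs exited non-zero.
# Usage: count_failures N command [args...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Output is discarded; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

For the case in this thread that would be count_failures 100 docker run --rm cellofellow/rpi-arch /bin/true.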

@farcaller
Contributor Author

Also, it fails the same way with the lxc driver, but less often (23 failures out of 100 runs).

@farcaller
Contributor Author

Seems to also happen on i386: jpetazzo/dockvpn#7

@eddelbuettel

Same here. I am running Paul's Debian package as a 'backport' I made onto Ubuntu 14.04 running on an i386 box. Works fine but I get this error every now and then too.

@crosbymichael
Contributor

@nikmol are you running on ARM or 32-bit? Is this an official docker build?

@nikmol

nikmol commented Jun 23, 2014

@crosbymichael this is on an ARM (32bit).

@farcaller
Contributor Author

It seems that this is now broken even with --privileged:

% docker run --privileged -t bors
2014/06/25 07:32:27 finalize namespace drop capabilities operation not permitted

@EvanKrall
Contributor

I also see this on roughly half of my docker run invocations on an ARM board (odroid u3).

@nikmol

nikmol commented Sep 11, 2014

EvanKrall, which version are you using?

@EvanKrall
Contributor

I'm running Ubuntu 14.04 on that board, so https://launchpad.net/ubuntu/trusty/armhf/docker.io/1.0.1~dfsg1-0ubuntu1~ubuntu0.14.04.1

@nikmol

nikmol commented Sep 11, 2014

Looks like it's version 1.0.1?
Have you tried version 1.2.0?
I have just updated on my board (ARM) to 1.2.0.
I haven't had time to run any containers yet, but I'll do that soon.

@EvanKrall
Contributor

Haven't tried 1.2.0 yet; may try that later tonight.

@djmaze
Contributor

djmaze commented Sep 11, 2014

As a warning for those trying out 1.2.0: At least the Arch Linux ARM version currently does not work correctly on btrfs. It is missing a compatibility patch at the moment, so I had to revert to 1.0.

@farcaller
Contributor Author

I think Docker defaults to device mapper, which is available almost everywhere.

@nikmol

nikmol commented Sep 12, 2014

djmaze, have you tried to change the file daemon/graphdriver/driver.go with the patch that was for version 1.0.0?

@djmaze
Contributor

djmaze commented Sep 13, 2014

@nikmol: Yes, already tried that. The "wrong filesystem" error remained. Will need to have a closer look sometime. Which is difficult, because I need to keep that machine (U3) running.

@farcaller: I've had abysmal experiences with devicemapper on ARM. (The corresponding bug, #3280, is still open.) AUFS support was not available in the kernel, so I chose to go with btrfs. And AFAIK, btrfs and AUFS offer much better performance for CoW-based filesystems.

@errordeveloper
Contributor

I think nobody will reopen this, as the discussion looks pretty wild at a glance. We need to submit a new issue that states precisely what the problem is and gives easy instructions to reproduce it. Ideally this should involve an "official" binary and some mainstream distro with a vanilla kernel running on x86.

@nikmol

nikmol commented Oct 6, 2014

I added AUFS to our kernel and we haven't seen any problems since (it's also much faster at starting containers from images and at removing them).
Over the weekend we ran a test in which we started 5 containers, verified that they were running, waited 5 seconds, then took them down.
We started around 5000 containers (5 at a time) without any problems.
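
The test described above could look roughly like the sketch below. This is a reconstruction under assumptions, not the script that was actually used: the sleep 300 workload and the run_batch name are placeholders, and the DOCKER variable exists only so the function can be exercised against a stub instead of a real daemon.

```shell
DOCKER=${DOCKER:-docker}   # overridable, so the sketch can be tried against a stub

# Start $2 containers from image $1, check that each one is listed as
# running, wait $3 seconds, then stop and remove them.
# Prints how many of the containers were not found running.
run_batch() {
    image=$1
    batch=$2
    wait_secs=$3
    ids=""
    not_running=0
    n=0
    while [ "$n" -lt "$batch" ]; do
        # "sleep 300" is a placeholder workload; the image must provide it.
        id=$("$DOCKER" run -d "$image" sleep 300) || id=""
        if [ -n "$id" ] && "$DOCKER" ps -q --no-trunc | grep -qx "$id"; then
            ids="$ids $id"
        else
            not_running=$((not_running + 1))
        fi
        n=$((n + 1))
    done
    sleep "$wait_secs"
    for id in $ids; do
        "$DOCKER" stop "$id" >/dev/null
        "$DOCKER" rm "$id" >/dev/null
    done
    echo "$not_running"
}
```

Calling run_batch some-image 5 5 in a loop of 1000 iterations would approximate the 5000 containers mentioned above (some-image is a placeholder). Containers that fail to start are only counted, not cleaned up, so a real test script would want to handle that case too.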

@cschmittiey

I'm having the same issue trying to launch lgierth/meshbox on Fedora 21 server. Any ideas on what may be causing it?

@umiddelb

I've compiled docker 1.4.0 for ARMv7 from source. This bug seems to be fixed there.

@calmera

calmera commented Jan 24, 2015

@umiddelb I would be very interested in the binary that you built ;) Anything special you ran into while building on ARMv7? Currently running 1.3.0 here

@umiddelb

@calmera You can download the binary here. It's linked statically, so it runs on ubuntu as well as on fedora.

I've written down how I build docker on ARMv7 here. I took the patched docker sources from resin.io, but rebased the Dockerfile back to the original ubuntu build environment (resin.io's sources are made for raspberry pi, so they have to use debian).

You may find my docker sources here.

@calmera

calmera commented Jan 24, 2015

@umiddelb Awesome! thnx!

@umiddelb

umiddelb commented Feb 1, 2015

@calmera Two days ago, the docker developers removed the 'fuse' that explicitly required the amd64 platform and integrated most of the patches that make docker 32-bit safe. You can now build docker for ARMv7 from the original sources. The only thing you still need is a slightly modified Dockerfile for the armhf/ARMv7 platform:

git clone https://github.com/docker/docker.git
cd docker
curl -L https://github.com/umiddelb/armhf/raw/master/Dockerfile.armv7 > Dockerfile
make build
make binary
...

@eddelbuettel

Wonderful news. Paging @paultag :)

@thaJeztah
Member

@umiddelb perhaps that Dockerfile is something to add to docker/contrib? Also, docker 1.5 will allow you to specify a path to the Dockerfile to use, via the -f option (see #9707). If that file was included in contrib, the process could even be simplified more by having "make" select a different Dockerfile when doing an armv7 build.

Not sure if that is something the maintainers would approve of, but it's worth a try (I don't think it would hinder the regular builds?)
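
One way the selection suggested above might be sketched is shown below. The Dockerfile.armv7 file name is the one used earlier in this thread; the function name and the idea of keying on the machine name reported by uname -m are assumptions, and no such logic is claimed to exist in docker's Makefile.

```shell
# Map a machine name (as printed by `uname -m`) to the Dockerfile to build with.
dockerfile_for_arch() {
    case $1 in
        armv7l) echo Dockerfile.armv7 ;;
        *)      echo Dockerfile ;;
    esac
}
```

Once the -f option is available, the result could be passed along as docker build -f "$(dockerfile_for_arch "$(uname -m)")" . from the source directory.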

@umiddelb

umiddelb commented Feb 1, 2015

@thaJeztah Thank you for the hint. I'll file a pull request, but then this option should also be exposed through the Makefile (as an environment variable or a command line parameter to make build && make binary). Nevertheless, the docker developers should have an idea of how to make docker platform-aware (e.g. extensions to the Dockerfile syntax), since amd64 isn't the only platform anymore.

@thaJeztah
Member

@umiddelb yes, in the long run, Docker needs to become platform-aware (also, think of Windows). For now, I think this should still be regarded as "experimental", hence my suggestion for inclusion in "contrib". People should be made aware that, at this moment, it's not officially supported.

Again, I don't know what the maintainers think of this, but it might be a nice addition for those that want to run on armv7. For changes in the makefile, I think Tianon is the best person to contact, you can try if he's available on IRC for questions.

@djmaze
Contributor

djmaze commented Feb 5, 2015

Confirming this bug seems to be fixed after upgrading from 1.0.0 to 1.4.1 on my ARMv7 device (w/ btrfs driver).

@umiddelb

I've built another docker ubuntu image (modelled on the 'official' one). With this image, you only need to replace the FROM statement with:
FROM armv7/armhf-ubuntu:14.04
and comment out the line starting with:
RUN curl -sSL https://storage.googleapis.com/golang/go${GOFMT_VERSION}
and you're done.
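
The two manual edits above can be scripted with sed, as in the sketch below. It assumes the RUN curl instruction sits on a single line (a continuation line would be left uncommented) and that the base image name contains no | character; patch_dockerfile is a made-up helper name.

```shell
# Print a copy of the Dockerfile in $1 with its FROM line replaced by
# "FROM $2" and the gofmt download line commented out.
patch_dockerfile() {
    src=$1
    base=$2
    sed -e "s|^FROM .*|FROM ${base}|" \
        -e '/^RUN curl -sSL https:\/\/storage\.googleapis\.com\/golang\/go\${GOFMT_VERSION}/s/^/# /' \
        "$src"
}
```

For example, patch_dockerfile Dockerfile armv7/armhf-ubuntu:14.04 > Dockerfile.armv7 writes the patched copy to a new file and leaves the original untouched.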

@rickyzhang82

Can anyone share a working docker binary for 32-bit armhf? I get this error at random, and I can't even build the latest version from github. It's a chicken-and-egg problem...

Ignore my request. I found it here: https://github.com/umiddelb/armhf/raw/master/bin/docker-1.6.0

@umiddelb Thanks for sharing. The link in your comment above is dead. In any case, I'm tired of running "sudo make build" again and again.

@umiddelb

umiddelb commented May 9, 2015
