
1.7.0/CentOS/RHEL 6.6 - bridge interface creation fails. daemon won't start. #14024

Closed
visualphoenix opened this issue Jun 18, 2015 · 49 comments
Labels
kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed.

@visualphoenix

I can't provide docker info output, since the daemon won't start.

# docker -D -d
DEBU[0000] Registering POST, /containers/{name:.*}/unpause 
DEBU[0000] Registering POST, /containers/{name:.*}/restart 
DEBU[0000] Registering POST, /exec/{name:.*}/resize     
DEBU[0000] Registering POST, /containers/create         
DEBU[0000] Registering POST, /containers/{name:.*}/kill 
DEBU[0000] Registering POST, /containers/{name:.*}/start 
DEBU[0000] Registering POST, /containers/{name:.*}/stop 
DEBU[0000] Registering POST, /containers/{name:.*}/resize 
DEBU[0000] Registering POST, /auth                      
DEBU[0000] Registering POST, /build                     
DEBU[0000] Registering POST, /containers/{name:.*}/wait 
DEBU[0000] Registering POST, /containers/{name:.*}/attach 
DEBU[0000] Registering POST, /containers/{name:.*}/copy 
DEBU[0000] Registering POST, /containers/{name:.*}/exec 
DEBU[0000] Registering POST, /exec/{name:.*}/start      
DEBU[0000] Registering POST, /commit                    
DEBU[0000] Registering POST, /images/create             
DEBU[0000] Registering POST, /images/load               
DEBU[0000] Registering POST, /images/{name:.*}/push     
DEBU[0000] Registering POST, /images/{name:.*}/tag      
DEBU[0000] Registering POST, /containers/{name:.*}/pause 
DEBU[0000] Registering POST, /containers/{name:.*}/rename 
DEBU[0000] Registering DELETE, /containers/{name:.*}    
DEBU[0000] Registering DELETE, /images/{name:.*}        
DEBU[0000] Registering OPTIONS,                         
DEBU[0000] Registering GET, /version                    
DEBU[0000] Registering GET, /containers/ps              
DEBU[0000] Registering GET, /containers/{name:.*}/changes 
DEBU[0000] Registering GET, /images/{name:.*}/history   
DEBU[0000] Registering GET, /containers/json            
DEBU[0000] Registering GET, /containers/{name:.*}/export 
DEBU[0000] Registering GET, /containers/{name:.*}/json  
DEBU[0000] Registering GET, /events                     
DEBU[0000] Registering GET, /images/search              
DEBU[0000] Registering GET, /images/get                 
DEBU[0000] Registering GET, /images/{name:.*}/get       
DEBU[0000] Registering GET, /exec/{id:.*}/json          
DEBU[0000] Registering GET, /info                       
DEBU[0000] Registering GET, /containers/{name:.*}/top   
DEBU[0000] Registering GET, /containers/{name:.*}/logs  
DEBU[0000] Registering GET, /containers/{name:.*}/stats 
DEBU[0000] Registering GET, /_ping                      
DEBU[0000] Registering GET, /images/json                
DEBU[0000] Registering GET, /images/{name:.*}/json      
DEBU[0000] Registering GET, /containers/{name:.*}/attach/ws 
WARN[0000] You are running linux kernel version 2.6.32-504.8.1.el6.x86_64, which might be unstable running docker. Please upgrade your kernel to 3.10.0. 
DEBU[0000] Warning: could not change group /var/run/docker.sock to docker: Group docker not found 
INFO[0000] Listening for HTTP on unix (/var/run/docker.sock) 
DEBU[0000] devicemapper: driver version is 4.27.0       
DEBU[0000] Generated prefix: docker-252:3-1311141       
DEBU[0000] Checking for existence of the pool 'docker-252:3-1311141-pool' 
DEBU[0000] Pool doesn't exist. Creating it.             
DEBU[0000] Error retrieving the next available loopback: open /dev/loop-control: no such file or directory 
DEBU[0000] Error retrieving the next available loopback: open /dev/loop-control: no such file or directory 
DEBU[0000] [deviceset] constructDeviceIdMap()           
DEBU[0000] Loading data for file /var/lib/docker/devicemapper/metadata/base 
DEBU[0000] Added deviceId=1 to DeviceIdMap              
DEBU[0000] Skipping file /var/lib/docker/devicemapper/metadata/deviceset-metadata 
DEBU[0000] Loading data for file /var/lib/docker/devicemapper/metadata/transaction-metadata 
DEBU[0000] Added deviceId=1 to DeviceIdMap              
DEBU[0000] [deviceset] constructDeviceIdMap() END       
INFO[0000] [graphdriver] using prior storage driver "devicemapper" 
DEBU[0000] Using graph driver devicemapper              
DEBU[0000] Using default logging driver json-file       
DEBU[0000] Creating images graph                        
DEBU[0000] Restored 0 elements                          
DEBU[0000] Creating repository list                     
WARN[0000] Running modprobe bridge nf_nat failed with message: , error: exit status 1 
DEBU[0000] /sbin/iptables, [-t nat -D PREROUTING -m addrtype --dst-type LOCAL -j DOCKER] 
DEBU[0000] /sbin/iptables, [-t nat -D OUTPUT -m addrtype --dst-type LOCAL ! --dst 127.0.0.0/8 -j DOCKER] 
DEBU[0000] /sbin/iptables, [-t nat -D OUTPUT -m addrtype --dst-type LOCAL -j DOCKER] 
DEBU[0000] /sbin/iptables, [-t nat -D PREROUTING]       
DEBU[0000] /sbin/iptables, [-t nat -D OUTPUT]           
DEBU[0000] /sbin/iptables, [-t nat -F DOCKER]           
DEBU[0000] /sbin/iptables, [-t nat -X DOCKER]           
DEBU[0000] [deviceset docker-252:3-1311141] Shutdown()  
DEBU[0000] [devmapper] Shutting down DeviceSet: /var/lib/docker/devicemapper 
DEBU[0000] [devmapper] deactivateDevice()               
DEBU[0000] [devmapper] deactivateDevice END()           
DEBU[0000] [devmapper] deactivatePool()                 
DEBU[0000] [devmapper] devicemapper.GetDeps() /dev/mapper/docker-252:3-1311141-pool: &devicemapper.Deps{Count:0x2, Filler:0x0, Device:[]uint64(nil)} 
DEBU[0000] [devmapper] deactivatePool END               
DEBU[0000] [deviceset docker-252:3-1311141] Shutdown() END 
FATA[0000] Error starting daemon: Error initializing network controller: Error creating default "bridge" network: operation not supported 
@visualphoenix
Author

docker version:

# docker version
Client version: 1.7.0
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 0baf609
OS/Arch (client): linux/amd64
Cannot connect to the Docker daemon. Is 'docker -d' running on this host?

docker info:

# docker info
Cannot connect to the Docker daemon. Is 'docker -d' running on this host?

uname -a:

$ uname -a
Linux yopt.np.wc1.yellowpages.com 2.6.32-504.8.1.el6.x86_64 #1 SMP Wed Jan 28 21:11:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Tested on freshly created RHEL 6.6 and CentOS 6.6 hosts.
Upgrading from 1.6.0 to 1.7.0 worked because the bridge had already been created on the host by 1.6.0. However, rebooting the host (and so losing the bridge created by 1.6.0) produced the same error (above) as on the freshly created hosts.

@visualphoenix
Author

mentioned in #13528

@visualphoenix
Author

Describe the results you received: The docker bridge interface was not created and the docker daemon does not start.

Describe the results you expected: I expect to be able to upgrade to 1.7.0 and have CentOS 6.6 hosts still work.

Provide additional info you think is important: Might be a bug in libnetwork?

@aboch
Contributor

aboch commented Jun 18, 2015

Looking at the logs provided, I am wondering whether we support kernel version 2.6.32.

@cpuguy83
Member

@aboch We should, assuming it's a RHEL kernel.

@visualphoenix
Author

@cpuguy83 Yes, it is a RHEL kernel.
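On RHEL the upstream number 2.6.32 says little on its own, because Red Hat backports fixes and features into that base; the build number after the dash is what distinguishes minor releases. A minimal sketch for checking a kernel release string, assuming that RHEL/CentOS 6.6 kernels start at build 2.6.32-504 (the host above runs 2.6.32-504.8.1.el6). The helper name is_el6_6_or_later is hypothetical, and it only recognizes EL6 kernels.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the release string looks like an EL6 kernel
# at or above the assumed 6.6 build (2.6.32-504).
is_el6_6_or_later() {
  local release="$1"   # e.g. the output of `uname -r`
  # Only strings of the form 2.6.32-<build>...el6... are treated as EL6 here.
  if [[ "$release" =~ ^2\.6\.32-([0-9]+)\..*el6 ]]; then
    # The captured build number decides the minor release.
    (( BASH_REMATCH[1] >= 504 ))
    return
  fi
  return 1
}
```

Usage: is_el6_6_or_later "$(uname -r)" && echo "EL 6.6 or later kernel"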

@visualphoenix
Author

Possibly related to moby/libnetwork#311.

@icecrime
Contributor

This code path is probably unrelated: the modprobe only warns in case of failure.

@visualphoenix
Author

@icecrime True. In moby/libnetwork#312 I noted some other differences in the loaded modules between 1.6.0 and 1.7.0, as shown by lsmod. I'm not sure whether these simply didn't get loaded because the daemon bailed out before loading them.
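To compare loaded modules between the 1.6.0 and 1.7.0 setups without eyeballing lsmod output, a minimal sketch that reads /proc/modules (the file lsmod formats) and prints each requested module that is not listed. Assumptions: missing_modules is a hypothetical helper, and a module compiled into the kernel rather than loaded as a module does not appear in /proc/modules, so it would be reported as missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every named module that is absent from the given
# modules file (normally /proc/modules).
missing_modules() {
  local modules_file="$1"; shift
  local mod
  for mod in "$@"; do
    # The first field of each /proc/modules line is the module name;
    # awk exits 0 only if a line with that exact name was seen.
    if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
      echo "$mod"
    fi
  done
}
```

Usage, with the two modules from the modprobe warning in the log: missing_modules /proc/modules bridge nf_nat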

@visualphoenix
Author

@mavenugo ping?

@visualphoenix
Author

@thaJeztah Can we get this labeled as a bug/regression? Any idea whether this can be fixed in 1.7.1?

@jessfraz jessfraz added the kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. label Jun 18, 2015
@LK4D4 LK4D4 added this to the 1.7.1 milestone Jun 18, 2015
@visualphoenix
Author

👍 thanks @jfrazelle @LK4D4

@visualphoenix visualphoenix changed the title 1.7.0 Daemon wont start on CentOS/RHEL 6.6 1.7.0/CentOS/RHEL 6.6 - bridge interface creation fails. daemon won't start. Jun 18, 2015
@thaJeztah
Member

Sorry, was away for a bit :)

@icecrime
Contributor

The issue is acknowledged, and the root cause is understood. Current situation is:

  • The patch is not trivial (quite a lot of code to backport to libnetwork), so we need to make sure we get it right
  • 1.7.0 is a big release, and there might be more issues, so it's worth giving it a few days to collect feedback

So, all in all, this is definitely going to make it into 1.7.1, but we're not going to rush into building that version in the next few days. I hope that sounds right to everyone.

@visualphoenix
Author

@icecrime Makes perfect sense to me. It seems like you're still getting coverage on some of the bigger library migrations that went on, so I completely understand why you want to let folks hammer on it for a bit.

Can you explain exactly what code wasn't ported to libnetwork? I thought the libnetwork integration was supposed to be an almost straight 1:1 port (for now) of the existing docker code.

@icecrime
Contributor

@visualphoenix It's actually a rewrite, and apparently some fallback logic for older kernels was lost in the process (in this particular case, this one: https://github.com/docker/libcontainer/blob/master/netlink/netlink_linux.go#L1144).

@LK4D4
Contributor

LK4D4 commented Jun 18, 2015

@visualphoenix The old code used ioctl instead of netlink.

@visualphoenix
Author

@LK4D4 @icecrime Ah, makes sense. @vishvananda's netlink lib already has fallbacks for older kernels.
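The fallback described in the last few comments (try netlink first, fall back to the bridge ioctl when the kernel rejects the netlink request) can be illustrated at the command-line level. This is a sketch of the idea, not libnetwork's actual code. It assumes that iproute2's "ip link add ... type bridge" uses netlink, like the path 1.7.0 appears to take, and that "brctl addbr" from bridge-utils uses the ioctl interface that 2.6.32 kernels support. create_bridge is a hypothetical helper; run it as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a bridge via netlink, falling back to the ioctl
# path, and print which path succeeded.
create_bridge() {
  local name="$1"
  # Netlink path (RTM_NEWLINK with link kind "bridge"); older kernels may
  # answer "operation not supported", as in the daemon log above.
  if ip link add name "$name" type bridge 2>/dev/null; then
    echo "netlink"
  # ioctl path (bridge-utils), the kind of fallback the old code relied on.
  elif brctl addbr "$name"; then
    echo "ioctl"
  else
    echo "failed to create bridge $name" >&2
    return 1
  fi
}
```

Usage (as root): create_bridge docker0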

@noelob

noelob commented Jun 22, 2015

Can you provide a link to the 1.6.2 RPM? The docs (https://docs.docker.com/installation/rhel/) reference the 1.7.0 RPM, which is blocking me from getting Docker installed.

Thanks.

@sgykfjsm

@noelob How about the binary? I got the binary URL for v1.6 here: https://docs.docker.com/installation/binaries/
It seems to be working fine in my environment.

@tubia

tubia commented Jun 23, 2015

Hi,
I experienced the same error while I was trying to fix the problem related to #14035.
I'm using Debian 7 64bit with a custom 3.10 kernel from gandi.net (AUFS not supported).

@vinnyspb

It would be really nice to have a link to an RHEL 6.6 RPM of the older version. I tried to dig around but can't find one.

@cpuguy83
Member

@vinnyspb There is not one. We didn't provide RPMs until the 1.7 release.

@noelob

noelob commented Jun 23, 2015

@sgykfjsm @cpuguy83 OK, thanks, I'll try the binary.

@visualphoenix
Author

@noelob @sgykfjsm @cpuguy83 Hey guys, don't use the binary version from Docker on CentOS 6.6 or you're going to have a bad time. Use the docker 1.6.2 RPMs from EPEL testing (el6) instead:

sudo yum install -y \
  http://mirror.centos.org/centos/6.6/os/x86_64/Packages/device-mapper-libs-1.02.90-2.el6.x86_64.rpm \
  http://mirror.centos.org/centos/6.6/os/x86_64/Packages/device-mapper-1.02.90-2.el6.x86_64.rpm \
  http://mirror.centos.org/centos/6.6/os/x86_64/Packages/device-mapper-event-1.02.90-2.el6.x86_64.rpm \
  http://mirror.centos.org/centos/6.6/os/x86_64/Packages/device-mapper-event-libs-1.02.90-2.el6.x86_64.rpm \
  https://dl.fedoraproject.org/pub/epel/testing/6/x86_64/docker-io-1.6.2-1.el6.x86_64.rpm

If you use the binary, you will not have udev sync support, and that will cause you to have a really bad time.
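To check whether an installed daemon actually has the udev sync support mentioned above, look at docker info once the daemon is running. A minimal sketch, assuming the devicemapper storage driver, for which docker info (1.6 and later) prints a line such as "Udev Sync Supported: true"; udev_sync_supported is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the running daemon reports udev sync.
udev_sync_supported() {
  # The pipeline's status is grep's: 0 if the "true" line is present.
  docker info 2>/dev/null | grep -q 'Udev Sync Supported: true'
}
```

Usage: udev_sync_supported || echo "no udev sync support; avoid this build on EL6"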

@visualphoenix
Author

Wow, this is crazy. I can't believe RH suddenly claims 6.x isn't supported. Both Docker and RHEL launched announcing that 6.5 was supported. Recently Docker raised the minimum requirement to the 6.6 kernel (understandably). Losing 6.6 (and, in the future, 6.7) support would be a huge problem for the Docker rollout I've been doing.

@visualphoenix
Author

@Khazrak @vinnyspb Wow, we don't have that kernel available to us yet, so I had not seen that issue. Very troubling. It would be great if a separate ticket were opened about that crash on the most recent kernel. Maybe the docker folks will continue to be kind and help keep 6.6 supported.

@gavinwhyte

I agree with visualphoenix: this is awful. I was about to roll it out at a bank, and there is no support for CentOS 6.6 or RHEL 6.6 in Docker 1.7.

@visualphoenix
Author

@gavinwhyte I completely agree. It's a huge problem because not every business has finished migrating everything from sysvinit to systemd, so we have a huge ecosystem of base monitoring and support infrastructure that doesn't currently work with RHEL/CentOS 7. Maybe we should create a new ticket about RHEL/CentOS 6 support for Docker and get @rhatdan and @shykes to comment, rather than continue to clutter this ticket with issues related to that problem.

@pdericson
Contributor

+1 @gavinwhyte and @visualphoenix I have no love for RHEL / CentOS 6.x but for a lot of large enterprises it's not going to go away anytime soon. The last thing I want to do is stop using Docker because upstream is releasing unstable kernels - please help, Docker Inc.

@visualphoenix
Author

Maybe we can move this part of the discussion to #14174. As for this issue, I hope the docker devs backport the support for 6.6 into libnetwork as discussed earlier in the ticket.

@choman

choman commented Jun 27, 2015

Is there a manual fix for the Docker 1.7 on CentOS 6.6 issue, at least until 1.7.1 is out?

Also, not trying to rush 1.7.1, but does anyone know the proposed release date?

@mrjana
Contributor

mrjana commented Jun 30, 2015

@visualphoenix @lesgrossman and whoever else is interested: would you be willing to test a docker binary that contains fixes for the networking issues? If so, let me know and I will provide a binary so that you can give feedback before we put out an official 1.7.1-rc1.

@thaJeztah
Member

Also, the 1.7.1-RC is being prepared here: #14264 (for those that want to stay informed on progress)

@choman

choman commented Jun 30, 2015

@mrjana I would love to assist where possible on this. I need to get 1.7.x running on both CentOS 6.6 and RHEL 6.6, and I can take both the binary and the RC. My schedule is hectic, so I hope I can provide enough valuable feedback.

@punya

punya commented Jun 30, 2015

@mrjana I'd be happy to try out the RC binary as well.

@calavera
Contributor

calavera commented Jul 1, 2015

This should be fixed on master and was merged into the release branch #14264. Closing.

@calavera calavera closed this as completed Jul 1, 2015
@007reader

I'd be interested in testing too

@alexanderilyin

Suggestion from @visualphoenix works for me. Thx!

@hansloven

So, for me, this was resolved by:

  1. su root
  2. yum remove docker-engine
  3. yum install lxc
    *3) yum install docker-io
  4. verify that the following command runs fine, then Ctrl-C it:
    docker -d -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375
  5. yum remove docker-io
    *5) yum install docker-engine
  6. verify again that the following command runs fine, then Ctrl-C it:
    docker -d -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375
  7. start the service, which now runs as expected:
    service docker start

*NOTE: in steps 3 and 5 I had downloaded each docker RPM individually and installed it from the local directory.
Download URLs:
https://dl.fedoraproject.org/pub/epel/testing/6/x86_64/docker-io-1.6.2-1.el6.x86_64.rpm
and
https://get.docker.com/rpm/1.7.0/centos-6/RPMS/x86_64/docker-engine-1.7.0-1.el6.x86_64.rpm

The absence of the issue when upgrading from 1.6.2 to 1.7.0 is probably why it was not caught in development.

@AndrewSwerlick

@hansloven I found that workaround by accident as well, but I'd recommend against actually using it. It has a couple of flaws that I experienced in our development environment:

  1. Whenever you reboot, you'll have to go through those steps again, because docker will have to recreate the bridge and will fail to do so with v1.7.0.
  2. I found myself hitting the dreaded "Unable to start container, device or resource busy" error frequently, I suspect because of version-related conflicts for containers created with one version but then run with the next.

@LK4D4
Contributor

LK4D4 commented Jul 2, 2015

I'm pretty sure you can create the same bridge with iproute2.
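Creating the bridge by hand only lasts until the next reboot, which is the first flaw listed above. Since 1.7.0 starts fine when the docker0 bridge already exists (as reported at the top of the thread), one way to make a hand-made bridge persistent on RHEL/CentOS 6 is a network-scripts config file. A minimal sketch under several assumptions: bridge-utils is installed (the EL6 network scripts call brctl for TYPE=Bridge), 172.17.42.1/16 (the address Docker 1.x has typically given docker0) is free on your network, and write_docker0_ifcfg is a hypothetical helper. The directory is a parameter only so the output can be inspected before installing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an ifcfg file that brings up docker0 as a static
# bridge at boot. Assumed address 172.17.42.1/16; adjust for your network.
write_docker0_ifcfg() {
  local dir="${1:-/etc/sysconfig/network-scripts}"
  cat > "$dir/ifcfg-docker0" <<'EOF'
DEVICE=docker0
TYPE=Bridge
BOOTPROTO=static
IPADDR=172.17.42.1
NETMASK=255.255.0.0
ONBOOT=yes
NM_CONTROLLED=no
EOF
}
```

Usage (as root): write_docker0_ifcfg && ifup docker0 && service docker start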

@smgbackup

@visualphoenix Given the wait for 1.7.1 regarding RHEL 6.6, I've been trying to apply your tip from above. It threw an LXC dependency issue that I was unable to resolve, probably due to my own ignorance. After trying a few other ways, including wget, the problem disappeared. I now have v1.6.2 working on my RHEL 6.6 host, but I'm still wondering why I had this LXC issue.

@KoenVingerhoets

@mrjana Despite the (apparently imminent) release of 1.7.1, I would like to test 1.7.1-RC1.
Is that still possible? Also, could you please point me to an RPM package of RC1? Thank you.

@mrjana
Contributor

mrjana commented Jul 7, 2015

@KoenVingerhoets and everybody else who was not able to find a 1.7.1-RC1 RPM: you can get it from here:

#14264 (comment)

@k2xl

k2xl commented Jul 7, 2015

Going to the 1.7.1 RC worked for me.
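After moving to the 1.7.1 RC (or the final 1.7.1), a quick way to confirm that a host actually picked up the fixed version. A minimal sketch, assuming the first lines of docker version keep the "Client version: X.Y.Z" format shown in the 1.7.0 output at the top of the thread, and that sort supports -V (coreutils on EL6 does). Any "-rc1"-style suffix is dropped, so 1.7.1-rc1 counts as 1.7.1. client_version_at_least is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the installed docker client is >= $1.
client_version_at_least() {
  local want="$1" have
  # Take the value after "Client version: " from the docker version output.
  have=$(docker version 2>/dev/null | awk -F': ' '/^Client version:/ { print $2; exit }')
  have="${have%%-*}"   # drop "-rc1", "-dev" and similar suffixes
  [ -n "$have" ] || return 1
  # sort -V lists the lower version first; "at least" means $want comes first.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}
```

Usage: client_version_at_least 1.7.1 || echo "still on a pre-1.7.1 client"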
