hypriot/docker network stability issues. #19532

Closed
mevatron opened this issue Jan 21, 2016 · 18 comments

Comments

@mevatron

Hi,

I'm currently running Hypriot OS 0.6.1 with the latest docker-hypriot-1.9.1 build available on their website. Each time I try to build a docker container that pulls from GitHub, I get the following error (with curl verbosity enabled):

root@aaeac2ab909d:/home/meteor# GIT_CURL_VERBOSE=1 git clone --depth 1 https://github.com/4commerce-technologies-AG/meteor.git
Cloning into 'meteor'...
* Couldn't find host github.com in the .netrc file; using defaults
* Hostname was NOT found in DNS cache
*   Trying 192.30.252.129...
* Connected to github.com (192.30.252.129) port 443 (#0)
* found 173 certificates in /etc/ssl/certs/ca-certificates.crt
*    server certificate verification OK
*    common name: github.com (matched)
*    server certificate expiration date OK
*    server certificate activation date OK
*    certificate public key: RSA
*    certificate version: #3
*    subject: 
*    start date: Tue, 08 Apr 2014 00:00:00 GMT
*    expire date: Tue, 12 Apr 2016 12:00:00 GMT
*    issuer: C=US,O=DigiCert Inc,OU=www.digicert.com,CN=DigiCert SHA2 Extended Validation Server CA
*    compression: NULL
*    cipher: AES-128-GCM
*    MAC: AEAD
> GET /4commerce-technologies-AG/meteor.git/info/refs?service=git-upload-pack HTTP/1.1
User-Agent: git/2.1.4
Host: github.com
Accept: */*
Accept-Encoding: gzip
Pragma: no-cache

< HTTP/1.1 200 OK
* Server GitHub Babel 2.0 is not blacklisted
< Server: GitHub Babel 2.0
< Content-Type: application/x-git-upload-pack-advertisement
< Transfer-Encoding: chunked
< Expires: Fri, 01 Jan 1980 00:00:00 GMT
< Pragma: no-cache
< Cache-Control: no-cache, max-age=0, must-revalidate
< Vary: Accept-Encoding
< X-GitHub-Request-Id: 4B8A08F2:2C58:14CD5B7:56972928
< 
* Connection #0 to host github.com left intact
* Couldn't find host github.com in the .netrc file; using defaults
* Found bundle for host github.com: 0x83cdf8
* Re-using existing connection! (#0) with host github.com
* Connected to github.com (192.30.252.129) port 443 (#0)
> POST /4commerce-technologies-AG/meteor.git/git-upload-pack HTTP/1.1
User-Agent: git/2.1.4
Host: github.com
Accept-Encoding: gzip
Content-Type: application/x-git-upload-pack-request
Accept: application/x-git-upload-pack-result
Content-Length: 205

* upload completely sent off: 205 out of 205 bytes
< HTTP/1.1 200 OK
* Server GitHub Babel 2.0 is not blacklisted
< Server: GitHub Babel 2.0
< Content-Type: application/x-git-upload-pack-result
< Transfer-Encoding: chunked
< Expires: Fri, 01 Jan 1980 00:00:00 GMT
< Pragma: no-cache
< Cache-Control: no-cache, max-age=0, must-revalidate
< Vary: Accept-Encoding
< X-GitHub-Request-Id: 4B8A08F2:2C58:14CD5EE:56972928
< 
* Connection #0 to host github.com left intact
* Couldn't find host github.com in the .netrc file; using defaults
* Found bundle for host github.com: 0x83cdf8
* Re-using existing connection! (#0) with host github.com
* Connected to github.com (192.30.252.129) port 443 (#0)
> POST /4commerce-technologies-AG/meteor.git/git-upload-pack HTTP/1.1
User-Agent: git/2.1.4
Host: github.com
Accept-Encoding: gzip
Content-Type: application/x-git-upload-pack-request
Accept: application/x-git-upload-pack-result
Content-Length: 214

* upload completely sent off: 214 out of 214 bytes
< HTTP/1.1 200 OK
* Server GitHub Babel 2.0 is not blacklisted
< Server: GitHub Babel 2.0
< Content-Type: application/x-git-upload-pack-result
< Transfer-Encoding: chunked
< Expires: Fri, 01 Jan 1980 00:00:00 GMT
< Pragma: no-cache
< Cache-Control: no-cache, max-age=0, must-revalidate
< Vary: Accept-Encoding
< X-GitHub-Request-Id: 4B8A08F2:2C58:14CD635:56972929
< 
remote: Counting objects: 2610, done.
remote: Compressing objects: 100% (2235/2235), done.
* GnuTLS recv error (-54): Error in the pull function.B/s   
* Closing connection 0
error: RPC failed; result=56, HTTP code = 200| 2.90 MiB/s   
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

docker version

Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 23:03:02 UTC 2015
 OS/Arch:      linux/arm

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 23:03:02 UTC 2015
 OS/Arch:      linux/arm

docker info

Containers: 1
Images: 9
Server Version: 1.9.1
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.12-hypriotos-v7+
Operating System: Raspbian GNU/Linux 8 (jessie)
CPUs: 4
Total Memory: 925.5 MiB
Name: black-pearl
ID: 5GCX:4NCH:RZ23:UJCM:LOUR:R2TV:Z3V4:RGVC:AKMX:7WS7:CR3U:L7QT
Debug mode (server): true
 File Descriptors: 15
 Goroutines: 25
 System Time: 2016-01-21T06:58:03.882301269+01:00
 EventsListeners: 0
 Init SHA1: 3ddb09b3a95073d6ab5f4ceba30f9fd506dbfff7
 Init Path: /usr/lib/docker/dockerinit
 Docker Root Dir: /var/lib/docker

I have found that something about my network is causing this, because the same docker build command runs successfully on the network at my office. However, even after replacing my router with a new one with nearly default settings (only a few DHCP reservations were made), the above problem persists. Occasionally the command works (maybe 1 in 20 attempts), but most of the time it fails.

The same git clone command above works correctly on the Hypriot host (Raspberry Pi 2). This tipped me off to try running the docker image with --net=host, and this worked (repeatedly)! Unfortunately, the docker build command doesn't support the --net=host switch, so I was hoping someone with more docker experience had some more tricks to try out!
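
For anyone who wants to reproduce the comparison described above, here is a minimal sketch that runs the same clone under both network modes. The image name `meteor-base` is hypothetical (any image with git installed would do), and by default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: print (or, with DRY_RUN=0, execute) the same clone under both network modes.
# IMAGE is a hypothetical name for an image that already has git installed.
IMAGE="${IMAGE:-meteor-base}"
REPO="https://github.com/4commerce-technologies-AG/meteor.git"
DRY_RUN="${DRY_RUN:-1}"

clone_with_net() {
  # $1 is the network mode: "bridge" (Docker's default) or "host".
  set -- docker run --rm --net="$1" "$IMAGE" \
    git clone --depth 1 "$REPO" /tmp/meteor
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

clone_with_net bridge
clone_with_net host
```

If the bridge run fails and the host run succeeds, that points at the path through docker0 and the host's NAT rather than at git or the remote server.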

If more info is required, I'll be happy to provide it!

@GordonTheTurtle

If you are reporting a new issue, make sure that we do not have any duplicates already open. You can ensure this by searching the issue list for this repository. If there is a duplicate, please close your issue and add a comment to the existing issue instead.

If you suspect your issue is a bug, please edit your issue description to include the BUG REPORT INFORMATION shown below. If you fail to provide this information within 7 days, we cannot debug your issue and will close it. We will, however, reopen it if you later provide the information.

For more information about reporting issues, see CONTRIBUTING.md.

You don't have to include this information if this is a feature request

(This is an automated, informational response)


BUG REPORT INFORMATION

Use the commands below to provide key information from your environment:

docker version:
docker info:

Provide additional environment details (AWS, VirtualBox, physical, etc.):

List the steps to reproduce the issue:
1.
2.
3.

Describe the results you received:

Describe the results you expected:

Provide additional info you think is important:

----------END REPORT ---------

#ENEEDMOREINFO

@HackToday
Contributor

This doesn't seem to be a Docker issue; you just ran a git clone command and it failed.

Perhaps it's a firewall or proxy issue?

@mevatron
Author

Well, you'd think so, but the same command works fine on the Hypriot host. Also, docker containers on my laptop run git clones fine without --net=host on the same network. Also, I've completely changed the router on my network with no change in the error.

Currently, I am going to try downgrading to 1.8.3 to see what happens.

@HackToday
Contributor

@mevatron so the failure happens when a docker container runs that git command, right?

If that's the case, can the container access the internet?

@HackToday
Contributor

Use curl to check; tcpdump may also help show why the container failed to git clone.
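
A rough sketch of what that check could look like follows. The interface name `docker0` is Docker's default bridge; the capture file path is an assumption chosen for illustration. The script only prints the two commands so they can be copied to the right place (host and container).

```shell
#!/bin/sh
# Sketch: IFACE and PCAP defaults are assumptions, not values from this thread.
IFACE="${IFACE:-docker0}"            # Docker's default bridge interface on the host
PCAP="${PCAP:-/tmp/git-clone.pcap}"  # where the capture would be written

# Host side: capture HTTPS traffic crossing the bridge (needs root to run).
capture_cmd() {
  printf '%s\n' "tcpdump -i $IFACE -w $PCAP tcp port 443"
}

# Container side: fetch a URL, discard the body, print only the HTTP status code.
probe_cmd() {
  printf '%s\n' "curl -sS -o /dev/null -w '%{http_code}' $1"
}

capture_cmd
probe_cmd https://github.com/
```

Start the capture on the host, then run the probe (or the failing git clone) inside the container, and open the resulting file in Wireshark. Note that a small request may succeed even when a large transfer fails, since the log above shows the connection dropping partway through the download.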

@mevatron
Author

The container can definitely access the internet. apt-get successfully installs the requested packages. So, maybe I should try curl downloading something from github?

Dockerfile

FROM resin/armv7hf-debian:jessie

MAINTAINER Will Lucas

RUN apt-get update && \
  apt-get install -y --no-install-recommends \
  ca-certificates \
  curl \
  build-essential \
  debian-keyring \
  autoconf \
  automake \
  libtool \
  flex \
  bison \
  scons \
  runit \
  git && \
  rm -rf /var/lib/apt/lists/* && \
  apt-get clean && \
  groupadd -r meteor && \
  useradd -ms /bin/bash -r -g meteor meteor

WORKDIR /home/meteor
USER meteor
RUN git clone --depth 1 https://github.com/4commerce-technologies-AG/meteor.git && /home/meteor/meteor/meteor --version; exit 0

USER root
RUN ln -s /home/meteor/meteor/meteor /usr/local/bin/meteor && \
  ln -s /home/meteor/meteor/dev_bundle/bin/node /usr/local/bin/node && \
  ln -s /home/meteor/meteor/meteor/dev_bundle/bin/npm /usr/local/bin/npm && \
  ln -s /home/meteor/meteor/meteor/dev_bundle/mongodb/bin/mongod /usr/local/bin/mongod

Also, since you mentioned the firewall settings, here are what I assume are the Hypriot defaults (I haven't modified them since flashing the SD card):

$ sudo iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N DOCKER
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT

Also, the 1.8.3 downgrade didn't help, so this doesn't seem to be a regression.

@gazzer82

I am seeing some odd network issues as well. Most NPM installs fail during a docker build, usually with a JSON read error or a checksum failure. Did you ever get to the bottom of this?

I have tried this on two separate Pi2 and one Pi1. The same NPM install command run on my OSX and Ubuntu machines works fine on the same network, so it doesn't seem to be a network issue, or at least it's not affecting the whole network.

@mevatron
Author

@gazzer82 Sorry to hear you are running into the same thing. I haven't been able to narrow down what's happening yet. I need to do some Wireshark captures and try to attack it from there. Have you tried running your build commands with --net=host under the same base image, but with docker run instead of docker build? I found I was able to correct whatever the issue is by doing that. Unfortunately, for philosophical reasons it seems docker build doesn't allow for the --net=host switch.

Since the commands work fine by using --net=host this leads me to believe it is something with the way docker-hypriot's iptables are interacting with my network. My x86 based docker builds work flawlessly on my network, so I don't think it is something with baseline docker. It is puzzling also that I can use the same Pi on my work network and have no issues with the above mentioned Dockerfile. I have changed everything I'm willing to change (hardware-wise) on my network and am still seeing the issue.

@StefanScherer
Contributor

We got a similar issue on Scaleway. Perhaps this is related, see #18176 (comment)

@gazzer82

Hmm, I completely replaced my router, as I had been wanting to replace it anyway, so it's now just the Pi hard-wired to my router (Asus RT-AC68P), which is then connected to an SMC D3CM1604 DOCSIS 3 cable modem on TWC.

It is still failing to build. It did manage to get a little further and install Babel, but it is now failing later on. I am so stumped by this one.

The MTU is 1500 everywhere, apart from the TWC interface, which uses some odd value of theirs. Maybe that's where the issue is coming from.

lib/index.js -> bin/index.js
lib/tunnel.js -> bin/tunnel.js
npm WARN optional dep failed, continuing fsevents@^1.0.0
npm ERR! Linux 4.1.17-v7+
npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "install" "-g" "--unsafe-perm"
npm ERR! node v0.12.0
npm ERR! npm  v2.5.1

npm ERR! shasum check failed for /tmp/npm-14-dadbd61c/registry.npmjs.org/node-forge/-/node-forge-0.6.39.tgz
npm ERR! Expected: 2184e89dba9b44b3aa54cd4bf1e7334f247cf9ce
npm ERR! Actual:   45bd84b4e929f086705c64b4606d46934574d362
npm ERR! From:     https://registry.npmjs.org/node-forge/-/node-forge-0.6.39.tgz
npm ERR! 
npm ERR! If you need help, you may report this error at:
npm ERR!     <http://github.com/npm/npm/issues>

npm ERR! Please include the following file with any support request:
npm ERR!     /data/airsonos/npm-debug.log
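
The shasum failure above means the bytes that arrived differ from the digest the registry advertised. One way to check this outside of npm is to download the tarball by hand and compare digests. The helper below is a minimal sketch; the function name `verify_sha1` is made up for illustration, and the commented example uses the URL and "Expected" value from the log.

```shell
#!/bin/sh
# Compare a file's SHA-1 digest against an expected hex string.
# Prints a one-line verdict; returns 0 on match, 1 on mismatch.
verify_sha1() {
  file="$1"
  expected="$2"
  actual=$(sha1sum "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "OK $file"
  else
    echo "MISMATCH $file: expected $expected, got $actual"
    return 1
  fi
}

# Example, to be run inside the container:
#   curl -sSLO https://registry.npmjs.org/node-forge/-/node-forge-0.6.39.tgz
#   verify_sha1 node-forge-0.6.39.tgz 2184e89dba9b44b3aa54cd4bf1e7334f247cf9ce
```

Repeating the download a few times with and without --net=host would show whether the corruption only happens on the bridged path.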

@mevatron
Author

@StefanScherer Thanks for the tip! Is the #18176 fix included in the latest docker-hypriot_1.10.1-1_armhf.deb available on your site? I tried that build last night, but I'm still getting the same error unfortunately.

@StefanScherer
Contributor

@mevatron Do you mean this correction for Scaleway? https://github.com/scaleway-community/scaleway-docker/pull/51/files
No, this isn't part of docker-hypriot_1.10.1-1_armhf.deb. You might add the change manually and restart your Docker Engine.

@gazzer82

@mevatron I have tried making this change on my system, and it doesn't seem to fix the issue. I'd be intrigued to see if it makes any difference for you.

@mevatron
Author

mevatron commented May 1, 2016

@gazzer82 I finally got a chance to try the workaround mentioned by @StefanScherer, and it also did not work for me. But... I do believe I have found the solution, with many, many thanks to @aaronlehmann! He originally found the workaround here: distribution/distribution#785 (comment). He has a more complete summary at moby/libnetwork#1090.

I ended up using his echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal workaround.
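
As a minimal sketch of applying this on different systems: the `/proc/sys/net/ipv4/netfilter/...` path is the one quoted above, while the `net/netfilter/nf_conntrack_tcp_be_liberal` path is what I would expect on newer kernels and is an assumption here. The helper name `find_be_liberal` is made up, and the script only reports the current value rather than changing it.

```shell
#!/bin/sh
# Locate the conntrack "be liberal" switch under a sysctl root (default /proc/sys).
# Prints the first matching path; returns 1 if neither candidate exists.
find_be_liberal() {
  root="${1:-/proc/sys}"
  for rel in net/netfilter/nf_conntrack_tcp_be_liberal \
             net/ipv4/netfilter/ip_conntrack_tcp_be_liberal; do
    if [ -e "$root/$rel" ]; then
      printf '%s\n' "$root/$rel"
      return 0
    fi
  done
  return 1
}

if knob=$(find_be_liberal); then
  echo "current value of $knob: $(cat "$knob")"
  # To enable the workaround (needs root): echo 1 > "$knob"
else
  echo "conntrack sysctl not found (is the conntrack module loaded?)"
fi
```

A value written this way does not survive a reboot, so it would need to be reapplied at boot (for example from a sysctl configuration file) to stay in effect.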

This has me now building container images from home!

Hope this helps you and others!
Will

@aaronlehmann
Contributor

@mevatron: Great to hear that this workaround was helpful. This is the first case I've heard of this issue surfacing outside AWS, but it makes sense that other environments which somehow generate invalid packets would suffer from it. It's a useful data point to know that the same thing can happen on a residential internet connection.

@mevatron
Author

mevatron commented May 1, 2016

@aaronlehmann Thanks again for discovering that! It was quite annoying to not be able to build containers from my home network. Also, I have a Cisco DPC3216 modem; I'm not sure whether that or my ISP is the issue. That said, my work network and home network both use the same ISP, and I can build container images without issue at work.

Have a good rest of the weekend!

@StefanScherer
Contributor

Thanks @mevatron! I've opened issue hypriot/image-builder-rpi#57 to check and improve the SD image. Please follow the progress there. Closing this issue.

@mevatron
Author

mevatron commented May 2, 2016

@StefanScherer Sounds great! Keep up the great work on Hypriot! I'll definitely try to continue to pitch in :)
