
Docker slow and not responding #25433

Closed
eon01 opened this issue Aug 5, 2016 · 5 comments
eon01 commented Aug 5, 2016

Output of docker version:

Docker version 1.12.0, build 8eab29e

Output of docker info:

Containers: 7
 Running: 2
 Paused: 0
 Stopped: 5
Images: 1
Server Version: 1.12.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 86
 Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local nfs
 Network: host null bridge overlay
Swarm: active
 NodeID: 34495kge70x4i15lnenpxop4w
 Is Manager: true
 ClusterID: 2t06r0vn5n38xeot3p5x72nia
 Managers: 2
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot interval: 10000
  Heartbeat tick: 1
  Election tick: 3
 Dispatcher:
  Heartbeat period: 5 seconds
 CA configuration:
  Expiry duration: 3 months
 Node Address: 10.0.1.222
Runtimes: runc
Default Runtime: runc
Security Options: apparmor
Kernel Version: 3.13.0-92-generic
Operating System: Ubuntu 14.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.859 GiB
Name: iron-1
ID: MKUL:QCR3:AVT6:BRO3:XILV:OROV:XEXK:LRET:EY5X:D2EX:HICN:WEKL
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: user
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.): aws (EC2 Medium), Docker in Swarm Mode.

Issue: Docker is not responding, and it takes a long time before it prints this message:

# docker service ls
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
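
The `context deadline exceeded` message is a gRPC timeout (status code 4): the manager could not complete the request, often because it cannot reach a Raft quorum, before an internal deadline expired. One way to get more detail is to enable daemon debug logging. A hedged sketch, assuming the daemon reads `/etc/docker/daemon.json` (supported around this Docker version; the exact path and support may vary by install):

```json
{
  "debug": true
}
```

After restarting the daemon, the verbose output should appear in the daemon logs (on Ubuntu 14.04 with upstart, typically `/var/log/upstart/docker.log`; on systemd hosts, `journalctl -u docker`).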

It also takes a considerable time to restart:

service docker restart
[... waiting ..]
docker stop/waiting
docker start/running, process 26324

This is happening for the first time for me.
The only plugin I have installed is ContainX/docker-volume-netshare, but it was not running when this bug happened.

The real problem is that all of my containers on the first node are shut down after restarting Docker. Are containers supposed to shut down after a restart?

docker service ps app
ID                         NAME            IMAGE                    NODE    DESIRED STATE  CURRENT STATE            ERROR
5an2nk1o2l1r2vfx1nsdh3heq  app.1      eon01/app:v1  server-2  Running        Running 13 minutes ago   
82fboutukie10eymspduhnq7k   \_ app.1  eon01/app:v1  server-1  Shutdown       Rejected 14 minutes ago  "Error processing tar file(exi…"
8ktfk350k8lly9w4v95ehpwcp   \_ app.1  eon01/app:v1  server-1  Shutdown       Rejected 14 minutes ago  "open /var/lib/docker/containe…"
65zcu4ja474nkzdhgjlwo444p   \_ app.1  eon01/app:v1  server-1  Shutdown       Rejected 15 minutes ago  "open /var/lib/docker/containe…"
dpr3g9c96sifw7498826kopyu   \_ app.1  eon01/app:v1  server-1  Shutdown       Failed 15 minutes ago    "task: non-zero exit (137)"
4ubbcl41iapze1lmd54y3li7f  app.2      eon01/app:v1  server-2  Running        Running 15 minutes ago   
cl6kqevl3uvw6qm9ir96b2yjk   \_ app.2  eon01/app:v1  server-1  Shutdown       Failed 15 minutes ago    "starting container failed: Ad…"
0pvxt5dihlglyy5yoz45alvwf  app.3      eon01/app:v1  server-1  Running        Running 13 minutes ago   
cpsp219b745eh00e2ahjlhtu1   \_ app.3  eon01/app:v1  server-1  Shutdown       Rejected 14 minutes ago  "Error processing tar file(exi…"
6fswazl2ktzsen6ljc6z2pbyu   \_ app.3  eon01/app:v1  server-1  Shutdown       Rejected 14 minutes ago  "Error processing tar file(exi…"
6whkrdggtz11smmfq08tj3ycz   \_ app.3  eon01/app:v1  server-1  Shutdown       Rejected 15 minutes ago  "Error processing tar file(exi…"
3wg9mq2s6ajljzyrvvzxngzgp   \_ app.3  eon01/app:v1  server-1  Shutdown       Failed 15 minutes ago    "task: non-zero exit (137)"
1kacslfttd037fmunz1y0y7rm  app.4      eon01/app:v1  server-2  Running        Running 15 minutes ago   
c8r28f4devj8bkav9xkoazxff   \_ app.4  eon01/app:v1  server-1  Shutdown       Failed 15 minutes ago    "task: non-zero exit (1)"
e04mnf2fuw4b9a2lfps20nduk  app.5      eon01/app:v1  server-1  Running        Running 13 minutes ago   
e66dw4ng982t19texewywotr9   \_ app.5  eon01/app:v1  server-1  Shutdown       Rejected 14 minutes ago  "open /var/lib/docker/containe…"
ecefkbp9h8sc7ycrf0o7cvwuh   \_ app.5  eon01/app:v1  server-1  Shutdown       Rejected 14 minutes ago  "Error processing tar file(exi…"
11p172nz7757sx4qo440nxkbl   \_ app.5  eon01/app:v1  server-1  Shutdown       Rejected 15 minutes ago  "Error processing tar file(exi…"
1kh8nmk01mmlu70x379spsuv3   \_ app.5  eon01/app:v1  server-1  Shutdown       Failed 15 minutes ago    "task: non-zero exit (137)"
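
On the question of containers surviving a daemon restart: by default they do not. Docker 1.12 introduced a `live-restore` daemon option that keeps standalone containers running while the daemon is down, though per the documentation it does not apply to swarm-mode service tasks, which the orchestrator reschedules instead. A hedged config sketch for `/etc/docker/daemon.json`:

```json
{
  "live-restore": true
}
```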

@thaJeztah (Member)

Just some initial thoughts: this may be related to #25017.

I do notice you're running a two-node cluster with two managers; you should never run with two managers. One manager or three is better: running with two managers actually doubles the chance of a manager failure compared to running a single manager. Here's why:

The Raft algorithm requires a quorum (a majority) of managers to agree when making changes to the Swarm. In a two-manager setup, a majority effectively means both managers, so if a single manager is lost you lose control over the cluster, and with two managers the chance that one of them fails is doubled. Also read this section of the documentation: https://docs.docker.com/engine/swarm/admin_guide/#/add-manager-nodes-for-fault-tolerance
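
To make the quorum arithmetic concrete, here is a small illustrative sketch (not from the thread) showing why two managers tolerate exactly as many failures as one, i.e. zero:

```shell
#!/bin/sh
# Raft quorum arithmetic for an N-manager swarm.
# A majority of managers must be reachable for the cluster to accept
# commands such as `docker service ls`.

quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of manager failures the cluster can survive.
tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
  echo "$n manager(s): quorum $(quorum "$n"), tolerates $(tolerance "$n") failure(s)"
done
# 2 managers: quorum 2, tolerates 0 failures -- the same tolerance as a
# single manager, but with twice as many machines that can fail.
```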

/cc @LK4D4 @aaronlehmann do you think this issue is the same / related to #25017?

@aaronlehmann (Contributor)

do you think this issue is the same / related to #25017?

It's possible, but hard to know without more context. It could simply be that the other manager is down, has changed its IP address, or has some similar problem. As @thaJeztah mentioned, a two-manager setup has the worst fault-tolerance properties, so it's better to run with a single manager (or three).

@liubin (Contributor) commented Aug 8, 2016

Could ContainX/docker-volume-netshare being stopped be the reason, if your containers are using its volumes?

@eon01 (Author) commented Aug 8, 2016

@liubin ContainX/docker-volume-netshare is installed, but it was not running when I hit the problem.

@thaJeztah (Member)

Let me close this ticket for now, as it looks like it went stale.

@thaJeztah closed this as not planned (won't fix, can't repro, duplicate, stale) on Aug 7, 2023