Tasks are killed unexpectedly #32218

Closed
le-ortega opened this issue Mar 30, 2017 · 15 comments
Description

I have created a cluster with 5 nodes (3 managers and 2 workers), and I am testing the cluster's behavior when a node is disconnected. After creating the cluster, I verified that the leader is the 'manager1' server:

root@manager1:~# docker node ls
ID                           HOSTNAME     STATUS  AVAILABILITY  MANAGER STATUS
79sw5tsvzxewr5lcwne37s9mu    manager2     Ready   Active        Reachable
d2swwbt2n2j58dyzwjs2xc8f7    node2        Ready   Active
dqxztwj3rrycomdvy7y4xkstd    manager3     Ready   Active        Reachable
jza9lmqz1nv9gmncrkmsdd9r8 *  manager1     Ready   Active        Leader
vae6tkmhc9mapv9fcck38nu5m    node1        Ready   Active

When I turn off the 'manager1' server, some containers belonging to services on other nodes are killed and restarted. I would expect only the tasks that were running on 'manager1' to be rescheduled on other node(s).

Here we can see that a task on 'node2' was killed and started again:

root@manager1:~# docker service ps consulagent
ID            NAME                                   IMAGE                        NODE         DESIRED STATE  CURRENT STATE            ERROR  PORTS
nrv6ofiwlsbz  consulagent.dqxztwj3rrycomdvy7y4xkstd  127.0.0.1:5000/consul:x.x.x  manager3     Running        Pending 19 minutes ago
5umgd96jc4np  consulagent.d2swwbt2n2j58dyzwjs2xc8f7  127.0.0.1:5000/consul:x.x.x  node2        Running        Running 19 minutes ago
kgs950z3hvdo  consulagent.vae6tkmhc9mapv9fcck38nu5m  127.0.0.1:5000/consul:x.x.x  node1        Shutdown       Shutdown 18 minutes ago
qzvs3jg10u5m  consulagent.d2swwbt2n2j58dyzwjs2xc8f7  127.0.0.1:5000/consul:x.x.x  node2        Shutdown       Shutdown 19 minutes ago

Steps to reproduce the issue:

  1. Create a cluster with 5 nodes (3 managers and 2 workers).
  2. Turn off the 'manager1' server (in my case, the swarm leader).
  3. Turn the 'manager1' server back on (about 20 seconds after turning it off).
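For reference, the steps above can be sketched as the following command sequence. These commands were not part of the original report; the advertise address, the way the node is powered off, and the image tag are assumptions, and the image name mirrors the `docker service ps` output shown above.

```shell
# On manager1: initialize the swarm (advertise address is an assumption)
docker swarm init --advertise-addr <manager1-ip>

# Print the join commands; run the manager one on manager2/manager3
# and the worker one on node1/node2
docker swarm join-token manager
docker swarm join-token worker

# Create a global service so every node runs one task
# (registry/image/tag are illustrative, matching the report's output)
docker service create --name consulagent --mode global 127.0.0.1:5000/consul:x.x.x

# Confirm manager1 shows "Leader"
docker node ls

# Power off manager1 (e.g. via the DigitalOcean control panel),
# wait ~20 seconds, then power it back on.

# Watch whether tasks on the *surviving* nodes get shut down and restarted
docker service ps consulagent
```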

Describe the results you received:
Containers of some services on other nodes were killed and started again.

Describe the results you expected:
Only the tasks running on 'manager1' are rescheduled to other node(s).

Additional information you deem important (e.g. issue happens only occasionally):
The issue happens every time the leader node is disconnected.

Output of docker version:

root@manager1:~# docker version
Client:
 Version:      17.03.0-ce
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   60ccb22
 Built:        Thu Feb 23 11:02:43 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.0-ce
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   60ccb22
 Built:        Thu Feb 23 11:02:43 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

root@manager1:~# docker info
 Containers: 22
  Running: 17
  Paused: 0
  Stopped: 5
 Images: 29
 Server Version: 17.03.0-ce
 Storage Driver: aufs
  Root Dir: /var/lib/docker/aufs
  Backing Filesystem: extfs
  Dirs: 555
  Dirperm1 Supported: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host macvlan null overlay
 Swarm: active
  NodeID: jza9lmqz1nv9gmncrkmsdd9r8
  Is Manager: true
  ClusterID: xx8uy075owfn8xxq5b164k5j1
  Managers: 3
  Nodes: 5
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 3
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
  Node Address: x.x.x.x
  Manager Addresses:
   0.0.0.0:2377
   x.x.x.x:2377
   y.y.y.y:2377
   z.z.z.z:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 977c511eda0925a723debdc94d09459af49d082a
 runc version: a01dafd48bc1c7cc12bdb01206f9fea7dd6feb70
 init version: 949e6fa
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.4.0-66-generic
 Operating System: Ubuntu 16.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 31.42 GiB
 Name: manager1
 ID: MRXH:7HX7:P6CI:GEUB:7QFR:3YOC:FNDS:PMLQ:JBYE:YARV:7QVC:HU65
 Docker Root Dir: /var/lib/docker
 Debug Mode (client): false
 Debug Mode (server): false
 Registry: https://index.docker.io/v1/
 WARNING: No swap limit support
 Labels:
  provider=digitalocean
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):
This cluster runs on DigitalOcean.

Additional information:
Below are logs from another run in which I reproduced the issue (this time the leader was 'manager3'):

Manager1 logs:

Mar 22 12:38:34 manager1 docker[16977]: time="2017-03-22T12:38:34.879173831Z" level=info msg="743461d627fc65f0 is starting a new election at term 3"
Mar 22 12:38:34 manager1 docker[16977]: time="2017-03-22T12:38:34.879299534Z" level=info msg="743461d627fc65f0 became candidate at term 4"
Mar 22 12:38:34 manager1 docker[16977]: time="2017-03-22T12:38:34.879318076Z" level=info msg="743461d627fc65f0 received vote from 743461d627fc65f0 at term 4"
Mar 22 12:38:34 manager1 docker[16977]: time="2017-03-22T12:38:34.879343604Z" level=info msg="743461d627fc65f0 [logterm: 3, index: 562] sent vote request to 632639c85c597131 at term 4"
Mar 22 12:38:34 manager1 docker[16977]: time="2017-03-22T12:38:34.879371042Z" level=info msg="743461d627fc65f0 [logterm: 3, index: 562] sent vote request to 155dcbc8adeaf7bc at term 4"
Mar 22 12:38:34 manager1 docker[16977]: time="2017-03-22T12:38:34.879405271Z" level=info msg="raft.node: 743461d627fc65f0 lost leader 155dcbc8adeaf7bc at term 4"
Mar 22 12:38:38 manager1 docker[16977]: time="2017-03-22T12:38:38.880916001Z" level=info msg="743461d627fc65f0 is starting a new election at term 4"
Mar 22 12:38:38 manager1 docker[16977]: time="2017-03-22T12:38:38.881017732Z" level=info msg="743461d627fc65f0 became candidate at term 5"
Mar 22 12:38:38 manager1 docker[16977]: time="2017-03-22T12:38:38.881036476Z" level=info msg="743461d627fc65f0 received vote from 743461d627fc65f0 at term 5"
Mar 22 12:38:38 manager1 docker[16977]: time="2017-03-22T12:38:38.881070231Z" level=info msg="743461d627fc65f0 [logterm: 3, index: 562] sent vote request to 632639c85c597131 at term 5"
Mar 22 12:38:38 manager1 docker[16977]: time="2017-03-22T12:38:38.881119405Z" level=info msg="743461d627fc65f0 [logterm: 3, index: 562] sent vote request to 155dcbc8adeaf7bc at term 5"
Mar 22 12:38:38 manager1 docker[16977]: time="2017-03-22T12:38:38Z" level=info msg="Firewalld running: false"
Mar 22 12:38:38 manager1 docker[16977]: time="2017-03-22T12:38:38.899715783Z" level=info msg="743461d627fc65f0 received vote from 632639c85c597131 at term 5"
Mar 22 12:38:38 manager1 docker[16977]: time="2017-03-22T12:38:38.899771473Z" level=info msg="743461d627fc65f0 [quorum:2] has received 2 votes and 0 vote rejections"
Mar 22 12:38:38 manager1 docker[16977]: time="2017-03-22T12:38:38.899819105Z" level=info msg="743461d627fc65f0 became leader at term 5"
Mar 22 12:38:38 manager1 docker[16977]: time="2017-03-22T12:38:38.899855714Z" level=info msg="raft.node: 743461d627fc65f0 elected leader 743461d627fc65f0 at term 5"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39Z" level=info msg="Firewalld running: false"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.243005873Z" level=error msg="update failed" error="task vweb3v6bitiw60oak3c8t9a0m was already shut down when reached by updater" task.id=son7ons3v3h3xpcsjh018hrgi
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.243097567Z" level=error msg="update failed" error="task vweb3v6bitiw60oak3c8t9a0m was already shut down when reached by updater" task.id=6tefwlqxqh15wg8k47kggupvv
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.243205414Z" level=error msg="update failed" error="task vweb3v6bitiw60oak3c8t9a0m was already shut down when reached by updater" task.id=vfh6ty1py82kj2pgwtp8i3zzo
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.243312214Z" level=error msg="update failed" error="task vweb3v6bitiw60oak3c8t9a0m was already shut down when reached by updater" task.id=cgzq7zgk4de64q5fmllfwvnxf
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.248536277Z" level=error msg="failed allocating task q2by4y7fi2lvsrog4oxwzasfr during init" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.249198103Z" level=error msg="failed allocating task rff8khdnjs5kgybknl65tt38l during init" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.252864009Z" level=error msg="update failed" error="task vweb3v6bitiw60oak3c8t9a0m was already shut down when reached by updater" task.id=nm5pjfc9c2c75pd9piud9iblf
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.313478526Z" level=error msg="update failed" error="task v8w1euhc8e9mrswh21z7ypxia was already shut down when reached by updater" task.id=3q5bad6qr4gcat54llfymighb
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.313873459Z" level=error msg="update failed" error="task v8w1euhc8e9mrswh21z7ypxia was already shut down when reached by updater" task.id=5jqlko0n5tdu6ga3w7hjht7lw
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.313974627Z" level=error msg="update failed" error="task v8w1euhc8e9mrswh21z7ypxia was already shut down when reached by updater" task.id=ag50ccnld1va0cctj7lf9q5sl
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.314093329Z" level=error msg="update failed" error="task v8w1euhc8e9mrswh21z7ypxia was already shut down when reached by updater" task.id=fpzuij2np56no4efyqo2wca0m
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39Z" level=info msg="Firewalld running: false"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.380160580Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.380992136Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.410820579Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.411410017Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.411933151Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.412295109Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.412748384Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.413129769Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.413504678Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.413877231Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.414196638Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.414506609Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.414829787Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.415170746Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.416579214Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.416924785Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.441738602Z" level=error msg="update failed" error="task rprx2f8ct9ngu1up1mmhlqti5 was already shut down when reached by updater" task.id=lxaivsv8xz5wkzqape6wts7hp
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39Z" level=info msg="Firewalld running: false"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.505388822Z" level=error msg="update failed" error="task rprx2f8ct9ngu1up1mmhlqti5 was already shut down when reached by updater" task.id=tqeypt3nqrpk0fvn6hb8kr53o
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.527718848Z" level=error msg="update failed" error="task 6zmtlp0b37ljgz19vez7bg8o8 was already shut down when reached by updater" task.id=a9kdyz2y6oqrbw9brfp2uutnf
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.601090566Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.601202731Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.602731338Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.602833056Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.603004402Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.603061444Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.603283649Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.603360695Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.603475125Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.603548640Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.603662377Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.603704270Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.616321391Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.617729951Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.617878182Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.617957211Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.620438920Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.620578258Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.622586530Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.622724127Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.622882812Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.622970865Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.644343179Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.644548616Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.719122686Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.719269896Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.737819942Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.737925716Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.738147309Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.738241418Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.738386962Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.738511157Z" level=error msg="task allocation failure" error="failed during networktask allocation for task q2by4y7fi2lvsrog4oxwzasfr: failed to allocate network IP for task q2by4y7fi2lvsrog4oxwzasfr network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39Z" level=info msg="Firewalld running: false"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.755998011Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.807524591Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.819103499Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.841300939Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.841428770Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.841651408Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.853050959Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.853202575Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.853307193Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.853498997Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.853591334Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.853666664Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.853746015Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.853813091Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.853870343Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.891264860Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.907392247Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:39 manager1 docker[16977]: time="2017-03-22T12:38:39.992237390Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:40 manager1 docker[16977]: time="2017-03-22T12:38:40.011744089Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:40 manager1 docker[16977]: time="2017-03-22T12:38:40Z" level=info msg="Firewalld running: false"
Mar 22 12:38:40 manager1 docker[16977]: time="2017-03-22T12:38:40.050699391Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:40 manager1 docker[16977]: time="2017-03-22T12:38:40Z" level=info msg="Firewalld running: false"
Mar 22 12:38:40 manager1 docker[16977]: time="2017-03-22T12:38:40.275064368Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:40 manager1 docker[16977]: time="2017-03-22T12:38:40.439217729Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:40 manager1 docker[16977]: time="2017-03-22T12:38:40Z" level=info msg="Firewalld running: false"
Mar 22 12:38:40 manager1 docker[16977]: time="2017-03-22T12:38:40.466697730Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:40 manager1 docker[16977]: time="2017-03-22T12:38:40.485198770Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:40 manager1 docker[16977]: time="2017-03-22T12:38:40.578584672Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:40 manager1 docker[16977]: time="2017-03-22T12:38:40.614293212Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:40 manager1 docker[16977]: time="2017-03-22T12:38:40.667935153Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:40 manager1 docker[16977]: time="2017-03-22T12:38:40Z" level=info msg="Firewalld running: false"
Mar 22 12:38:40 manager1 docker[16977]: time="2017-03-22T12:38:40.825128596Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:41 manager1 docker[16977]: time="2017-03-22T12:38:41.028157199Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:41 manager1 docker[16977]: time="2017-03-22T12:38:41Z" level=info msg="Firewalld running: false"
Mar 22 12:38:41 manager1 docker[16977]: time="2017-03-22T12:38:41.277262230Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:52 manager1 docker[16977]: time="2017-03-22T12:38:52.720632794Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:52 manager1 docker[16977]: time="2017-03-22T12:38:52.720815578Z" level=error msg="update failed" error="task rprx2f8ct9ngu1up1mmhlqti5 was already shut down when reached by updater" task.id=gwllbg12vw1atjcwhke8zoxph
Mar 22 12:38:52 manager1 docker[16977]: time="2017-03-22T12:38:52.720942515Z" level=error msg="update failed" error="task 6zmtlp0b37ljgz19vez7bg8o8 was already shut down when reached by updater" task.id=ixcivz3z60sbqkltv94tghlg2
Mar 22 12:38:52 manager1 docker[16977]: time="2017-03-22T12:38:52.721079707Z" level=error msg="update failed" error="task ui29ugtpzr94eo5ep9fdgoc0o was already shut down when reached by updater" task.id=54ukltf16apawwy1zci88z301
Mar 22 12:38:52 manager1 docker[16977]: time="2017-03-22T12:38:52.721203202Z" level=error msg="update failed" error="task s3w8bwxhb5o92nknjrqh7iolz was already shut down when reached by updater" task.id=i6g6af7l11prqnhvi1n0d2buf
Mar 22 12:38:52 manager1 docker[16977]: time="2017-03-22T12:38:52.732888146Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:52 manager1 docker[16977]: time="2017-03-22T12:38:52.765324603Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:52 manager1 docker[16977]: time="2017-03-22T12:38:52Z" level=info msg="Firewalld running: false"
Mar 22 12:38:52 manager1 docker[16977]: time="2017-03-22T12:38:52.841646491Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:52 manager1 docker[16977]: time="2017-03-22T12:38:52.933435464Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:38:53 manager1 docker[16977]: time="2017-03-22T12:38:53Z" level=info msg="Firewalld running: false"
Mar 22 12:38:53 manager1 docker[16977]: time="2017-03-22T12:38:53.149205596Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:39:54 manager1 docker[16977]: time="2017-03-22T12:39:54.479236541Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:39:54 manager1 docker[16977]: time="2017-03-22T12:39:54.484528064Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:39:54 manager1 docker[16977]: time="2017-03-22T12:39:54.694071852Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:42:00 manager1 docker[16977]: time="2017-03-22T12:42:00.097440278Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:42:00 manager1 docker[16977]: time="2017-03-22T12:42:00.168230738Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:42:00 manager1 docker[16977]: time="2017-03-22T12:42:00.188752958Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:42:00 manager1 docker[16977]: time="2017-03-22T12:42:00.221090636Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:42:00 manager1 docker[16977]: time="2017-03-22T12:42:00.248009602Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:42:00 manager1 docker[16977]: time="2017-03-22T12:42:00.275128493Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:42:00 manager1 docker[16977]: time="2017-03-22T12:42:00.311353864Z" level=warning msg="cannot find target task in store" forwarder.id=xt2pbjzefr8xtcpjh7mk73tdj method="(*Dispatcher).UpdateTaskStatus" node.id=jy5gyelmk0vzi179trlslp33k node.session=jkp7m9yv7o3778n4aa30pluij task.id=nizcvvei5fld5vcymi0btcbqc
Mar 22 12:42:00 manager1 docker[16977]: time="2017-03-22T12:42:00.323288217Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:42:00 manager1 docker[16977]: time="2017-03-22T12:42:00.380988052Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:42:00 manager1 docker[16977]: time="2017-03-22T12:42:00.426957598Z" level=error msg="task unavailable" method="(*Dispatcher).processUpdates" module=dispatcher task.id=nizcvvei5fld5vcymi0btcbqc
Mar 22 12:42:00 manager1 docker[16977]: time="2017-03-22T12:42:00.438836330Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:42:00 manager1 docker[16977]: time="2017-03-22T12:42:00.443704298Z" level=error msg="task allocation failure" error="failed during networktask allocation for task rff8khdnjs5kgybknl65tt38l: failed to allocate network IP for task rff8khdnjs5kgybknl65tt38l network zqquu8yjk0tzh1h1ninajhnm7: could not allocate IP from IPAM: Address already in use"
Mar 22 12:42:00 manager1 docker[16977]: time="2017-03-22T12:42:00.526135145Z" level=warning msg="cannot find target task in store" forwarder.id=xt2pbjzefr8xtcpjh7mk73tdj method="(*Dispatcher).UpdateTaskStatus" node.id=jy5gyelmk0vzi179trlslp33k node.session=jkp7m9yv7o3778n4aa30pluij task.id=nizcvvei5fld5vcymi0btcbqc
Mar 22 12:42:00 manager1 docker[16977]: time="2017-03-22T12:42:00.539621699Z" level=error msg="task unavailable" method="(*Dispatcher).processUpdates" module=dispatcher task.id=nizcvvei5fld5vcymi0btcbqc
Mar 22 12:42:00 manager1 docker[16977]: time="2017-03-22T12:42:00.560197204Z" level=error msg="update failed" error="task 6zmtlp0b37ljgz19vez7bg8o8 was already shut down when reached by updater" task.id=2bs2ccra5q08wzi8qdwe39tpr
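
Note that every one of the repeated allocation failures above refers to the same task (`rff8khdnjs5kgybknl65tt38l`) and the same network (`zqquu8yjk0tzh1h1ninajhnm7`), which suggests the allocator is retrying a single stuck task rather than failing across the board. A quick way to confirm that from a saved syslog is to group the errors by task and network ID (a hypothetical helper, assuming only the logrus text format shown above):

```python
import re
from collections import Counter

# Matches the task and network IDs inside the swarm allocator error lines
# shown above; assumes the daemon's default logrus text log format.
PATTERN = re.compile(
    r'failed to allocate network IP for task (\S+) network (\S+):'
)

def count_allocation_failures(log_text):
    """Return a Counter mapping (task_id, network_id) -> failure count."""
    return Counter(PATTERN.findall(log_text))
```

If the resulting counter has a single dominant key, the "Address already in use" errors are one task's IP reservation that was never released, not a systemic IPAM exhaustion.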

Manager2 logs:

Mar 22 12:38:36 manager2 docker[17009]: time="2017-03-22T12:38:36.148626929Z" level=info msg="632639c85c597131 is starting a new election at term 3"
Mar 22 12:38:36 manager2 docker[17009]: time="2017-03-22T12:38:36.148734518Z" level=info msg="632639c85c597131 became candidate at term 4"
Mar 22 12:38:36 manager2 docker[17009]: time="2017-03-22T12:38:36.148747451Z" level=info msg="632639c85c597131 received vote from 632639c85c597131 at term 4"
Mar 22 12:38:36 manager2 docker[17009]: time="2017-03-22T12:38:36.148764999Z" level=info msg="632639c85c597131 [logterm: 3, index: 562] sent vote request to 743461d627fc65f0 at term 4"
Mar 22 12:38:36 manager2 docker[17009]: time="2017-03-22T12:38:36.148790091Z" level=info msg="632639c85c597131 [logterm: 3, index: 562] sent vote request to 155dcbc8adeaf7bc at term 4"
Mar 22 12:38:36 manager2 docker[17009]: time="2017-03-22T12:38:36.148815473Z" level=info msg="raft.node: 632639c85c597131 lost leader 155dcbc8adeaf7bc at term 4"
Mar 22 12:38:38 manager2 docker[17009]: time="2017-03-22T12:38:38.887993225Z" level=info msg="632639c85c597131 [term: 4] received a MsgVote message with higher term from 743461d627fc65f0 [term: 5]"
Mar 22 12:38:38 manager2 docker[17009]: time="2017-03-22T12:38:38.888079838Z" level=info msg="632639c85c597131 became follower at term 5"
Mar 22 12:38:38 manager2 docker[17009]: time="2017-03-22T12:38:38.888106566Z" level=info msg="632639c85c597131 [logterm: 3, index: 562, vote: 0] voted for 743461d627fc65f0 [logterm: 3, index: 562] at term 5"
Mar 22 12:38:38 manager2 docker[17009]: time="2017-03-22T12:38:38Z" level=info msg="Firewalld running: false"
Mar 22 12:38:38 manager2 docker[17009]: time="2017-03-22T12:38:38.918918831Z" level=info msg="raft.node: 632639c85c597131 elected leader 743461d627fc65f0 at term 5"

Manager3 logs (after turn on it; containers name have been anonymized):

Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.114217712Z" level=info msg="Loading containers: start."
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.148952897Z" level=info msg="Firewalld running: false"
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.344833356Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.376318307Z" level=info msg="Loading containers: done."
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.454096077Z" level=info msg="Listening for connections" addr="138.197.111.230:2377" proto=tcp
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.454034916Z" level=info msg="Listening for local connections" addr="/var/run/docker/swarm/control.sock" proto=unix
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.472185607Z" level=warning msg="ignoring request to join cluster, because raft state already exists"
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.472279306Z" level=info msg="155dcbc8adeaf7bc became follower at term 3"
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.472300706Z" level=info msg="newRaft 155dcbc8adeaf7bc [peers: [], term: 3, commit: 562, applied: 0, lastindex: 562, lastterm: 3]"
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.710376310Z" level=warning msg="failed to deactivate service binding for container grayloglb.1.0ed60wpz1naih1fcgyc11agcg" error="No such container: grayloglb.1.0ed60wpz1naih1fcgyc11agcg" module="node/agent/taskmanager" task.id=0ed60wpz1naih1fcgyc11agcg
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.710617247Z" level=warning msg="failed to deactivate service binding for container service1" error="No such container: service1" module="node/agent/taskmanager" task.id=0xa230m8qb2th4kr055giwpz1
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.711329734Z" level=warning msg="failed to deactivate service binding for container service2" error="No such container: service2" module="node/agent/taskmanager" task.id=48cz3jbqqbm1o82vl6w291dwx
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.711410152Z" level=warning msg="failed to deactivate service binding for container service3" error="No such container: service3" module="node/agent/taskmanager" task.id=4vtlair9ba8yegey0qifog5ov
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.712343431Z" level=warning msg="failed to deactivate service binding for container service4" error="No such container: service4" module="node/agent/taskmanager" task.id=e5wuru59h39lr7kkezwyxk6il
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.712826154Z" level=warning msg="failed to deactivate service binding for container service5" error="No such container: service5" module="node/agent/taskmanager" task.id=jq3aw48qmui5uffeb1h8sg8v3
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.712939780Z" level=warning msg="failed to deactivate service binding for container service6" error="No such container: service6" module="node/agent/taskmanager" task.id=jx8uu9e5t4c1nk0e5scaj8gs9
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.713085561Z" level=warning msg="failed to deactivate service binding for container service7" error="No such container: service7" module="node/agent/taskmanager" task.id=ke7uaxcj9p4noenr2jzs0qplh
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.713228195Z" level=warning msg="failed to deactivate service binding for container container8" error="No such container: container8" module="node/agent/taskmanager" task.id=kyk7382f5040hyafs8yqpe2c3
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.713494455Z" level=warning msg="failed to deactivate service binding for container container9" error="No such container: container9" module="node/agent/taskmanager" task.id=l76r7dy6j5rwh68kfl6k1fluh
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.714069272Z" level=warning msg="failed to deactivate service binding for container container10" error="No such container: container10" module="node/agent/taskmanager" task.id=m9tvj7unymvevnv9gg1zbmp3q
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.714453216Z" level=warning msg="failed to deactivate service binding for container container11" error="No such container: container11" module="node/agent/taskmanager" task.id=mzlq0u6vtcprqw6f35xjl8ha0
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.715012605Z" level=warning msg="failed to deactivate service binding for container container12" error="No such container: container12" module="node/agent/taskmanager" task.id=oetxm0ccnvsholm1ra2fxisa8
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.715209213Z" level=warning msg="failed to deactivate service binding for container container13" error="No such container: container13" module="node/agent/taskmanager" task.id=ofw0yrzgm6hebq34e2k57ngxg
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.715407476Z" level=warning msg="failed to deactivate service binding for container container14" error="No such container: container14" module="node/agent/taskmanager" task.id=pazzehd1a0dbtc6wyxfrukpjm
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.715474602Z" level=warning msg="failed to deactivate service binding for container container15" error="No such container: container15" module="node/agent/taskmanager" task.id=pfnpgx2f65jn2vdwebpcu0wv7
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.715726654Z" level=warning msg="failed to deactivate service binding for container container16" error="No such container: container16" module="node/agent/taskmanager" task.id=rprx2f8ct9ngu1up1mmhlqti5
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.715868444Z" level=warning msg="failed to deactivate service binding for container container17" error="No such container: container17" module="node/agent/taskmanager" task.id=s89mcnhxbngsqqhm2y9ynnldt
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.716386911Z" level=warning msg="failed to deactivate service binding for container container18" error="No such container: container18" module="node/agent/taskmanager" task.id=uanixrv8k6s7atkx0ccqp30vy
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.716512401Z" level=warning msg="failed to deactivate service binding for container container19" error="No such container: container19" module="node/agent/taskmanager" task.id=uq0vy2l5wgfvncri6be3k2ioi
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.716649488Z" level=warning msg="failed to deactivate service binding for container container20" error="No such container: container20" module="node/agent/taskmanager" task.id=urcpfsj2xhp0kos166w4u0fmd
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.717179733Z" level=warning msg="failed to deactivate service binding for container container21" error="No such container: container21" module="node/agent/taskmanager" task.id=wwnlptryjn0uzpisp5vp1dpzo
Mar 22 12:41:59 manager3 docker[1577]: time="2017-03-22T12:41:59.717711614Z" level=warning msg="failed to deactivate service binding for container container22" error="No such container: container22" module="node/agent/taskmanager" task.id=yhdb3zgx2wyfq95y1efkiwukg
Mar 22 12:42:03 manager3 docker[1577]: time="2017-03-22T12:42:03.775671639Z" level=info msg="Daemon has completed initialization"
Mar 22 12:42:03 manager3 docker[1577]: time="2017-03-22T12:42:03.777149797Z" level=info msg="Docker daemon" commit=60ccb22 graphdriver=aufs version=17.03.0-ce
Mar 22 12:42:03 manager3 docker[1577]: time="2017-03-22T12:42:03.785212221Z" level=info msg="API listen on [::]:2376"
Mar 22 12:42:03 manager3 docker[1577]: time="2017-03-22T12:42:03.785215785Z" level=info msg="API listen on /var/run/docker.sock"

I use the 'Switch off' button of the DigitalOcean droplet. If you click on it, a note appears saying: "you power off your Droplet...".

Attaching "syslog" part of this machine when I switched it off:

Mar 22 12:38:31 manager3 systemd[1]: Stopped target Cloud-init target.
Mar 22 12:38:31 manager3 systemd[1]: Stopping ACPI event daemon...
Mar 22 12:38:31 manager3 systemd[1]: Stopped target Timers.
Mar 22 12:38:31 manager3 systemd[1]: Stopped Timer to automatically refresh installed snaps.
Mar 22 12:38:31 manager3 systemd[1]: Closed Load/Save RF Kill Switch Status /dev/rfkill Watch.
Mar 22 12:38:31 manager3 systemd[1]: Stopped Daily apt activities.
Mar 22 12:38:31 manager3 systemd[1]: Stopped target System Time Synchronized.
Mar 22 12:38:31 manager3 systemd[1]: Stopped Daily Cleanup of Temporary Directories.
Mar 22 12:38:31 manager3 systemd[1]: Stopped target Graphical Interface.
Mar 22 12:38:31 manager3 systemd[1]: Stopped Execute cloud user/final scripts.
Mar 22 12:38:31 manager3 systemd[1]: Stopped target Multi-User System.
Mar 22 12:38:31 manager3 systemd[1]: Stopping LSB: Record successful boot for GRUB...
Mar 22 12:38:31 manager3 systemd[1]: Stopping LSB: Set up cgroupfs mounts....
Mar 22 12:38:31 manager3 systemd[1]: Stopping docker.service...
Mar 22 12:38:31 manager3 docker[17023]: time="2017-03-22T12:38:31.450201942Z" level=info msg="Processing signal 'terminated'"
Mar 22 12:38:31 manager3 docker[17023]: time="2017-03-22T12:38:31.45026495Z" level=info msg="stopping containerd after receiving terminated"
Mar 22 12:38:31 manager3 systemd[1]: Stopping LSB: MD monitoring daemon...
Mar 22 12:38:31 manager3 systemd[1]: Stopping Deferred execution scheduler...
Mar 22 12:38:31 manager3 systemd[1]: Stopping LSB: automatic crash report generation...
Mar 22 12:38:31 manager3 systemd[1]: Stopping FUSE filesystem for LXC...
Mar 22 12:38:31 manager3 systemd[1]: Stopping LSB: Set the CPU Frequency Scaling governor to "ondemand"...
Mar 22 12:38:31 manager3 systemd[1]: Stopping Regular background program processing daemon...
Mar 22 12:38:31 manager3 systemd[1]: Stopping Login Service...
Mar 22 12:38:31 manager3 systemd[1]: Stopping D-Bus System Message Bus...
Mar 22 12:38:31 manager3 systemd[1]: Stopped target Login Prompts.
Mar 22 12:38:31 manager3 systemd[1]: Stopping Getty on tty1...
Mar 22 12:38:31 manager3 systemd[1]: Stopping Serial Getty on ttyS0...
Mar 22 12:38:31 manager3 systemd[1]: Stopping Authenticate and Authorize Users to Run Privileged Tasks...
Mar 22 12:38:31 manager3 systemd[1]: Stopped Apply the settings specified in cloud-config.
Mar 22 12:38:31 manager3 systemd[1]: Stopped target Cloud-config availability.
Mar 22 12:38:31 manager3 rsyslogd: [origin software="rsyslogd" swVersion="8.16.0" x-pid="1552" x-info="http://www.rsyslog.com"] exiting on signal 15.
@thaJeztah
Member

ping @dongluochen @aaronlehmann

@aaronlehmann
Contributor

I noticed that in your docker service ps output, the task on node1 was shut down around the same time, but not restarted again. Is that expected? Was there a change to the service around this time that caused it to no longer run on node1?

@le-ortega
Author

Exactly, it is another issue that happens sometimes; I reported it here: #31377
In fact, the logs are the same, but I wanted to keep the two issues separate. I haven't had any useful answer there because it was supposedly resolved in 17.03.0-ce, but I discovered it with that very version.

Could you help me with the problem of tasks being killed unexpectedly when I disconnect a server to simulate an unexpected failure? Thank you very much in advance.

@aaronlehmann
Contributor

I remember we had a similar problem long ago that was reproducible on DigitalOcean by "powering off" VMs: #25017 (cc @thaJeztah). The issue there was that agents could take a long time to realize that the old leader was down, because this way of shutting off the leader didn't sever the TCP connections, and the agents had to rely on a timeout to notice that the old leader was unavailable. Meanwhile, the new leader would only wait so long for the agents to connect to it, and eventually mark those nodes as down.

We seemed to have fixed this, but it's possible there's a regression, or some variant of the problem still exists. You mentioned that the problem is 100% reproducible, which I wouldn't expect if the underlying cause is agents not connecting to the new leader soon enough. But it would still be useful to check the docker node ls output while restarting the old leader, and see if any node changes state to DOWN. This would indicate that the node is taking too long to recognize the manager failure or connect to the new leader.
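
To automate that check, one could poll `docker node ls` every second or so during the failover and flag any node whose STATUS column leaves `Ready`. A minimal sketch (hypothetical; it only assumes the tabular CLI output format shown in this thread):

```python
import subprocess

def non_ready_nodes(node_ls_output):
    """Parse `docker node ls` table output and return the hostnames of
    nodes whose STATUS column is anything other than 'Ready'."""
    flagged = []
    for line in node_ls_output.strip().splitlines()[1:]:  # skip header row
        cols = line.split()
        # Columns: ID [*] HOSTNAME STATUS AVAILABILITY [MANAGER STATUS]
        if cols[1] == '*':                # current node is marked with '*'
            cols = [cols[0]] + cols[2:]
        hostname, status = cols[1], cols[2]
        if status != 'Ready':
            flagged.append(hostname)
    return flagged

def poll_once():
    # Run this in a loop while restarting the old leader; any hostname it
    # returns is a node the new leader may be about to mark as down.
    out = subprocess.run(['docker', 'node', 'ls'],
                         capture_output=True, text=True).stdout
    return non_ready_nodes(out)
```

Any node that stays in the flagged list for longer than the dispatcher's heartbeat grace period is a candidate for having its tasks rescheduled, which would explain the unexpected kills.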

Are you running the same version of Docker on all nodes?

I'm also looking into the issue with unexpected tasks in the "pending" state.

@le-ortega
Author

Yes, I created the machines in my cluster with docker-machine, and all nodes run the same version.

I have reproduced it today again and the output before switching off swarm leader is:

root@manager2:~# docker node ls
ID                           HOSTNAME     STATUS  AVAILABILITY  MANAGER STATUS
dn7fghnc87nqlaz4givtl33u3 *  manager2     Ready   Active        Reachable
lf1hxwrv43gw43iir0ym76u34    node1        Ready   Active
lk4hrx4m7srfmvysgthjalfvs    manager1     Ready   Active        Leader
mbs9qw8xrj9tf847rttt8hnvh    node2        Ready   Active
zhckmd1o9ilefb528s26o3lcd    manager3     Ready   Active        Reachable

Switching off 'manager1':

root@manager2:~# docker node ls
ID                           HOSTNAME     STATUS   AVAILABILITY  MANAGER STATUS
dn7fghnc87nqlaz4givtl33u3 *  manager2     Ready    Active        Reachable
lf1hxwrv43gw43iir0ym76u34    node1        Unknown  Active
lk4hrx4m7srfmvysgthjalfvs    manager1     Unknown  Active        Unreachable
mbs9qw8xrj9tf847rttt8hnvh    node2        Unknown  Active
zhckmd1o9ilefb528s26o3lcd    manager3     Ready    Active        Leader

After 10~15 seconds, 'node1' and 'node2' are 'Ready' again (but some tasks were killed):

root@manager2:~# docker node ls
ID                           HOSTNAME     STATUS   AVAILABILITY  MANAGER STATUS
dn7fghnc87nqlaz4givtl33u3 *  manager2     Ready    Active        Reachable
lf1hxwrv43gw43iir0ym76u34    node1        Ready    Active
lk4hrx4m7srfmvysgthjalfvs    manager1     Unknown  Active        Unreachable
mbs9qw8xrj9tf847rttt8hnvh    node2        Ready    Active
zhckmd1o9ilefb528s26o3lcd    manager3     Ready    Active        Leader

In fact, I created a new cluster to perform the checks you suggested, and it also happens with 17.03.1-ce:

root@manager2:~# docker version
Client:
 Version:      17.03.1-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Fri Mar 24 00:45:26 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.1-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Fri Mar 24 00:45:26 2017
 OS/Arch:      linux/amd64
 Experimental: false

@aaronlehmann
Contributor

Thanks for the response. I spent some time trying to reproduce this, and I found an issue that could cause containers to wrongly be restarted (moby/swarmkit#2091), but unfortunately I think it's different than what you're seeing. I could only trigger it by pausing manager daemons, not by shutting them down completely.

It occurred to me that enabling debug logging on all nodes (-D option to Docker daemon) might help pinpoint what's happening. That adds a lot of information about heartbeats, task assignments, and so on.
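(For reference: besides passing -D on the daemon command line, debug logging can be enabled persistently through the daemon configuration file, typically /etc/docker/daemon.json, followed by a daemon restart:)

```json
{
  "debug": true
}
```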

@le-ortega
Author

I have created a new cluster and I have enabled debug logging in all nodes.

root@test-manager1:/var/log# docker node ls
ID                           HOSTNAME       STATUS  AVAILABILITY  MANAGER STATUS
1bu3pdze3crzohp7dro617h03    test-manager3  Ready   Active        Reachable
8qpfwhdtpg3iy8yvwwkevero2    test-node2     Ready   Active
e1aujbptdxjrehvqiqgr0ng9n    test-manager2  Ready   Active        Reachable
shcb2gfg5lkrdm6zdv81gc0xw    test-node1     Ready   Active
vlqkq8i65siv6pk8tiolcnmzu *  test-manager1  Ready   Active        Leader

After shutting down 'test-manager1', some containers on 'test-node2' received SIGTERM:

root@test-node2:~# docker ps -a|grep Exited
9f109416fe30        127.0.0.1:5000/xxxx@sha256:997fa9b6bc0d2a93068aaa7ac17101679f0168959b8a975d9077538fd0208e38                "/wait_for_it.sh p..."   23 minutes ago      Exited (0) 5 seconds ago   service13.8qpfwhdtpg3iy8yvwwkevero2.hdmrjgodfafwoucf7eyxvearu
bc99c51a45c6        127.0.0.1:5000/xxxx@sha256:997fa9b6bc0d2a93068aaa7ac17101679f0168959b8a975d9077538fd0208e38                "/wait_for_it.sh p..."   23 minutes ago      Exited (0) Less than a second ago   service2.8qpfwhdtpg3iy8yvwwkevero2.muzutnnasddzm6tpfe3kv62b9
c8f047d2d370        127.0.0.1:5000/cadvisor@sha256:84413400487446ca7dcae38e1d2980b752ffe6e4d8ec07c8872af00fd893a909              "/usr/bin/cadvisor..."   23 minutes ago      Exited (0) 7 seconds ago    cadvisor.8qpfwhdtpg3iy8yvwwkevero2.84dz5ja26urloqibqof06b8xc
ad228dfe7431        127.0.0.1:5000/yyyy@sha256:9636d8ca2e5dce7b86207652de9a519c88ff0f61d0c538396a61de0fa7304bc2   "/wait_for_it.sh 1..."   36 minutes ago      Exited (137) Less than a second ago    service3.8qpfwhdtpg3iy8yvwwkevero2.pnmcq3lq2dsf8mnabhvd4m0rc
5a0a7f5e4718        127.0.0.1:5000/consul@sha256:f751c9714d5aeede1b859f2ca8a95547e14fb9088823126316f48e54da568ef6                "supervisord -c /s..."   36 minutes ago      Exited (0) 9 seconds ago   consulagent.8qpfwhdtpg3iy8yvwwkevero2.y0rw9wxkbukgqiahlswvp61fq
root@test-node2:~# docker logs -f 9f109416fe30
....
....
....
2017-04-05 10:09:06,662 INFO success: daemon entered RUNNING state, process has stayed up for > than 60 seconds (startsecs)
2017-04-05 10:31:05,247 WARN received SIGTERM indicating exit request
2017-04-05 10:31:05,247 INFO waiting for jmx-exporter, daemon, ntpd to die
2017-04-05 10:31:05,248 INFO stopped: jmx-exporter (terminated by SIGTERM)
Stopping workermanager
stopping workermanager
2017-04-05 10:31:05,850 CRIT reaped unknown pid 188)
2017-04-05 10:31:08,854 INFO waiting for daemon, ntpd to die
2017-04-05 10:31:10,339 INFO stopped: daemon (exit status 0)
2017-04-05 10:31:10,341 INFO stopped: ntpd (exit status 0)

'test-manager1' was switched off at 10:30:50, so I have attached the logs starting from that point. I can reproduce it every time; I hope these logs help you.

logs.zip

@aaronlehmann
Contributor

aaronlehmann commented Apr 5, 2017

Thanks. These logs are really helpful.

Do you have any services with dynamically published ports (for example -p 80)? It occurred to me that you may be hitting #29247, which has been fixed in master but not in the 17.03.x series. I could see that bug causing the service's Endpoint field to change when the leader switchover happens. This would make the old tasks look out of date, because their Endpoint field no longer matches the service.

If you inspect the service before and after the leader switchover, do you see any changes in Ports? If you inspect the old task that was shut down and the one that replaced it, do you see any differences in the Networks or Endpoints sections? (If you end up checking, posting the JSON output of docker inspect for the service before and after the leader switchover, and docker inspect for the old and new tasks might be helpful).
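If it is this allocator issue, the mismatch should show up as a change in the Endpoint section between the two inspect dumps. A minimal sketch of such a comparison (`endpoint_diff` and the sample fragments below are illustrative, not real `docker inspect` output):

```python
import json

def endpoint_diff(before, after):
    """Compare the Endpoint sections of two parsed inspect results.
    Returns a dict of keys whose values changed between the two dumps."""
    b = before.get("Endpoint", {})
    a = after.get("Endpoint", {})
    changed = {}
    for key in set(b) | set(a):
        if b.get(key) != a.get(key):
            changed[key] = {"before": b.get(key), "after": a.get(key)}
    return changed

# Illustrative inspect fragments, not real output:
before = {"Endpoint": {"Ports": [{"TargetPort": 80, "PublishedPort": 30000}]}}
after  = {"Endpoint": {"Ports": [{"TargetPort": 80, "PublishedPort": 30001}]}}
print(json.dumps(endpoint_diff(before, after), indent=2))
```

In practice you would feed it the parsed JSON from `docker service inspect` taken before and after the switchover, and likewise the old and new task dumps.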

If this is not #29247, I suspect it's a similar allocator issue. The allocator making a change to the service or the task when it initializes could cause the task to be replaced.

cc @yongtang

@aluzzardi
Member

Adding area/testing - this should have been caught by automated leader re-election tests (/cc @dhiltgen @dongluochen)

@le-ortega
Author

I have reproduced it again, and I am attaching the output of docker inspect from before and after shutting down the leader.
service_task_container_logs.zip

@aaronlehmann
Contributor

Hi @le-ortega: thanks so much for your help and patience so far. I've been looking into this but I've been a bit swamped today. I'm hoping to get back to it early next week.

@aaronlehmann
Contributor

I think I found the problem! Opened moby/swarmkit#2113

@aaronlehmann
Contributor

@thaJeztah @vieux: The fix is in swarmkit master and is being backported for 17.05. Should it also be backported to 17.03.x?

@thaJeztah thaJeztah added this to the 17.05.0 milestone Apr 13, 2017
@aaronlehmann
Contributor

The fix was merged in #32576

@thaJeztah thaJeztah added this to Triage in 17.03.2-maybe Apr 14, 2017
@thaJeztah thaJeztah moved this from Triage to Pick in 17.03.2-maybe May 9, 2017
@thaJeztah thaJeztah removed this from Pick in 17.03.2-maybe May 9, 2017
@thaJeztah
Member

Closing this one; we're looking at doing a patch release for 17.03 and including the fix there as well.
