You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Oct 16, 2020. It is now read-only.
Docker API freezing, journal errors: "failed to create endpoint $container on network bridge: failed to find host side interface veth$id: Link not found"
#1731
Closed
ahmetb opened this issue
Dec 27, 2016
· 2 comments
I have about 15 systemd timers (and their service units) that all all start the same container with different parameters. It runs a Go program, which would run for a few seconds and exit. I start the containers with docker run --rm in the .service unit file.
I'm expecting to use systemd timers to be able to run cron-like jobs on the docker engine. Actual behavior below.
Reproduction Steps (i.e. my setup)
It's a bit hard for me to give a working repro but I'll go over the use case. (I can provide access to a live repro machine if needed). I have:
a .service unit
a .timer unit calling the .service every N minutes
the service starts a docker container with --rm
Example .service:
$ cat pd-lastfm.service
[Unit]
Description=Scheduled personal dashboard task for 'lastfm'
[Service]
Type=oneshot
ExecStart=/usr/bin/docker run --rm -v /home/core/personal-dashboard/config.toml:/etc/personal-dashboard/config.toml -v /home/core/personal-dashboard/google-credentials.json:/etc/personal-dashboard/google-credentials.json -e GOOGLE_APPLICATION_CREDENTIALS=/etc/personal-dashboard/google-credentials.json ahmet/personal-dashboard lastfm
Both the timer and the service are registered/enabled in systemd. It works fine, until it stops working fine after a few runs and freezes the docker API.
Actual Behavior
docker ps (and other commands calling the API) stops responding. It just freezes. When I prepend strace it gets frozen on the docker socket indefinitely:
Dec 25 05:39:36 core dockerd[14688]: time="2016-12-25T05:39:36.835572926Z" level=info msg="New containerd process, pid: 14711\n"
Dec 25 05:39:38 core dockerd[14688]: time="2016-12-25T05:39:38.052757245Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Dec 25 05:39:38 core dockerd[14688]: time="2016-12-25T05:39:38.069980146Z" level=info msg="Firewalld running: false"
Dec 25 05:39:38 core dockerd[14688]: time="2016-12-25T05:39:38.204778397Z" level=info msg="Removing stale sandbox 86293020c451cbbd4e8e71058e12b0e279ac887b685b1b3ace1284a4b38eedb2 (2a680ff1c4d28d411f0ac2a26868bb2c87b65bf896ec8c48253e680df684c764)"
Dec 25 05:39:38 core dockerd[14688]: time="2016-12-25T05:39:38.241944720Z" level=info msg="Removing stale sandbox b10b0413485a2482788f0177e0590ed6301458c4e35bc5b19dc39047bd7e92ee (5331e2c7f17aab38462602fcdae45c65efc7898856388d62387dded40d13e552)"
Dec 25 05:39:38 core dockerd[14688]: time="2016-12-25T05:39:38.278733320Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Dec 25 05:39:38 core dockerd[14688]: time="2016-12-25T05:39:38.338560342Z" level=info msg="Loading containers: start."
Dec 25 05:39:50 core dockerd[14688]: .....................time="2016-12-25T05:39:50.431136966Z" level=warning msg="error locating sandbox id 86293020c451cbbd4e8e71058e12b0e279ac887b685b1b3ace1284a4b38eedb2: sandbox 86293020c451cbbd4e8e71058e12b0e279ac887b685b1b3ace1284a4b38eedb2 not found"
Dec 25 05:39:50 core dockerd[14688]: time="2016-12-25T05:39:50.433652624Z" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/2a680ff1c4d28d411f0ac2a26868bb2c87b65bf896ec8c48253e680df684c764/shm: invalid argument"
Dec 25 05:39:50 core dockerd[14688]: time="2016-12-25T05:39:50.435367769Z" level=warning msg="error locating sandbox id b10b0413485a2482788f0177e0590ed6301458c4e35bc5b19dc39047bd7e92ee: sandbox b10b0413485a2482788f0177e0590ed6301458c4e35bc5b19dc39047bd7e92ee not found"
Dec 25 05:39:50 core dockerd[14688]: time="2016-12-25T05:39:50.436346915Z" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/5331e2c7f17aab38462602fcdae45c65efc7898856388d62387dded40d13e552/shm: invalid argument"
Dec 25 05:39:50 core dockerd[14688]: time="2016-12-25T05:39:50.972693231Z" level=error msg="Failed to start container 5331e2c7f17aab38462602fcdae45c65efc7898856388d62387dded40d13e552: rpc error: code = 6 desc = \"mkdir /run/containerd/5331e2c7f17aab38462602fcdae45c65efc7898856388d62387dded40d13e552: file exists\""
Dec 25 05:39:51 core dockerd[14688]: time="2016-12-25T05:39:51.078254376Z" level=error msg="Failed to start container 2a680ff1c4d28d411f0ac2a26868bb2c87b65bf896ec8c48253e680df684c764: rpc error: code = 6 desc = \"mkdir /run/containerd/2a680ff1c4d28d411f0ac2a26868bb2c87b65bf896ec8c48253e680df684c764: file exists\""
Dec 25 05:39:51 core dockerd[14688]: time="2016-12-25T05:39:51.926941377Z" level=info msg="Loading containers: done."
Dec 25 05:39:51 core dockerd[14688]: time="2016-12-25T05:39:51.928236521Z" level=info msg="Daemon has completed initialization"
Dec 25 05:39:51 core dockerd[14688]: time="2016-12-25T05:39:51.929131351Z" level=info msg="Docker daemon" commit=bac3bae graphdriver=overlay version=1.11.2
Dec 25 05:39:52 core dockerd[14688]: time="2016-12-25T05:39:52.097980019Z" level=info msg="API listen on /var/run/docker.sock"
Dec 25 06:00:04 core dockerd[14688]: time="2016-12-25T06:00:04.304826683Z" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/be1652bc1c7e77137db6b1ad2b85174d2fb72c1d0b4363778860b82984dbee10/shm: no such file or directory"
Dec 25 06:00:04 core dockerd[14688]: time="2016-12-25T06:00:04.306358422Z" level=error msg="Handler for POST /v1.23/containers/be1652bc1c7e77137db6b1ad2b85174d2fb72c1d0b4363778860b82984dbee10/start returned error: failed to create endpoint mad_turing on network bridge: failed to find host side interface veth6b16d1b: Link not found"
Dec 25 14:30:00 core dockerd[14688]: time="2016-12-25T14:30:00.929993644Z" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/a44cd5ec2a6f766e508431392b9d4b0709a9eae3a71a44720192e330687a3e74/shm: no such file or directory"
Dec 25 14:30:00 core dockerd[14688]: time="2016-12-25T14:30:00.931014252Z" level=error msg="Handler for POST /v1.23/containers/a44cd5ec2a6f766e508431392b9d4b0709a9eae3a71a44720192e330687a3e74/start returned error: failed to create endpoint tiny_blackwell on network bridge: failed to find host side interface vethad8591e: Link not found"
Dec 26 03:00:01 core dockerd[14688]: time="2016-12-26T03:00:01.190455722Z" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/726ab059bd227a63d06cb7e2c64b1e50f9999a6db0f5d61dbb29295426ebea72/shm: no such file or directory"
Dec 26 03:00:01 core dockerd[14688]: time="2016-12-26T03:00:01.193196721Z" level=error msg="Handler for POST /v1.23/containers/726ab059bd227a63d06cb7e2c64b1e50f9999a6db0f5d61dbb29295426ebea72/start returned error: failed to create endpoint dreamy_kalam on network bridge: failed to find host side interface vethf862218: Link not found"
Dec 26 06:00:00 core dockerd[14688]: time="2016-12-26T06:00:00.851085977Z" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/8dd7878a15cbcd6df2911fd540feae6be45de224585d7c7ddf1c1992317b0b75/shm: no such file or directory"
Dec 26 06:00:00 core dockerd[14688]: time="2016-12-26T06:00:00.852241613Z" level=error msg="Handler for POST /v1.23/containers/8dd7878a15cbcd6df2911fd540feae6be45de224585d7c7ddf1c1992317b0b75/start returned error: failed to create endpoint hopeful_mccarthy on network bridge: failed to find host side interface vetha9988bb: Link not found"
Dec 26 18:20:01 core dockerd[14688]: time="2016-12-26T18:20:01Z" level=error msg="containerd: deleting container" error="wait: no child processes"
Dec 27 02:00:02 core dockerd[14688]: time="2016-12-27T02:00:02Z" level=error msg="containerd: deleting container" error="wait: no child processes"
I have about 13 of these timers running docker run --rm ..., some of them are working fine, however, some of them are just getting stuck past their due (possibly correlated with docker API freezing, when one syptom happens, the other one is often there, too). And this happens very often, like multiple times a day. See the "ago" values in "LEFT" column below:
$ systemctl list-timers --all
NEXT LEFT LAST PASSED UNIT ACTIVATES
Sun 2016-12-25 14:00:00 UTC 1 day 14h ago Sun 2016-12-25 14:00:04 UTC 1 day 14h ago pd-reporting.timer pd-reporting.service
Sun 2016-12-25 15:15:00 UTC 1 day 13h ago Sun 2016-12-25 15:15:00 UTC 1 day 13h ago pd-jawbone-caffeine.timer pd-jawbone-caffeine.service
Mon 2016-12-26 07:59:00 UTC 20h ago Mon 2016-12-26 07:59:00 UTC 20h ago pd-twitter-tweets.timer pd-twitter-tweets.service
Mon 2016-12-26 08:30:00 UTC 19h ago Mon 2016-12-26 08:30:00 UTC 19h ago pd-jawbone-heartrate.timer pd-jawbone-heartrate.service
Mon 2016-12-26 13:45:00 UTC 14h ago Mon 2016-12-26 13:45:00 UTC 14h ago pd-jawbone-steps.timer pd-jawbone-steps.service
Mon 2016-12-26 14:59:00 UTC 13h ago Mon 2016-12-26 14:59:00 UTC 13h ago pd-strava.timer pd-strava.service
Mon 2016-12-26 19:40:00 UTC 8h ago Mon 2016-12-26 19:40:00 UTC 8h ago pd-lastfm.timer pd-lastfm.service
Tue 2016-12-27 04:30:00 UTC 7min left Tue 2016-12-27 04:15:01 UTC 7min ago pd-jawbone-sleeps.timer pd-jawbone-sleeps.service
Tue 2016-12-27 04:40:00 UTC 17min left Tue 2016-12-27 04:20:00 UTC 2min 11s ago pd-wakatime.timer pd-wakatime.service
Tue 2016-12-27 04:59:00 UTC 36min left Tue 2016-12-27 03:59:00 UTC 23min ago pd-foursquare.timer pd-foursquare.service
Tue 2016-12-27 04:59:00 UTC 36min left Tue 2016-12-27 03:59:00 UTC 23min ago pd-github.timer pd-github.service
Tue 2016-12-27 04:59:00 UTC 36min left Tue 2016-12-27 03:59:00 UTC 23min ago pd-twitter-followers.timer pd-twitter-followers.service
Tue 2016-12-27 12:52:33 UTC 8h left Tue 2016-12-27 00:52:33 UTC 3h 29min ago rkt-gc.timer rkt-gc.service
Wed 2016-12-28 00:00:00 UTC 19h left Tue 2016-12-27 00:00:00 UTC 4h 22min ago logrotate.timer logrotate.service
Wed 2016-12-28 01:07:23 UTC 20h left Tue 2016-12-27 01:07:23 UTC 3h 14min ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
n/a n/a n/a n/a update-engine-stub.timer update-engine-stub.service
I do ps aux | grep docker, I see the list of commands run by my frozen timers:
htop also shows the frozen /usr/bin/docker (client) processes. Also, my docker program's entrypoint does not appear in the top output, so I'm guessing my app is not responsible for the issue.
journal logs of a .service whose /usr/bin/docker entrypoint has the last log line "Starting" from systemd (see when the timer ran it at 19:40 below):
$ journalctl -fu pd-lastfm.service
-- Logs begin at Thu 2016-11-03 16:24:07 UTC. --
Dec 26 19:00:01 core docker[25558]: task=lastfm git=v1.0-33-g08bf33d time=2016-12-26T19:00:01Z msg="parsed response" songs=0
Dec 26 19:00:01 core docker[25558]: task=lastfm git=v1.0-33-g08bf33d time=2016-12-26T19:00:01Z msg="saved measurement"
Dec 26 19:00:02 core systemd[1]: Started Scheduled personal dashboard task for 'lastfm'.
Dec 26 19:20:00 core systemd[1]: Starting Scheduled personal dashboard task for 'lastfm'...
Dec 26 19:20:01 core docker[25907]: task=lastfm git=v1.0-33-g08bf33d time=2016-12-26T19:20:01Z msg=starting
Dec 26 19:20:01 core docker[25907]: task=lastfm git=v1.0-33-g08bf33d time=2016-12-26T19:20:01Z msg="querying last.fm API"
Dec 26 19:20:01 core docker[25907]: task=lastfm git=v1.0-33-g08bf33d time=2016-12-26T19:20:01Z msg="parsed response" songs=2
Dec 26 19:20:01 core docker[25907]: task=lastfm git=v1.0-33-g08bf33d time=2016-12-26T19:20:01Z msg="saved measurement"
Dec 26 19:20:01 core systemd[1]: Started Scheduled personal dashboard task for 'lastfm'.
Dec 26 19:40:00 core systemd[1]: Starting Scheduled personal dashboard task for 'lastfm'...
machine resource stats seem normal, no indication of high memory/load, plus other timers keep working, however certain ones are stuck. so the systemd timer can't kill/reschedule the job.
The text was updated successfully, but these errors were encountered:
Issue Report
Bug
CoreOS Version
Environment
DigitalOcean VM.
Expected Behavior
I have about 15 systemd timers (and their service units) that all all start the same container with different parameters. It runs a Go program, which would run for a few seconds and exit. I start the containers with
docker run --rm
in the .service unit file.I'm expecting to use systemd timers to be able to run cron-like jobs on the docker engine. Actual behavior below.
Reproduction Steps (i.e. my setup)
It's a bit hard for me to give a working repro but I'll go over the use case. (I can provide access to a live repro machine if needed). I have:
Example .service:
Example .timer:
Both the timer and the service are registered/enabled in systemd. It works fine, until it stops working fine after a few runs and freezes the docker API.
Actual Behavior
docker ps
(and other commands calling the API) stops responding. It just freezes. When I prependstrace
it gets frozen on the docker socket indefinitely:docker.service
has logs like this in the journal:docker run --rm ...
, some of them are working fine, however, some of them are just getting stuck past their due (possibly correlated with docker API freezing, when one syptom happens, the other one is often there, too). And this happens very often, like multiple times a day. See the "ago" values in "LEFT" column below:ps aux | grep docker
, I see the list of commands run by my frozen timers:htop
also shows the frozen /usr/bin/docker (client) processes. Also, my docker program's entrypoint does not appear in the top output, so I'm guessing my app is not responsible for the issue.The text was updated successfully, but these errors were encountered: