
Docker logging blocks and never recovers #22502

Closed
relistan opened this issue May 4, 2016 · 35 comments
Labels
area/logging kind/bug

Comments

@relistan

relistan commented May 4, 2016

You can permanently block a production container simply by reading too slowly from a log socket with any client. Docker eventually blocks the container process, and it never recovers. The container will remain blocked on stdout/stderr unless it handles output asynchronously and deals with output overruns on its own. This is new behavior, introduced in November, that fundamentally changes how logging works in Docker, and it is not even reflected in the changelogs.

Output of docker version:

Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 20
 Running: 20
 Paused: 0
 Stopped: 0
Images: 26
Server Version: 1.10.3
Storage Driver: devicemapper
 Pool Name: docker-202:16-4407302-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: ext4
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 9.992 GB
 Data Space Total: 107.4 GB
 Data Space Available: 97.38 GB
 Metadata Space Used: 11.25 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.136 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /mnt/var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /mnt/var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.77 (2012-10-15)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: null host bridge
Kernel Version: 3.13.0-83-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 29.4 GiB
Name: ip-10-3-13-191

Additional environment details (AWS, VirtualBox, physical, etc.):

Running on AWS; also reproducible on VMware

Steps to reproduce the issue:

  1. Simulate a high-throughput production app with some heavy timestamp logging: docker run -d busybox sh -c "while true; do date; done". Note the container ID for later.
  2. Fetch blockingreader.go from here. It connects to a container, sets up some pipes, and then never reads from them, letting Docker buffer internally (a minimal equivalent is sketched after these steps).
  3. go get github.com/fsouza/go-dockerclient
  4. Run docker logs --tail=2 -f <your_container_id> in another terminal.
  5. go run blockingreader.go <your_container_id>
  6. Watch the log output for a while; the app will stop logging once it fills the buffer and blocks. You can't tell from busybox alone, but the app isn't merely silent, it is completely blocked. You could hook GDB up to it if you really want to see; we took stack dumps of our app to prove it. This is bad because you've now blocked a production app with nothing more than a client connection to the logs.
  7. CTRL-C or kill your blockingreader process, which releases the Docker client socket; you would expect Docker to unblock the container. In extensive testing, this does not happen. Even when the blocked reader is gone, Docker continues to block the container.
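
For reference, a minimal sketch of what such a blocking reader looks like with go-dockerclient (the original blockingreader.go may differ in its details; the socket path here is an assumption):

// blockingreader.go (sketch): attach to a container's stdout/stderr and
// never read from the pipe, so the daemon's attach buffer fills up.
package main

import (
    "io"
    "log"
    "os"

    docker "github.com/fsouza/go-dockerclient"
)

func main() {
    client, err := docker.NewClient("unix:///var/run/docker.sock")
    if err != nil {
        log.Fatal(err)
    }

    // Hand the write end of a pipe to the attach stream and never drain
    // the read end, so the daemon's output backs up behind it.
    pr, pw := io.Pipe()
    _ = pr // intentionally never read

    err = client.AttachToContainer(docker.AttachToContainerOptions{
        Container:    os.Args[1],
        OutputStream: pw,
        ErrorStream:  pw,
        Stdout:       true,
        Stderr:       true,
        Stream:       true,
    })
    if err != nil {
        log.Fatal(err)
    }
}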

Describe the results you received:
Docker eventually blocks a container with high logging output if any client is slow and the internal logging buffer fills. This was not expected. It then also does not release the container when the socket is disconnected. This was really not expected.

Describe the results you expected:
Docker would continue to accept logging output from containerized applications without blocking (or leaking GBs of RAM like it used to). I would have expected a ring buffer implementation or some kind of sliding window without blocking at all.
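
(Purely to illustrate that suggestion, and not Docker's code: a writer that forwards to the slow consumer through a bounded buffer and drops data when it is full never blocks the producer. A minimal Go sketch:)

package logbuffer

import "io"

// droppingWriter forwards writes to a slow downstream writer through a
// bounded channel and drops data when the buffer is full, so the producer
// (the container's stdout) never blocks. Illustrative sketch only.
type droppingWriter struct {
    ch chan []byte
}

func newDroppingWriter(dst io.Writer, depth int) *droppingWriter {
    d := &droppingWriter{ch: make(chan []byte, depth)}
    go func() {
        for msg := range d.ch {
            dst.Write(msg) // a slow reader only delays this goroutine
        }
    }()
    return d
}

func (d *droppingWriter) Write(p []byte) (int, error) {
    msg := append([]byte(nil), p...) // copy: the caller may reuse p
    select {
    case d.ch <- msg:
    default: // buffer full: drop instead of blocking the producer
    }
    return len(p), nil
}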

Additional information you deem important (e.g. issue happens only occasionally):
100% repeatable. We tripped into this issue because we use Mozilla's Heka to log all of our container output. Heka connects to the Docker logs endpoint and sometimes becomes slow under heavy logging. In this way you can take out an entire server's worth of Docker containers because of the permanent blocking behavior of the logs.

@thaJeztah
Member

ping @aaronlehmann @unclejack PTAL

@thaJeztah added the area/logging and kind/bug labels May 4, 2016
@tonistiigi
Member

Can you test with v1.11? A very similar issue to what you are describing was fixed with #21840; at the least this should fix the behavior after ctrl-c.

@relistan
Author

relistan commented May 4, 2016

@tonistiigi I don't have access to 1.11 at the moment, will have to test tomorrow (I'm on UTC+1). You should be able to repeat the steps above in a minute or two if you have a 1.11 to test against.

@tonistiigi
Member

@relistan I tested with your script and #21840 does seem to solve this. Note that the behavior in point 6 is somewhat by design and is not fixed. If you have one reader that is very slow and one that is very fast, we first buffer to memory, but when this buffer fills up we fall back to the slower reader's speed (0 in your case). If we didn't do that we would just run out of memory. Maybe we could provide something to tweak the buffer size. The bug in v1.10 is that once you close the slow reader the fast one remains blocked. In v1.11, docker logs will start showing new output after blockingreader.go is closed.

@relistan
Author

relistan commented May 4, 2016

Thanks for testing @tonistiigi. I don't believe that is the right behavior in item 6 above, even if it's by design. Docker should drop log messages to the slow reader, not block the containerized app itself. Note that in item 6 it's not just the slow reader that is blocked, it's the whole stdout of the container. That's a big change in functionality vs 1.9 and also not very friendly behavior in a production environment. For example, someone watching the docker logs command over a slow link should not be able to block a production application.

@tonistiigi
Member

cc @aaronlehmann

@aaronlehmann
Contributor

I don't think dropping data from the log stream would necessarily be the correct behavior. This could break the formatting of the logs and lead to unexpected gaps in the log data. Perhaps it would be reasonable as an option. Alternatively, the buffer size could be made configurable.

@benoahriz

@aaronlehmann so you are saying it's acceptable behavior to block after a buffer limit is reached in a production environment? Dropping data or corrupting the formatting is definitely much better than blocking (and thus making the app unusable). Would it be feasible to at least have a config option to drop instead of block in such cases?

@relistan
Author

relistan commented May 4, 2016

Isn't there a separate buffer for each reader? That's what it looks like from the code. So if I have the daemon logging to a file, and two clients watching docker logs then I have at least three buffers, no? So why would we block the other readers and the application itself when one of the docker logs calls is slow, rather than simply dropping data on the slow link? Honestly, if Docker is going to block the whole container on stdout/stderr for one slow reader, then it is unsafe to ever use docker logs in production.

@tonistiigi
Member

tonistiigi commented May 4, 2016

correctness > speed. Something consuming the data and getting bogus input is potentially far more dangerous than the decrease in speed. The buffer there is implemented to handle this problem, and the size is something we thought was unreachable for any normal usage. Of course usage may vary, so configuring the size or using disk as a cache are possible solutions here.

I don't think this has anything to do with logs or the logging speed; the problem appears because there is an attach stream that refuses to read data. (Of course, if the log driver were considerably slower than the attach stream then this would be the opposite.)

There can be two buffers per io stream: one if there is an attach stream and another if a log driver is attached. New docker logs commands don't create new buffers.

@relistan
Author

relistan commented May 5, 2016

@tonistiigi we're talking about logs here, right? Having worked on production systems for over 20 years I can't remember a time when I would have preferred to have complete logs over having the application up and running. The only exception being audit logging, something generally not entrusted to generalized logging anyway. This is the reason many people don't log to disk, and why traditional stream logging was done over UDP, for example.

I know you said you don't think it's about logging speed, but it is. The tool I provided above just makes it happen as fast as possible by not reading anything at all. In the production environment where this hit us, Heka was consuming the stream, but it had gotten backed up and was slow. I know this because we replaced our log appender with an asynchronous one. We then attached metrics to the remaining buffer size for our asynchronous appender. You can see it go up and down as Heka consumes messages. Of course, since we're on 1.10.3, it eventually sits at the low-water mark forever because the Docker engine stops consuming.

Making the buffer bigger would perhaps make this happen less frequently, but it is not a solution. Also, no one has addressed that this was a fundamental change in the Docker engine and that it wasn't in the changelogs. I know this is hard to get right all the time, but it should be retroactively added to the appropriate changelog so that anyone who hasn't already upgraded can see it.

If there are actually people who prefer this behavior, can we have a CLI flag to switch it? If not, I'm not sure what logging solution I'll recommend in production that leverages the Docker daemon, including in "Docker: Up and Running". The only other way around it is to use a shim like we used to with the Spotify logger or some other supervisory process in the container that does the right thing.

@ripienaar

ripienaar commented May 5, 2016

correctness > speed indeed, but a production application that's blocked and unusable because a peripheral system is having a bad time is not correct - it's completely dead and unusable in a way that might impact customers.

logging is peripheral to the primary function of the application.

With this behaviour, as soon as you flag an app in Docker to enable debug-level logging for a while, and that debug output is voluminous, the system under debug becomes unusable - and impossible to debug. Operationally you'd much rather sacrifice logging integrity.

Systems where logging is so important as to be a critical function of the app will/should not be doing that logging to STD{OUT,ERR}, so sacrificing integrity of STD{OUT,ERR} logs seems reasonable to me over sacrificing the entire app.

@ptinsley

ptinsley commented May 5, 2016

+1 @ripienaar @relistan

I understand there are applications where it is absolutely critical that no log is lost. I would argue that is by far outweighed by applications that are more than happy to shed a few logs to survive a heavy load event.

If it were configurable you could give people the option of 100% log capture but I would say the default should be to throw logs on the floor for a consumer with a full buffer.

@rnelson0

rnelson0 commented May 5, 2016

Something consuming the data and getting the bogus input is potentially far more dangerous than the decrease in speed.

Someone able to cause a denial of service by using a slow log reader is even more dangerous, as "blocked" in this case is just another way to say DoS. In the field, one simply never assumes that syslogs are 100% complete because syslog, like ICMP, is one of the first things that is dropped to ensure the responsiveness of the service.

Some systems note "## syslog events were dropped" so that purposeful gaps can be differentiated from gaps caused by communication interruption. That may be helpful here to some extent.

@cpuguy83
Member

cpuguy83 commented May 5, 2016

@rnelson0 In this case, a slow reader could have caused unbounded buffer growth, eventually causing the entire daemon to crash.
Crashing the daemon is much worse than blocking a single application.

A slow docker logs consumer would not even be able to block the application.
The reason the application would get blocked is because of https://github.com/docker/docker/blob/master/pkg/broadcaster/unbuffered.go#L27
docker logs does not use this.
Only the logging driver itself (for writing logs to the log service) and docker attach would be in there.
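
(Paraphrasing rather than quoting the linked code: the unbuffered broadcaster fans each write out to every attached writer in sequence, so a single writer that never returns stalls the whole stream. A simplified sketch, not the actual Docker source:)

package broadcast

import (
    "io"
    "sync"
)

// unbuffered is a simplified fan-out writer in the spirit of
// pkg/broadcaster.Unbuffered. Because Write calls every attached writer in
// turn, one stalled attach client keeps Write from returning, which in turn
// blocks the container's stdout/stderr.
type unbuffered struct {
    mu      sync.Mutex
    writers []io.Writer
}

func (u *unbuffered) Add(w io.Writer) {
    u.mu.Lock()
    u.writers = append(u.writers, w)
    u.mu.Unlock()
}

func (u *unbuffered) Write(p []byte) (int, error) {
    u.mu.Lock()
    defer u.mu.Unlock()
    for _, w := range u.writers {
        // If w never returns (a stalled attach client), the remaining
        // writers are never reached and the caller stays blocked.
        w.Write(p)
    }
    return len(p), nil
}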

@rnelson0

rnelson0 commented May 5, 2016

Sorry if it was unclear, I was making analogies between a log reader dropping logs and how syslogs are treated, but not trying to directly relate them.

@tonistiigi
Member

The speed of a docker logs consumer does not have an impact on the container's stdio write speed. What happens to the logs depends on the behavior of the log driver. If you use the syslog driver and configure it to drop logs when the syslog buffer fills up, it will start dropping them. The JSON-file logger, which is the default, will not drop logs, and never did before v1.10 either.
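
(As a concrete, if hedged, example of that kind of setup: a UDP syslog target is fire-and-forget, so a slow collector loses messages instead of applying backpressure to the container. Something like:)

docker run -d \
  --log-driver=syslog \
  --log-opt syslog-address=udp://<syslog-host>:514 \
  busybox sh -c "while true; do date; done"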

@relistan
Author

relistan commented May 5, 2016

@cpuguy83 @tonistiigi Are you suggesting that the reproduction steps above do not cause the application to be blocked? It sounds like you're now saying that this bug is not valid. It seems that #21840 also proves that this can happen. I suggest that you try it on 1.10.3 with the reproduction steps above; it definitely blocks. And we have production evidence that it does. I thought we were all agreed on the premise at least and were simply arguing the correctness of the implementation.

@tonistiigi the JSON logger never dropped logs before v1.10 but it did not block the container, either. It had a different issue: it just blew up in memory. That was very bad behavior but at least in production we could manage around it: you had a little time before things crapped out and you could monitor for it. What we see now is the container just blocks. If you believe the write speed should not be affected then it looks likely that there is instead a concurrency issue somewhere that causes that effect.

The problem with all of the other drivers, including the syslog driver, is that you give up the ability to look at the log stream from a single container with docker logs. That has, up to now, been the advantage of running the JSON-file logger with Heka or some other process consuming the output stream from a socket.

@cpuguy83
Member

cpuguy83 commented May 5, 2016

@relistan The example is contrived. This would only happen with either a blocked/slow logging driver (note this is a driver, not a docker logs consumer) or a blocked/slow attached client.

(Some) logging drivers can be configured for async logging; it's your choice whether to risk dropping logs.

Issues with recovery after unblocking (e.g. by closing down the client connection) are fixed in 1.11.
I was able to reproduce the recovery bug on master (not on 1.11), so I'm investigating this regression.

I could see about implementing a way to make sure that attached clients can't block, but this would also be a significant change for some.

@cpuguy83
Member

cpuguy83 commented May 5, 2016

And I take that back, I'm not seeing the recovery issue on master.

@tonistiigi
Member

@relistan docker logs and docker attach have very different behavior. docker attach directly streams output to the client (limiting throughput with a 1M buffer if the client link is slow). There is a bug in v1.10 for docker attach; I reproduced it and it is already fixed in v1.11. docker logs is different and reads already-logged messages back from the driver. In your reproduction script it serves no purpose other than letting you verify, from its output, that the streams are being blocked by another component. So if you don't run your production apps in the foreground over a slow link, this should not affect you.
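
(To make the distinction concrete against the Remote API; query parameters abbreviated, exact forms per the API version docs:)

# /logs reads back what the log driver has already stored; it does not share the attach buffer
curl --unix-socket /var/run/docker.sock \
  "http://localhost/containers/<id>/logs?follow=1&stdout=1&stderr=1"

# /attach consumes the live stdio stream; a client that stalls here fills the buffer and then blocks the container
curl --no-buffer --unix-socket /var/run/docker.sock \
  -X POST "http://localhost/containers/<id>/attach?stream=1&stdout=1&stderr=1"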

@mjvaldez

mjvaldez commented May 5, 2016

@cpuguy83 @tonistiigi The behavior of docker logs is not the concern. What are you suggesting the behavior of the container and daemon should be with a blocked/slow attached client?

@cpuguy83
Member

cpuguy83 commented May 5, 2016

@mjvaldez Attached clients get a 1MB buffer. This is a rather large buffer and clients would probably have to just not be reading from the attach stream at all to hit this buffer limit.

In practice, in any environment, why is something attached to a container in such a way?

@mjvaldez

mjvaldez commented May 6, 2016

@cpuguy83 please see comments above where relistan describes the use case and ramifications of the behavior we saw in production.

@cpuguy83
Member

cpuguy83 commented May 6, 2016

@mjvaldez AFAICT this is centered around logging. Can you explain how you have logging set up?
Docker does have support for async logging in some of its drivers, which it seems like you might want to utilize here.

@tonistiigi
Member

@relistan What version of mozilla-heka do you use? It seems old versions used the attach stream, but that was fixed in mozilla-services/heka@bd5c4e8

@relistan
Author

relistan commented May 6, 2016

The fundamental misunderstanding here is that it was not clear to me (and, I think, also to the authors of Logspout, and of Heka until recently) that /attach and /logs do not hit the same buffers inside the daemon. It doesn't help when, even with the code to look at, @aaronlehmann continued to refer to "logs" when he could see that we're using /attach:

I don't think dropping data from the log stream would necessarily be the correct behavior. This could break the formatting of the logs and lead to unexpected gaps in the log data.

I was not aware of that, and I guess no one at Docker (at least none of you) knew that at least two of the major third-party logging systems were, until not very long ago, using the /attach endpoint to consume logs. I can assure you, because at New Relic we ran Heka in production for quite some time, and at Nitro for a few months on v1.9, that this did not block containers until v1.10. It did leak memory like crazy when the reader got slow. At New Relic we ran a custom patched build that resolved the memory leak issue that @didip and I patched here.

I am a major contributor to the Heka Docker plugin, and we had been running it at Nitro just fine until we upgraded to Docker v1.10 in production and the change took out our site not long after. We'll pull in Rob's recent changes to the Heka Docker plugin that switch it to the /logs endpoint and expect that to resolve this issue at least.

Nonetheless, this functional change to the behavior of the /attach endpoint really needs to go into the release notes for v1.10. It certainly would have raised a red flag for me. And I suggest adding some clarity to the Remote API documentation for the /attach endpoint, which in various places refers to pulling in logs.

I would suggest that a 1MB buffer (not quite; it's 1e6 bytes) is also not anywhere near big enough. With an average log line length of around 100-150 bytes, that is roughly 6,666 to 10,000 lines of buffer. A high-throughput production app can blow through that buffer very quickly, especially in debug logging mode.

Finally, an explicit warning that docker attach can block a container is a critical addition. It's not mentioned in the docs. And anyone with experience prior to v1.10 will be reasonably certain that it doesn't.

@relistan relistan closed this as completed May 6, 2016
@relistan
Author

relistan commented May 6, 2016

Inadvertently closed rather than commented.

@relistan relistan reopened this May 6, 2016
AkihiroSuda added a commit to AkihiroSuda/issues-docker that referenced this issue May 7, 2016
@tonistiigi
Member

I'm closing this, as the issue was fixed in v1.11 (#21840) and the documentation was clarified in #22611. If a variable buffer size is needed, that can be discussed in a separate issue. I'm not sure it is a good idea, as reaching this limit should be an indication that the endpoint is being misused.

@cpuguy83
Member

cpuguy83 commented Feb 1, 2017

Just FYI, for 1.14 we've added a new logging mode that you can pass through via log-opt:
--log-opt mode=non-blocking --log-opt max-buffer-size=4m, for example.
See #28762 for details.
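
(On daemons that include this mode, the same options can also be set as the default for all containers in /etc/docker/daemon.json; a sketch, with key names as documented for the daemon configuration file:)

{
  "log-driver": "json-file",
  "log-opts": {
    "mode": "non-blocking",
    "max-buffer-size": "4m"
  }
}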

@thomas15v

Is there any way to recover the logs if you accidentally froze them? I am having this issue with version 1.13.1 when doing docker-compose logs -f --tail 1000 <containerID>; it just hangs at "Attaching container ...".

@Laisky

Laisky commented Oct 30, 2018

still hang on write(1 on docker 18.01.0-ce even with mode=non-blocking

@cpuguy83
Member

@Laisky mode=non-blocking means the container will not be blocked on writes for logging.

If you have a client attached to the container's stdio and the client is not consuming it (or is consuming it too slowly), you will still get blocked writes.

@Laisky

Laisky commented Oct 31, 2018

@cpuguy83 thank you for your answer.

I can reproduce the problem by simply running one container with docker 18.03.0-ce, build 0520e24 (also on 18.01, build 9ee9f40):

docker run -itd --rm --log-driver=fluentd --log-opt="fluentd-address=172.16.5.20:12345" --log-opt tag="test.sit" --log-opt mode=non-blocking --log-opt max-buffer-size=50m --log-opt fluentd-async-connect=true --name test1 test

The docker image test is just a Python script with an infinite loop printing a string, like:

import datetime

while 1:
    print(datetime.datetime.now())

The fluentd-address is just a random IP; fluentd-async-connect=true makes sure the container starts properly anyway.

After running for a few minutes, the Python script in this container is blocked:

# strace -tt -p 26500

Process 26500 attached
17:00:09.487341 write(1, "\n", 1

@cpuguy83
Member

cpuguy83 commented Feb 1, 2019

@Laisky I cannot reproduce
The logs are full of:

ERRO[2019-02-01T23:37:40.443055483Z] Error writing log message                     container=23d79a64ce8356808efaf9d232280c4dc58ae8d9d573ff5dd37733ed7ee9718f driver=fluentd error="fluent#appendBuffer: Buffer full, limit 1048576"

Until I stop the container.
