
Docker logging blocks and never recovers #22502

Closed
relistan opened this issue May 4, 2016 · 35 comments
Labels
area/logging kind/bug

Comments

@relistan

relistan commented May 4, 2016

You can permanently block a production container simply by reading too slowly from a log socket with any client. Docker eventually blocks the container process, and it never recovers. The container will remain blocked on stdout/stderr unless it handles output asynchronously and deals with output overruns on its own. This is new behavior, introduced in November, that fundamentally changes how logging works in Docker, and it is not even reflected in the changelogs.

Output of docker version:

Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 20
 Running: 20
 Paused: 0
 Stopped: 0
Images: 26
Server Version: 1.10.3
Storage Driver: devicemapper
 Pool Name: docker-202:16-4407302-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: ext4
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 9.992 GB
 Data Space Total: 107.4 GB
 Data Space Available: 97.38 GB
 Metadata Space Used: 11.25 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.136 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /mnt/var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /mnt/var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.77 (2012-10-15)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: null host bridge
Kernel Version: 3.13.0-83-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 29.4 GiB
Name: ip-10-3-13-191

Additional environment details (AWS, VirtualBox, physical, etc.):

Running on AWS; also reproducible on VMware

Steps to reproduce the issue:

  1. Simulate a high-throughput production app with some heavy timestamp logging: docker run -d busybox sh -c "while true; do date; done". Note the container ID for later.
  2. Fetch blockingreader.go from here. It connects to a container, sets up some pipes, and then never reads from them, letting Docker buffer internally (a minimal equivalent is sketched after these steps).
  3. go get github.com/fsouza/go-dockerclient
  4. Run docker logs --tail=2 -f <your_container_id> in another terminal.
  5. go run blockingreader.go <your_container_id>
  6. Watch the log output for a while; the app will stop logging once it fills the buffer and blocks. You can't tell from busybox alone, but the app isn't merely silent, it is completely blocked. You could hook GDB up to it if you really want to see; we took stack dumps of our app to prove it. This is bad because you've now blocked a production app with nothing more than a client connection to the logs.
  7. CTRL-C or kill your blockingreader process, which releases the Docker client socket; you would expect Docker to unblock the container. In extensive testing, this does not happen. Even when the blocked reader is gone, Docker continues to block the container.
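
For reference, a minimal sketch of what such a blocking reader looks like with go-dockerclient (the original blockingreader.go may differ in its details; the socket path here is an assumption):

// blockingreader.go (sketch): attach to a container's stdout/stderr and
// never read from the pipe, so the daemon's attach buffer fills up.
package main

import (
    "io"
    "log"
    "os"

    docker "github.com/fsouza/go-dockerclient"
)

func main() {
    client, err := docker.NewClient("unix:///var/run/docker.sock")
    if err != nil {
        log.Fatal(err)
    }

    // Hand the write end of a pipe to the attach stream and never drain
    // the read end, so the daemon's output backs up behind it.
    pr, pw := io.Pipe()
    _ = pr // intentionally never read

    err = client.AttachToContainer(docker.AttachToContainerOptions{
        Container:    os.Args[1],
        OutputStream: pw,
        ErrorStream:  pw,
        Stdout:       true,
        Stderr:       true,
        Stream:       true,
    })
    if err != nil {
        log.Fatal(err)
    }
}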

Describe the results you received:
Docker eventually blocks a container with high logging output if any client is slow and the internal logging buffer fills. This was not expected. It then also does not release the container when the socket is disconnected. This was really not expected.

Describe the results you expected:
Docker would continue to accept logging output from containerized applications without blocking (or leaking GBs of RAM like it used to). I would have expected a ring buffer implementation or some kind of sliding window without blocking at all.
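
(Purely to illustrate that suggestion, and not Docker's code: a writer that forwards to the slow consumer through a bounded buffer and drops data when it is full never blocks the producer. A minimal Go sketch:)

package logbuffer

import "io"

// droppingWriter forwards writes to a slow downstream writer through a
// bounded channel and drops data when the buffer is full, so the producer
// (the container's stdout) never blocks. Illustrative sketch only.
type droppingWriter struct {
    ch chan []byte
}

func newDroppingWriter(dst io.Writer, depth int) *droppingWriter {
    d := &droppingWriter{ch: make(chan []byte, depth)}
    go func() {
        for msg := range d.ch {
            dst.Write(msg) // a slow reader only delays this goroutine
        }
    }()
    return d
}

func (d *droppingWriter) Write(p []byte) (int, error) {
    msg := append([]byte(nil), p...) // copy: the caller may reuse p
    select {
    case d.ch <- msg:
    default: // buffer full: drop instead of blocking the producer
    }
    return len(p), nil
}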

Additional information you deem important (e.g. issue happens only occasionally):
100% repeatable. We tripped into this issue because we use Mozilla's Heka to log all of our container output. Heka connects to the Docker logs endpoint and sometimes becomes slow under heavy logging. In this way you can take out an entire server's worth of Docker containers because of the permanent blocking behavior of the logs.

@thaJeztah
Member

ping @aaronlehmann @unclejack PTAL

@thaJeztah added the area/logging and kind/bug labels May 4, 2016
@tonistiigi
Member

Can you test with v1.11? A very similar issue to what you are describing was fixed with #21840; at the least this should fix the behavior after ctrl-c.

@relistan
Author

relistan commented May 4, 2016

@tonistiigi I don't have access to 1.11 at the moment, will have to test tomorrow (I'm on UTC+1). You should be able to repeat the steps above in a minute or two if you have a 1.11 to test against.

@tonistiigi
Member

@relistan I tested with your script and #21840 does seem to solve this. Note that the behavior in point 6 is somewhat by design and is not fixed. If you have one reader that is very slow and one that is very fast, we first buffer to memory, but when this buffer fills up we fall back to the slower reader's speed (0 in your case). If we didn't do that we would just run out of memory. Maybe we could provide something to tweak the buffer size. The bug in v1.10 is that once you close the slow reader the fast one remains blocked. In v1.11, docker logs will start showing new output after blockingreader.go is closed.

@relistan
Author

relistan commented May 4, 2016

Thanks for testing @tonistiigi. I don't believe that is the right behavior in item 6 above, even if it's by design. Docker should drop log messages to the slow reader, not block the containerized app itself. Note that in item 6 it's not just the slow reader that is blocked, it's the whole stdout of the container. That's a big change in functionality vs 1.9 and also not very friendly behavior in a production environment. For example, someone watching the docker logs command over a slow link should not be able to block a production application.

@tonistiigi
Member

cc @aaronlehmann

@aaronlehmann
Contributor

I don't think dropping data from the log stream would necessarily be the correct behavior. This could break the formatting of the logs and lead to unexpected gaps in the log data. Perhaps it would be reasonable as an option. Alternatively, the buffer size could be made configurable.

@benoahriz

@aaronlehmann so you are saying it's acceptable behavior to block after a buffer limit is reached in a production environment? Dropping data or corrupting the formatting is definitely much better than blocking (and thus making the app unusable). Would it be feasible to at least have a config option to drop instead of block in such cases?

@relistan
Author

relistan commented May 4, 2016

Isn't there a separate buffer for each reader? That's what it looks like from the code. So if I have the daemon logging to a file, and two clients watching docker logs then I have at least three buffers, no? So why would we block the other readers and the application itself when one of the docker logs calls is slow, rather than simply dropping data on the slow link? Honestly, if Docker is going to block the whole container on stdout/stderr for one slow reader, then it is unsafe to ever use docker logs in production.

@tonistiigi
Member

tonistiigi commented May 4, 2016

correctness > speed. Something consuming the data and getting bogus input is potentially far more dangerous than the decrease in speed. The buffer there is implemented to handle this problem, and the size is something we thought was unreachable for any normal usage. Of course usage may vary, so configuring the size or using disk as a cache are possible solutions here.

I don't think this has anything to do with logs or the logging speed; the problem appears because there is an attach stream that refuses to read data. (Of course, if the log driver were considerably slower than the attach stream then this would be the opposite.)

There can be two buffers per io stream: one if there is an attach stream and another if a log driver is attached. New docker logs commands don't create new buffers.

@relistan
Author

relistan commented May 5, 2016

@tonistiigi we're talking about logs here, right? Having worked on production systems for over 20 years I can't remember a time when I would have preferred to have complete logs over having the application up and running. The only exception being audit logging, something generally not entrusted to generalized logging anyway. This is the reason many people don't log to disk, and why traditional stream logging was done over UDP, for example.

I know you said you don't think it's about logging speed, but it is. The tool I provided above just makes it happen as fast as possible by not reading anything at all. In the production environment where this hit us, Heka was consuming the stream, but it had gotten backed up and was slow. I know this because we replaced our log appender with an asynchronous one. We then attached metrics to the remaining buffer size for our asynchronous appender. You can see it go up and down as Heka consumes messages. Of course, since we're on 1.10.3, it eventually sits at the low-water mark forever because the Docker engine stops consuming.

Making the buffer bigger would perhaps make this happen less frequently, but it is not a solution. Also, no one has addressed that this was a fundamental change in the Docker engine and that it wasn't in the changelogs. I know this is hard to get right all the time, but it should be retroactively added to the appropriate changelog so that anyone who hasn't already upgraded can see it.

If there are actually people who prefer this behavior, can we have a CLI flag to switch it? If not, I'm not sure what logging solution I'll recommend in production that leverages the Docker daemon, including in "Docker: Up and Running". The only other way around it is to use a shim like we used to with the Spotify logger or some other supervisory process in the container that does the right thing.

@ripienaar

ripienaar commented May 5, 2016

correctness > speed indeed, but a production application that's blocked and unusable because a peripheral system is having a bad time is not correct - it's completely dead and unusable in a way that might impact customers.

logging is peripheral to the primary function of the application.

With this behaviour, as soon as you flag an app in Docker to enable debug-level logging for a while, and that debug output is voluminous, the system under debug becomes unusable - and impossible to debug. Operationally you'd much rather sacrifice logging integrity.

Systems where logging is so important as to be a critical function of the app will/should not be doing that logging to STD{OUT,ERR}, so sacrificing integrity of STD{OUT,ERR} logs seems reasonable to me over sacrificing the entire app.

@ptinsley

ptinsley commented May 5, 2016

+1 @ripienaar @relistan

I understand there are applications where it is absolutely critical that no log is lost. I would argue that is by far outweighed by applications that are more than happy to shed a few logs to survive a heavy load event.

If it were configurable you could give people the option of 100% log capture but I would say the default should be to throw logs on the floor for a consumer with a full buffer.

@rnelson0

rnelson0 commented May 5, 2016

Something consuming the data and getting the bogus input is potentially far more dangerous than the decrease in speed.

Someone able to cause a denial of service by using a slow log reader is even more dangerous, as "blocked" in this case is just another way to say DoS. In the field, one simply never assumes that syslogs are 100% complete because syslog, like ICMP, is one of the first things that is dropped to ensure the responsiveness of the service.

Some systems note "## syslog events were dropped" so that purposeful gaps can be differentiated from gaps caused by communication interruption. That may be helpful here to some extent.

@cpuguy83
Member

cpuguy83 commented May 5, 2016

@rnelson0 In this case, a slow reader could have caused unbounded buffer growth, eventually causing the entire daemon to crash.
Crashing the daemon is much worse than blocking a single application.

A slow docker logs consumer would not even be able to block the application.
The reason the application would get blocked is because of https://github.com/docker/docker/blob/master/pkg/broadcaster/unbuffered.go#L27
docker logs does not use this.
Only the logging driver itself (for writing logs to the log service) and docker attach would be in there.
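
(Paraphrasing rather than quoting the linked code: the unbuffered broadcaster fans each write out to every attached writer in sequence, so a single writer that never returns stalls the whole stream. A simplified sketch, not the actual Docker source:)

package broadcast

import (
    "io"
    "sync"
)

// unbuffered is a simplified fan-out writer in the spirit of
// pkg/broadcaster.Unbuffered. Because Write calls every attached writer in
// turn, one stalled attach client keeps Write from returning, which in turn
// blocks the container's stdout/stderr.
type unbuffered struct {
    mu      sync.Mutex
    writers []io.Writer
}

func (u *unbuffered) Add(w io.Writer) {
    u.mu.Lock()
    u.writers = append(u.writers, w)
    u.mu.Unlock()
}

func (u *unbuffered) Write(p []byte) (int, error) {
    u.mu.Lock()
    defer u.mu.Unlock()
    for _, w := range u.writers {
        // If w never returns (a stalled attach client), the remaining
        // writers are never reached and the caller stays blocked.
        w.Write(p)
    }
    return len(p), nil
}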

@rnelson0

rnelson0 commented May 5, 2016

Sorry if it was unclear, I was making analogies between a log reader dropping logs and how syslogs are treated, but not trying to directly relate them.

@tonistiigi
Member

The speed of a docker logs consumer does not have an impact on the container's stdio write speed. What happens to the logs depends on the behavior of the log driver. If you use the syslog driver and configure it to drop logs when the syslog buffer fills up, it will start dropping them. The JSON-file logger, which is the default, will not drop logs, and never did before v1.10 either.
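
(As a concrete, if hedged, example of that kind of setup: a UDP syslog target is fire-and-forget, so a slow collector loses messages instead of applying backpressure to the container. Something like:)

docker run -d \
  --log-driver=syslog \
  --log-opt syslog-address=udp://<syslog-host>:514 \
  busybox sh -c "while true; do date; done"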

@relistan
Author

relistan commented May 5, 2016

@cpuguy83 @tonistiigi Are you suggesting that the reproduction steps above do not cause the application to be blocked? It sounds like you're now saying that this bug is not valid. It seems that #21840 also proves that this can happen. I suggest that you try it on 1.10.3 with the reproduction steps above; it definitely blocks. And we have production evidence that it does. I thought we were all agreed on the premise at least and were simply arguing the correctness of the implementation.

@tonistiigi the JSON logger never dropped logs before v1.10 but it did not block the container, either. It had a different issue: it just blew up in memory. That was very bad behavior but at least in production we could manage around it: you had a little time before things crapped out and you could monitor for it. What we see now is the container just blocks. If you believe the write speed should not be affected then it looks likely that there is instead a concurrency issue somewhere that causes that effect.

The problem with all of the other drivers, including the syslog driver, is that you give up the ability to look at the log stream from a single container with docker logs. That has, up to now, been the advantage of running the JSON-file logger with Heka or some other process consuming the output stream from a socket.

@cpuguy83
Member

cpuguy83 commented May 5, 2016

@relistan The example is contrived. This would only happen with either a blocked/slow logging driver (note this is a driver, not a docker logs consumer) or a blocked/slow attached client.

(Some) logging drivers can be configured for async logging; it's your choice whether to risk dropping logs.

Issues with recovery after unblocking (e.g. by closing down the client connection) are fixed in 1.11.
I was able to reproduce the recovery bug on master (not on 1.11), so I'm investigating this regression.

I could see about implementing a way to make sure that attached clients can't block, but this would also be a significant change for some.

@cpuguy83
Member

cpuguy83 commented May 5, 2016

And I take that back, I'm not seeing the recovery issue on master.

@tonistiigi
Member

@relistan docker logs and docker attach have very different behavior. docker attach directly streams output to the client (limiting throughput with a 1M buffer if the client link is slow). There is a bug in v1.10 for docker attach; I reproduced it and it is already fixed in v1.11. docker logs is different and reads already-logged messages back from the driver. In your reproduction script it serves no purpose other than letting you verify, from its output, that the streams are being blocked by another component. So if you don't run your production apps in the foreground over a slow link, this should not affect you.
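
(To make the distinction concrete against the Remote API; query parameters abbreviated, exact forms per the API version docs:)

# /logs reads back what the log driver has already stored; it does not share the attach buffer
curl --unix-socket /var/run/docker.sock \
  "http://localhost/containers/<id>/logs?follow=1&stdout=1&stderr=1"

# /attach consumes the live stdio stream; a client that stalls here fills the buffer and then blocks the container
curl --no-buffer --unix-socket /var/run/docker.sock \
  -X POST "http://localhost/containers/<id>/attach?stream=1&stdout=1&stderr=1"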

@mjvaldez

mjvaldez commented May 5, 2016

@cpuguy83 @tonistiigi The behavior of docker logs is not the concern. What are you suggesting the behavior of the container and daemon should be with a blocked/slow attached client?

@cpuguy83
Member

cpuguy83 commented May 5, 2016

@mjvaldez Attached clients get a 1MB buffer. This is a rather large buffer and clients would probably have to just not be reading from the attach stream at all to hit this buffer limit.

In practice, in any environment, why is something attached to a container in such a way?

@mjvaldez

mjvaldez commented May 6, 2016

@cpuguy83 please see comments above where relistan describes the use case and ramifications of the behavior we saw in production.

@cpuguy83
Member

cpuguy83 commented May 6, 2016

@mjvaldez AFAICT this is centered around logging. Can you explain how you have logging set up?
Docker does have support for async logging in some of its drivers, which it seems like you might want to utilize here.

@tonistiigi
Member

@relistan What version of mozilla-heka do you use? It seems old versions used the attach stream, but that was fixed in mozilla-services/heka@bd5c4e8

@relistan
Author

relistan commented May 6, 2016

The fundamental misunderstanding here is that it was not clear to me (and, I think, also to the authors of Logspout, and of Heka until recently) that /attach and /logs do not hit the same buffers inside the daemon. It doesn't help when, even with the code to look at, @aaronlehmann continued to refer to "logs" when he could see that we're using /attach:

I don't think dropping data from the log stream would necessarily be the correct behavior. This could break the formatting of the logs and lead to unexpected gaps in the log data.

I was not aware of that, and I guess no one at Docker (at least none of you) knew that at least two of the major third-party logging systems were, until not very long ago, using the /attach endpoint to consume logs. I can assure you, because at New Relic we ran Heka in production for quite some time, and at Nitro for a few months on v1.9, that this did not block containers until v1.10. It did leak memory like crazy when the reader got slow. At New Relic we ran a custom patched build that resolved the memory leak issue that @didip and I patched here.

I am a major contributor to the Heka Docker plugin, and we had been running it at Nitro just fine until we upgraded to Docker v1.10 in production and the change took out our site not long after. We'll pull in Rob's recent changes to the Heka Docker plugin that switch it to the /logs endpoint and expect that to resolve this issue at least.

Nonetheless, this functional change to the behavior of the /attach endpoint really needs to go into the release notes for v1.10. It certainly would have raised a red flag for me. And I suggest adding some clarity to the Remote API documentation for the /attach endpoint, which in various places refers to pulling in logs.

I would suggest that a 1MB buffer (not quite; it's 1e6 bytes) is also not anywhere near big enough. With an average log line length of around 100-150 bytes, that is roughly 6,666 to 10,000 lines of buffer. A high-throughput production app can blow through that buffer very quickly, especially in debug logging mode.

Finally, an explicit warning that docker attach can block a container is a critical addition. It's not mentioned in the docs. And anyone with experience prior to v1.10 will be reasonably certain that it doesn't.

@relistan relistan closed this as completed May 6, 2016
@relistan
Author

relistan commented May 6, 2016

Inadvertently closed rather than commented.

@relistan relistan reopened this May 6, 2016
AkihiroSuda added a commit to AkihiroSuda/issues-docker that referenced this issue May 7, 2016
@tonistiigi
Member

I'm closing this, as the issue was fixed in v1.11 (#21840) and the documentation was clarified in #22611. If a variable buffer size is needed, that can be discussed in a separate issue. I'm not sure it is a good idea, as reaching this limit should be an indication that the endpoint is being misused.

@cpuguy83
Member

cpuguy83 commented Feb 1, 2017

Just FYI, for 1.14 we've added a new logging mode that you can pass through via log-opt:
--log-opt mode=non-blocking --log-opt max-buffer-size=4m, for example.
See #28762 for details.
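
(On daemons that include this mode, the same options can also be set as the default for all containers in /etc/docker/daemon.json; a sketch, with key names as documented for the daemon configuration file:)

{
  "log-driver": "json-file",
  "log-opts": {
    "mode": "non-blocking",
    "max-buffer-size": "4m"
  }
}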

@thomas15v

Is there any way to recover the logs if you accidentally froze them? I am having this issue with version 1.13.1 when doing docker-compose logs -f --tail 1000 <containerID>; it just hangs at "Attaching container ...".

@Laisky

Laisky commented Oct 30, 2018

still hang on write(1 on docker 18.01.0-ce even with mode=non-blocking

@cpuguy83
Member

@Laisky mode=non-blocking means the container will not be blocked on writes for logging.

If you have a client attached to the container's stdio and the client is not consuming it (or is consuming it too slowly), you will still get blocked writes.

@Laisky

Laisky commented Oct 31, 2018

@cpuguy83 thank you for your answer.

I can reproduce the problem by simply running one container with docker 18.03.0-ce, build 0520e24 (also on 18.01, build 9ee9f40):

docker run -itd --rm --log-driver=fluentd --log-opt="fluentd-address=172.16.5.20:12345" --log-opt tag="test.sit" --log-opt mode=non-blocking --log-opt max-buffer-size=50m --log-opt fluentd-async-connect=true --name test1 test

The docker image test is just a Python script with an infinite loop printing a string, like:

import datetime

while 1:
    print(datetime.datetime.now())

The fluentd-address is just a random IP; fluentd-async-connect=true makes sure the container starts properly anyway.

After running for a few minutes, the Python script in this container is blocked:

# strace -tt -p 26500

Process 26500 attached
17:00:09.487341 write(1, "\n", 1

@cpuguy83
Member

cpuguy83 commented Feb 1, 2019

@Laisky I cannot reproduce
The logs are full of:

ERRO[2019-02-01T23:37:40.443055483Z] Error writing log message                     container=23d79a64ce8356808efaf9d232280c4dc58ae8d9d573ff5dd37733ed7ee9718f driver=fluentd error="fluent#appendBuffer: Buffer full, limit 1048576"

Until I stop the container.
