Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] daemon.ContainerLogs(): fix --follow #37656

Closed
wants to merge 7 commits into from

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Aug 16, 2018

This is a continuation of #37576, much simplified and improved, also easier to review

This commit tires to kill three birds with one stone.

I. Goroutine leak on docker logs --follow (#37391).

When daemon.ContainerLogs() is called with options.follow=true
(as in "docker logs --follow"), the "loggerutils.followLogs()"
function never returns (even then the logs consumer is gone).
As a result, all the resources associated with it (including
an opened file descriptor for the log file being read, two FDs
for a pipe, and two FDs for inotify watch) are never released.

If this is repeated (such as by running "docker logs --follow"
and pressing Ctrl-C a few times), this results in DoS caused by
either hitting the limit of inotify watches, or the limit of
opened files. The only cure is daemon restart.

Apparently, what happens is:

  1. logs producer (a container) is gone, calling (*LogWatcher).Close()
    for all its readers (daemon/logger/jsonfilelog/jsonfilelog.go:175).

  2. WatchClose() is properly handled by a dedicated goroutine in
    followLogs(), cancelling the context.

  3. Upon receiving the ctx.Done(), the code in followLogs()
    (daemon/logger/loggerutils/logfile.go#L626-L638) keeps to
    send messages synchronously (which is OK for now).

  4. Logs consumer is gone (Ctrl-C is pressed on a terminal running
    "docker logs --follow"). Method (*LogWatcher).Close() is properly
    called (see daemon/logs.go:114). Since it was called before and
    due to to once.Do(), nothing happens (which is kinda good, as
    otherwise it will panic on closing a closed channel).

  5. A goroutine (see item 3 above) keeps sending log messages
    synchronously to the logWatcher.Msg channel. Since the
    channel reader is gone, the channge send operation blocks forever,
    and resource cleanup set up in defer statements at the beginning
    of followLogs() never happens.

II. Premature exit of docker logs --follow (#37630).

docker logs --follow should mimic tail -f, which only exits once
the file it watches is gone. Same here: even if a container is
stopped, docker logs --follow should not exit since the container
can be restarted and continue producing logs. The two ways to
exit "docker logs --follow" should be:

  1. Kill it (as in Ctrl-C or kill).

  2. Remove the container.

III. Complicated logic of following logs.

This is not an issue per se, but followLogs() is "pretty gnarly"
(C) @cpuguy83. While we're at it, let's try to improve things a bit.

Now onto the fix.

  1. It appears to be that LogWatcher should not be bothered with
    log producer (i.e. container) being gone, so remove calls to
    (*LogWatcher).Close() from the log drivers, as well as the
    accompaniying data structures. This helps to solve issues II and III.

  2. To clarify what (*LogWatcher).Close() actually is, rename it
    to ConsumerGone() (note before that patch it used to mean both
    ProducerGone() (which is eliminated in 1 above) and ConsumerGone().
    Similarly, WatchClose() is now WatchConsumerGone(), etc.
    This is not required, but improves code readability a bit.

  3. followLogs() is modified to:

  • remove the context and the associated goroutine, which is no longer
    needed (helps to fix II and III);
  • remove blocking msg send (fixes I);
  • watch for Chmod event (which, if the file being watched is no more,
    means that the (opened) file is removed), exiting if the log file
    is removed (helps to fix II);
  • exit once ConsumerGone() is received, freeing all the
    resources (fixes I).
  1. Some existing test cases are modified due to II being fixed (i.e. now the container
    should be removed if we want docker logs -f to finish by itself).

  2. A test case TestLogsFollowGoroutineLeak is added to check if I is fixed.

  3. ContainerLogs() and ReadLogs() are modified to honor "follow" flag
    even if the container is not running.

@@ -95,7 +95,7 @@ github.com/philhofer/fwd 98c11a7a6ec829d672b03833c3d69a7fae1ca972
github.com/tinylib/msgp 3b556c64540842d4f82967be066a7f7fffc3adad

# fsnotify
github.com/fsnotify/fsnotify v1.4.7
github.com/fsnotify/fsnotify c9e9bfb647855178ec5f3947c02e6bd47a379eb9 https://github.com/kolyshkin/fsnotify/
Copy link
Member

@thaJeztah thaJeztah Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per fsnotify/fsnotify#183 (comment) and fsnotify/fsnotify#245 (comment) the project is looking for new maintainers; perhaps we can assist in that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll start with a few PRs :)

@codecov
Copy link

codecov bot commented Aug 16, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@0d9d861). Click here to learn what that means.
The diff coverage is 23.33%.

@@            Coverage Diff            @@
##             master   #37656   +/-   ##
=========================================
  Coverage          ?   36.04%           
=========================================
  Files             ?      609           
  Lines             ?    45018           
  Branches          ?        0           
=========================================
  Hits              ?    16225           
  Misses            ?    26565           
  Partials          ?     2228

@kolyshkin kolyshkin force-pushed the logs-f-leak-2 branch 2 times, most recently from cb3c07d to 63244e8 Compare August 16, 2018 16:05
@kolyshkin
Copy link
Contributor Author

Windows CI failure looks legit:

02:07:42.875 FAIL: check_test.go:107: DockerSuite.TearDownTest
02:07:42.875
02:07:42.875 assertion failed: error is not nil: Error response from daemon: unable to remove filesystem for 8f40ac43ca11be91cd139bf4b75db1164ed0639140757d3c9589b3bdda5e1ba2: remove D:\CI\CI-0f0bc06c5\daemon\containers\8f40ac43ca11be91cd139bf4b75db1164ed0639140757d3c9589b3bdda5e1ba2\8f40ac43ca11be91cd139bf4b75db1164ed0639140757d3c9589b3bdda5e1ba2-json.log: The process cannot access the file because it is being used by another process.: failed to remove 8f40ac43ca11be91cd139bf4b75db1164ed0639140757d3c9589b3bdda5e1ba2

@kolyshkin
Copy link
Contributor Author

docker.py test case hang is also legit; made a couple of fixes to tests: docker/docker-py#2121; will retest.

@kolyshkin
Copy link
Contributor Author

z ci needs to be restarted

@thaJeztah thaJeztah added this to backlog in maintainers-session Aug 20, 2018
@anusha-ragunathan
Copy link
Contributor

Merge conflict needs to be resolved

@kolyshkin
Copy link
Contributor Author

Rebased; local log driver patched as well (to remove readers).

This code has many return statements, for some of them the
"end logs" or "end stream" message was not printed, giving
the impression that this "for" loop never ended.

Make sure that "begin logs" is to be followed by "end logs".

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This ia a test case for issue moby#37391. The idea is to start
a container that produces lots of output, then run
"docker logs --follow" on the above container, than kill "docker logs"
and check whether the number of

 - daemon goroutines,
 - daemon opened file descriptors

are back to whatever they were.

Currently, this test reliably detects the leak.

PS here's what it takes to run the test case against the local daemon:

for i in busybox busybox:glibc hello-world debian:jessie; do docker pull $i; done
cd integration
ln -s ../Dockerfile .
cd container
go test -v -run TestLogsFollowGoroutineLeak

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
To include fsnotify/fsnotify#260
(fix for fsnotify/fsnotify#194).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This commit kills three birds with one stone.

I. Goroutine leak on docker logs --follow (moby#37391).

When daemon.ContainerLogs() is called with options.follow=true
(as in "docker logs --follow"), the "loggerutils.followLogs()"
function never returns (even then the logs consumer is gone).
As a result, all the resources associated with it (including
an opened file descriptor for the log file being read, two FDs
for a pipe, and two FDs for inotify watch) are never released.

If this is repeated (such as by running "docker logs --follow"
and pressing Ctrl-C a few times), this results in DoS caused by
either hitting the limit of inotify watches, or the limit of
opened files. The only cure is daemon restart.

Apparently, what happens is:

1. logs producer (a container) is gone, calling (*LogWatcher).Close()
for all its readers (daemon/logger/jsonfilelog/jsonfilelog.go:175).

2. WatchClose() is properly handled by a dedicated goroutine in
followLogs(), cancelling the context.

3. Upon receiving the ctx.Done(), the code in followLogs()
(daemon/logger/loggerutils/logfile.go#L626-L638) keeps to
send messages _synchronously_ (which is OK for now).

4. Logs consumer is gone (Ctrl-C is pressed on a terminal running
"docker logs --follow"). Method (*LogWatcher).Close() is properly
called (see daemon/logs.go:114). Since it was called before and
due to to once.Do(), nothing happens (which is kinda good, as
otherwise it will panic on closing a closed channel).

5. A goroutine (see item 3 above) keeps sending log messages
synchronously to the logWatcher.Msg channel. Since the
channel reader is gone, the channge send operation blocks forever,
and resource cleanup set up in defer statements at the beginning
of followLogs() never happens.

II. Premature exit of docker logs --follow (moby#37630).

docker logs --follow should mimic tail -f, which only exits once
the file it watches is gone. Same here: even if a container is
stopped, docker logs --follow should not exit since the container
can be restarted and continue producing logs. The two ways to
exit "docker logs --follow" should be:

1. Kill it (as in Ctrl-C or kill).

2. Remove the container.

III. Complicated logic of following logs.

This is not an issue per se, but followLogs() is "pretty gnarly"
(C) @cpuguy83. While we're at it, let's try to improve things a bit.

Now onto the fix.

1. It appears to be that LogWatcher should not be bothered with
log producer (i.e. container) being gone, so remove calls to
`(*LogWatcher).Close()` from the log drivers, as well as the
accompaniying data structures. This helps to solve issues II and III.

2. To clarify what `(*LogWatcher).Close()` actually is, rename it
to `ConsumerGone()` (note before that patch it used to mean both
`ProducerGone()` (which is eliminated in 1 above) and `ConsumerGone()`.
Similarly, WatchClose() is now WatchConsumerGone(), etc.
This is not required, but improves code readability a bit.

3. followLogs() is modified to
 - remove the context and the associated goroutine, which is no longer
   needed (fixes II and III);
 - remove blocking msg send (fixes I);
 - watch for Chmod event (which, if the file being watched is no more,
   means that the (opened) file is removed), exiting if the log file
   is removed (fixes II);
 - exit once ConsumerGone() is received, freeing all the
   resources (fixes I).

4. Test cases are modified due to II being fixed (i.e. now the container
should be removed if we want "docker logs -f" to finish by itself).

Should fix moby#37391 and moby#37630.

[v2: fix conflicts after local driver merge, patch local driver]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
TestLogsFollowNonStop is a test case for moby#3763.
It checsk that ContainerLogs(opts.Follow=true):
  - won't stop even if the container is stopped;
  - keep reading logs once the container is restarted;
  - only stops when the container is removed.

TestLogsFollowStopped checks that ContainerLogs(opts.Follow=true)
works fine (i.e. waits for more logs) for a container that is
currently stopped.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This fixes "docker logs --follow" to not exit (but wait for more logs)
in case the container is currently not running.

1. ContainerLogs() does not modify the value of "follow" option.

2. ReadLogs() calls followLogs() even if the writer is closed
(so that "docker logs --follow" will not exit on stopped container).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
to include docker/docker-py#2121

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

rebased; should help CI

@kolyshkin
Copy link
Contributor Author

As much as I like this PR (it is relatively simple and elegant, and removes a lot of code), I am afraid it won't work, for the following reasons:

With that said, I see no other way but to return to #37576 -- which is complicated and ugly, but backward-compatible and solves the initial issue (#37391).

@kolyshkin kolyshkin closed this Aug 24, 2018
@thaJeztah thaJeztah removed this from backlog in maintainers-session Aug 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants