awslogs: Prevent close from being blocked on log #47748
base: master
Conversation
9f0a44a to 7f2b744
Before this change, a call to `Close` could block if the channel used to buffer logs was full. When this happens, the container state ends up wedged, causing a deadlock on anything that needs to lock the container state. This replaces the channel, whose semantics are difficult to manage, with something more suitable for the situation. Signed-off-by: Brian Goff <cpuguy83@gmail.com>
@cpuguy83 nice refactor here! Overall changes LGTM. Just had one or two questions but nothing blocking.
"sync" | ||
|
||
"github.com/docker/docker/daemon/logger" | ||
"github.com/pkg/errors" |
minor-nit: should we use the built-in `errors` package here instead?
We use `pkg/errors` everywhere in moby, mostly because it handles attaching stack traces.
// MessageQueue is a queue for log messages.
//
// [MessageQueue.Enqueue] will block unless/until there is a call to
// [MessageQueue.Dequeue].
Should this be updated so that dequeue is the act of reading from the channel returned by `MessageQueue.Receiver`?
Yes indeed! Forgot to update this after changing the implementation.
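To make the queue semantics under discussion concrete, here is a minimal sketch of a mutex-and-condvar message queue. This is not the PR's actual implementation; the names `MessageQueue`, `Enqueue`, `Dequeue`, and `Close` follow the doc comment above, but the internals are assumed. The key property it illustrates is that `Close` never blocks, even with messages still queued:

```go
package main

import (
	"fmt"
	"sync"
)

// MessageQueue is a hypothetical sketch: messages go into a slice
// guarded by a mutex, and a condition variable wakes the reader.
type MessageQueue struct {
	mu     sync.Mutex
	cond   *sync.Cond
	msgs   []string
	closed bool
}

func NewMessageQueue() *MessageQueue {
	q := &MessageQueue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// Enqueue appends a message and returns immediately; there is no
// fixed capacity to fill, so it cannot wedge a caller the way a
// full channel can.
func (q *MessageQueue) Enqueue(msg string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.closed {
		return // drop messages after close
	}
	q.msgs = append(q.msgs, msg)
	q.cond.Signal()
}

// Dequeue waits for a message; ok is false once the queue is
// closed and fully drained.
func (q *MessageQueue) Dequeue() (msg string, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.msgs) == 0 && !q.closed {
		q.cond.Wait()
	}
	if len(q.msgs) == 0 {
		return "", false
	}
	msg = q.msgs[0]
	q.msgs = q.msgs[1:]
	return msg, true
}

// Close marks the queue closed and wakes any waiting reader.
// It never blocks, regardless of how many messages are pending.
func (q *MessageQueue) Close() {
	q.mu.Lock()
	q.closed = true
	q.mu.Unlock()
	q.cond.Broadcast()
}

func main() {
	q := NewMessageQueue()
	q.Enqueue("hello")
	q.Close()
	msg, ok := q.Dequeue()
	fmt.Println(msg, ok) // hello true
	_, ok = q.Dequeue()
	fmt.Println(ok) // false
}
```

Note that buffered messages remain readable after `Close`, mirroring the drain behavior discussed below for the channel-based reader.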
@@ -576,7 +578,7 @@ func (l *logStream) collectBatch(created chan bool) {
 			}
 			l.publishBatch(batch)
 			batch.reset()
-		case msg, more := <-l.messages:
+		case msg, more := <-chLogs:
The channel will be closed once the message queue is closed, so any buffered messages will not be handled by the current read implementation. This behavior existed before, though, so perhaps it should be a separate issue.
Admittedly, I also cannot think of a clean way to abstract away the complexity of reading from the message queue's underlying channel here.
This should only return after the buffer is emptied (`more` only becomes false after the last buffered message is drained).
https://go.dev/play/p/NR4WOn-XUCs
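The playground link demonstrates this; an equivalent self-contained sketch (channel contents are illustrative) is:

```go
package main

import "fmt"

// drainAfterClose shows that a buffered channel can still be drained
// after close: receives keep returning buffered values with more==true,
// and more only becomes false once the buffer is empty.
func drainAfterClose() []string {
	ch := make(chan string, 3)
	ch <- "a"
	ch <- "b"
	ch <- "c"
	close(ch) // close while messages are still buffered

	var got []string
	for {
		msg, more := <-ch
		if !more {
			// Only reached after all three messages were received.
			return got
		}
		got = append(got, msg)
	}
}

func main() {
	fmt.Println(drainAfterClose()) // [a b c]
}
```

So the `case msg, more := <-chLogs` branch above does see every buffered message before `more` reports false.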
Oh nice! TIL: the wording in https://go.dev/tour/concurrency/4 threw me off, but your example demonstrates the correct behavior.
Before this change, a call to `Close` could block if the channel used to buffer logs was full. When this happens, the container state ends up wedged, causing a deadlock on anything that needs to lock the container state.
This replaces the channel, whose semantics are difficult to manage, with something more suitable for the situation.
Closes #39523
I can't say for sure whether this resolves every report in #39523, but with the limited information provided, I think this is the best we can do.
If others still hit an issue, they'll need to open a new issue with the details needed to track the problem down.