Address maxConcurrentStreams violation on write timeout #3908
5be65fc to e5d769f
Codecov Report
@@ Coverage Diff @@
## master #3908 +/- ##
============================================
+ Coverage 73.29% 73.32% +0.03%
- Complexity 15548 15563 +15
============================================
Files 1365 1365
Lines 59863 59882 +19
Branches 7598 7606 +8
============================================
+ Hits 43877 43909 +32
+ Misses 12136 12120 -16
- Partials 3850 3853 +3
What an awesome analysis! 😄
@@ -104,6 +104,7 @@ void clientTimeout() throws InterruptedException {
    assertThat(loggingEventCaptor.getAllValues()).noneMatch(event -> {
        return event.getLevel() == Level.WARN &&
               event.getThrowableProxy() != null &&
               event.getThrowableProxy().getMessage() != null &&
revert?
removeResponse(id);
// Removing the response and decrementing {@code unfinishedResponses} isn't done immediately
// here. Instead, we rely on {@code Http2ResponseDecoder#onStreamClosed} to decrement
// `unfinishedResponses` to match the timing where netty decrements {@code numActiveStreams}.
How about?
- // `unfinishedResponses` to match the timing where netty decrements {@code numActiveStreams}.
+ // `unfinishedResponses` after Netty decrements `numActiveStreams` in `DefaultHttp2Connection` so that `unfinishedResponses` is never greater than `numActiveStreams`.
(We don't have to use {@code ...} in the comment. 😉)
core/src/main/java/com/linecorp/armeria/client/Http2ResponseDecoder.java
Thanks!
Sorry, reviewers. While taking another look, I realized that the channel might not send a GOAWAY frame when necessary.
bcc82f0 to d1183d8
Sorry 😅 I've added 3 commits. I'd appreciate it if you could take another look 🙏
Thanks, left some questions. 😄
core/src/test/java/com/linecorp/armeria/client/ClientMaxConnectionAgeTest.java
        .responseTimeoutMillis(0)
        .build();

assertThat(client.get("/delayed?seconds=4").aggregate().join().status()).isEqualTo(OK);
The request doesn't seem to be closed, unlike what the method name says?
I may be missing something, but if `MAX_CONNECTION_AGE` is two seconds and the response is sent after 4 seconds, shouldn't the connection be closed forcefully? 🤔
Sorry about the late check.
Maybe I'm misunderstanding the specification, but I thought that even if `MAX_CONNECTION_AGE` is reached, the client waits for requests to finish (so it doesn't forcefully fail long-running requests). I've updated the test name to better reflect this 😅
I've also organized how I think `MAX_CONNECTION_AGE` works:
- A timer is scheduled for `MAX_CONNECTION_AGE`, but it is only respected if there are no running requests.
  armeria/core/src/main/java/com/linecorp/armeria/internal/common/AbstractKeepAliveHandler.java
  Lines 436 to 440 in b77bfbb
      if (!isServer && !hasRequestsInProgress(ctx)) {
          logger.debug("{} Closing a {} connection exceeding the max age: {}ns",
                       ctx.channel(), name, maxConnectionAgeNanos);
          ctx.channel().close();
      }
- Instead, every time a request may have finished, we check whether `MAX_CONNECTION_AGE` has passed and there are no running requests.
  armeria/core/src/main/java/com/linecorp/armeria/client/Http2ResponseDecoder.java
  Lines 153 to 155 in b77bfbb
      if (shouldSendGoAway()) {
          channel().close();
      }
  armeria/core/src/main/java/com/linecorp/armeria/client/Http2ResponseDecoder.java
  Lines 215 to 217 in b77bfbb
      if (shouldSendGoAway()) {
          channel().close();
      }
  armeria/core/src/main/java/com/linecorp/armeria/client/Http2ResponseDecoder.java
  Lines 275 to 279 in b77bfbb
      if (shouldSendGoAway()) {
          // The connection has reached its lifespan.
          // Should send a GOAWAY frame if it did not receive or send a GOAWAY frame.
          channel().close();
      }
- At the same time, we don't allow future requests on connections where `MAX_CONNECTION_AGE` has passed.
      return active && !responseDecoder.needsToDisconnectWhenFinished();
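As a rough illustration of the three points above, here is a minimal Java sketch. It is not Armeria code: `MaxConnectionAgeSketch` and all of its methods are hypothetical names, and time is passed in explicitly so the behavior is deterministic. It only models the idea that the timer closes an idle connection, that a finished request triggers a re-check, and that an aged connection refuses new requests.

```java
// Hypothetical model of the behavior listed above; not Armeria's actual classes.
final class MaxConnectionAgeSketch {
    private final long maxConnectionAgeNanos;
    private final long createdAtNanos;
    private int requestsInProgress;
    private boolean closed;

    MaxConnectionAgeSketch(long maxConnectionAgeNanos, long createdAtNanos) {
        this.maxConnectionAgeNanos = maxConnectionAgeNanos;
        this.createdAtNanos = createdAtNanos;
    }

    private boolean ageExceeded(long nowNanos) {
        return nowNanos - createdAtNanos >= maxConnectionAgeNanos;
    }

    private void closeIfAgedAndIdle(long nowNanos) {
        if (ageExceeded(nowNanos) && requestsInProgress == 0) {
            closed = true;
        }
    }

    /** The max-age timer fired: close only if no request is running. */
    void onMaxAgeTimer(long nowNanos) {
        closeIfAgedAndIdle(nowNanos);
    }

    /** A request finished: re-check the age now that the connection may be idle. */
    void onRequestFinished(long nowNanos) {
        requestsInProgress--;
        closeIfAgedAndIdle(nowNanos);
    }

    /** New requests are refused once the age has passed, even if the connection is still open. */
    boolean tryAcquire(long nowNanos) {
        if (closed || ageExceeded(nowNanos)) {
            return false;
        }
        requestsInProgress++;
        return true;
    }

    boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) {
        long second = 1_000_000_000L;
        MaxConnectionAgeSketch conn = new MaxConnectionAgeSketch(2 * second, 0);
        conn.tryAcquire(0);                               // a long-running request starts
        conn.onMaxAgeTimer(2 * second);                   // the timer fires while it is running
        System.out.println("closed after timer: " + conn.isClosed());
        System.out.println("new request accepted: " + conn.tryAcquire(3 * second));
        conn.onRequestFinished(4 * second);               // the long-running request completes
        System.out.println("closed after finish: " + conn.isClosed());
    }
}
```

This mirrors the updated test: with a 2-second max age and a response that takes 4 seconds, the request still completes, and the connection is closed only afterwards.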
> I've updated the test name to better reflect this 😅

Thanks for that. 😄

> but I thought that even if MAX_CONNECTION_AGE is reached, the client waits for requests to finish (so it doesn't forcefully fail long-running requests)

Exactly. My memory was wrong. Thanks for checking it. 😄
core/src/main/java/com/linecorp/armeria/client/Http2ResponseDecoder.java
Still LGTM!
…bort (#3920)
Motivation:
#3908 (comment)
Once a request reaches `HttpSessionHandler#invoke`
https://github.com/line/armeria/blob/d39c8f5719d7de7eef26df37917cc2df786b5b53/core/src/main/java/com/linecorp/armeria/client/HttpSessionHandler.java#L169
`HttpResponseDecoder.unfinishedResponses` has been incremented from one of the following three locations:
1. https://github.com/line/armeria/blob/d39c8f5719d7de7eef26df37917cc2df786b5b53/core/src/main/java/com/linecorp/armeria/client/HttpChannelPool.java#L276
2. https://github.com/line/armeria/blob/d39c8f5719d7de7eef26df37917cc2df786b5b53/core/src/main/java/com/linecorp/armeria/client/HttpChannelPool.java#L788
3. https://github.com/line/armeria/blob/d39c8f5719d7de7eef26df37917cc2df786b5b53/core/src/main/java/com/linecorp/armeria/client/HttpChannelPool.java#L503
Usually, once a request completes (with either success or failure), the response is removed and `HttpResponseDecoder.unfinishedResponses` is decremented. However, if a request is cancelled early, it is possible that `unfinishedResponses` for the session isn't decremented.
This is problematic because:
1. An HTTP/2 connection may not be able to fully utilize its maximum streams (although 25 streams are allowed, only 24 streams are utilized)
2. Max connection age may not function properly, since it needs `HttpResponseDecoder.unfinishedResponses == 0` before closing connections.
Modifications:
- Decrement `HttpResponseDecoder.unfinishedResponses` if a request is cancelled immediately.
Result:
- More stable behavior from Armeria client
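The two consequences named in the commit message can be sketched with a plain counter. This is a simplified model under stated assumptions, not the actual fix: `UnfinishedResponsesLeakSketch`, `cancelBeforeWrite`, and the `decrementOnEarlyCancel` flag are hypothetical names introduced only to contrast the leaky and the fixed accounting.

```java
// Hypothetical counter model; the names echo the commit message but this is not Armeria code.
final class UnfinishedResponsesLeakSketch {
    private final int maxConcurrentStreams;
    private final boolean decrementOnEarlyCancel;
    private int unfinishedResponses;

    UnfinishedResponsesLeakSketch(int maxConcurrentStreams, boolean decrementOnEarlyCancel) {
        this.maxConcurrentStreams = maxConcurrentStreams;
        this.decrementOnEarlyCancel = decrementOnEarlyCancel;
    }

    /** Reserves a slot on the connection, as happens before a request reaches the session handler. */
    boolean acquire() {
        if (unfinishedResponses >= maxConcurrentStreams) {
            return false;
        }
        unfinishedResponses++;
        return true;
    }

    /** Normal completion (success or failure) always releases the slot. */
    void complete() {
        unfinishedResponses--;
    }

    /** Early cancellation releases the slot only when the fix is enabled. */
    void cancelBeforeWrite() {
        if (decrementOnEarlyCancel) {
            unfinishedResponses--;
        }
    }

    int usableStreams() {
        return maxConcurrentStreams - unfinishedResponses;
    }

    /** Max connection age can only close a connection with no unfinished responses. */
    boolean idleForMaxConnectionAge() {
        return unfinishedResponses == 0;
    }

    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            UnfinishedResponsesLeakSketch session = new UnfinishedResponsesLeakSketch(25, fixed);
            session.acquire();
            session.cancelBeforeWrite();
            System.out.println("fixed=" + fixed
                               + " usableStreams=" + session.usableStreams()
                               + " idle=" + session.idleForMaxConnectionAge());
        }
    }
}
```

Without the decrement, a single early cancellation permanently costs the connection one stream slot and keeps it from ever looking idle, which matches the "25 allowed, 24 utilized" example above.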
https://github.com/line/armeria/runs/4285541086?check_suite_focus=true#step:6:1154
435a06b to dca247c
I've tried force-pushing to rerun the test.
You might want to install @trustin's awesome script https://gist.github.com/trustin/05cbb70e22fc5e7c8b5ffbd1f0d99c8b
Thanks for checking it. 😅
We don't have to do it right now. I just wanted to make sure. 😄
I guess the flaky test is not related to this PR.
Thanks a lot, @jrhee17, for fixing this bug, whose cause was really hard to find. 😄
Background
Currently, Armeria maintains a state, `HttpResponseDecoder#unfinishedResponses`, to track how many in-flight requests are being processed for a connection. Armeria uses this value to check whether all connections already occupy too many concurrent streams, and creates a new connection if necessary.
On the other hand, Netty maintains its own state to track how many in-flight requests are being processed for a connection (`DefaultHttp2Connection.DefaultEndpoint#numActiveStreams`). Netty checks this value before creating a stream, and throws an `Http2Exception$StreamException` if `MAX_CONCURRENT_STREAMS` would be exceeded.
Problem Statement
Currently, when a `WriteTimeoutException` is triggered, Armeria decrements `unfinishedResponses` and removes the response. (A `WriteTimeoutException` is thrown when a request header isn't written within the predefined `writeTimeoutMillis`.)
However, Netty may not be aware that Armeria has failed the response. Consequently, Netty's `numActiveStreams` can be greater than Armeria's `unfinishedResponses`. This may cause a violation of `MAX_CONCURRENT_STREAMS` for additional requests on the connection.
Motivation
Netty always calls `Http2ResponseDecoder.onStreamClosed` before decrementing `numActiveStreams`.
To keep `numActiveStreams` in sync with `unfinishedResponses`, I propose that we move the decrement of `unfinishedResponses` to `Http2ResponseDecoder.onStreamClosed`.
In detail, when a `WriteTimeoutException` is scheduled
armeria/core/src/main/java/com/linecorp/armeria/client/HttpRequestSubscriber.java
Lines 171 to 173 in 117a21e
the response is closed.
armeria/core/src/main/java/com/linecorp/armeria/client/HttpRequestSubscriber.java
Line 318 in 117a21e
Consequently, after the stream processes the `close` event, `whenComplete` is triggered.
armeria/core/src/main/java/com/linecorp/armeria/client/Http2ResponseDecoder.java
Lines 83 to 90 in 117a21e
And the response is removed (and `unfinishedResponses` is decremented).
armeria/core/src/main/java/com/linecorp/armeria/client/Http2ResponseDecoder.java
Line 101 in 117a21e
However, as far as Netty is concerned, the request may have been written and may still be processing.
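To make the drift between the two counters concrete, here is a small self-contained Java sketch. It is only a model under stated assumptions: `StreamAccountingSketch` and its methods are hypothetical, one set stands in for Armeria's `unfinishedResponses` and another for Netty's `numActiveStreams`, and the `removeOnWriteTimeout` flag switches between the current timing and the proposed one. The exception type is a plain `IllegalStateException` rather than Netty's real exception.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for one HTTP/2 connection; not Armeria or Netty code.
final class StreamAccountingSketch {
    private final int maxConcurrentStreams;
    private final boolean removeOnWriteTimeout;
    // Plays the role of Armeria's unfinishedResponses.
    private final Set<Integer> unfinishedResponses = new HashSet<>();
    // Plays the role of Netty's numActiveStreams.
    private final Set<Integer> activeStreams = new HashSet<>();

    StreamAccountingSketch(int maxConcurrentStreams, boolean removeOnWriteTimeout) {
        this.maxConcurrentStreams = maxConcurrentStreams;
        this.removeOnWriteTimeout = removeOnWriteTimeout;
    }

    /** Client-side admission check first, then the transport-side limit check. */
    boolean tryOpenStream(int streamId) {
        if (unfinishedResponses.size() >= maxConcurrentStreams) {
            return false; // the client would use or create another connection
        }
        if (activeStreams.size() >= maxConcurrentStreams) {
            throw new IllegalStateException("Maximum active streams violated for this endpoint.");
        }
        unfinishedResponses.add(streamId);
        activeStreams.add(streamId);
        return true;
    }

    /** Current timing removes the response here; the proposed timing defers to onStreamClosed. */
    void onWriteTimeout(int streamId) {
        if (removeOnWriteTimeout) {
            unfinishedResponses.remove(streamId);
        }
    }

    /** Called when the transport closes the stream; both counters are released together. */
    void onStreamClosed(int streamId) {
        unfinishedResponses.remove(streamId);
        activeStreams.remove(streamId);
    }

    public static void main(String[] args) {
        StreamAccountingSketch current = new StreamAccountingSketch(1, true);
        current.tryOpenStream(1);
        current.onWriteTimeout(1); // the transport still considers stream 1 active
        try {
            current.tryOpenStream(3);
        } catch (IllegalStateException e) {
            System.out.println("current timing: " + e.getMessage());
        }

        StreamAccountingSketch proposed = new StreamAccountingSketch(1, false);
        proposed.tryOpenStream(1);
        proposed.onWriteTimeout(1);
        System.out.println("proposed timing, before close: " + proposed.tryOpenStream(3));
        proposed.onStreamClosed(1);
        System.out.println("proposed timing, after close: " + proposed.tryOpenStream(3));
    }
}
```

With the proposed timing the client-side check rejects the second stream until the first one is actually closed, so the transport-side limit is never hit on this connection.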
Misc
Reproduced the `maxConcurrentStreams` violation when a `WriteTimeoutException` occurs at 225a684
Modifications
- Remove the `removeResponse` call from `Http2ResponseDecoder.onWrapperCompleted`, and rely on `onStreamClosed` to remove the response/decrement `unfinishedResponses`.
- In `onHeadersRead`, `onDataRead`, and `onRstStreamRead`, also check if `resWrapper` has been closed. This preserves behavior since `res` was previously removed on `WriteTimeoutException`, resulting in `res == null`.
Update
I realized that if we simply don't process values when headers/data/rst are received, then we might not send a `GoAway` and close the connection when `disconnectWhenFinished = true` due to df43379.
I've verified this behavior with the test cases added in 8018da1.
I've modified further such that:
- […] `onStreamClosed` is called.
- Remove `channel().close();` if `shouldSendGoAway()` is true for `onDataRead`, `onHeadersRead` since `onStreamClosed` will handle this instead.
- Update `onStreamClosed` to try to close the `ResponseWrapper` only if the underlying `delegate` is open. d1183d8
There is a slight change of behavior, where a `GoAway` may be triggered from `onRstStream` as well. Let me know if this change shouldn't be made 🙏
Result:
`Maximum active streams violated for this endpoint.` from client side #3858