Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpClient gets stuck/never calls onComplete() when multiple requests with timeouts are sent #6323

Closed
mike-strauch opened this issue May 25, 2021 · 14 comments · Fixed by #6334 or #6355

Comments

@mike-strauch
Copy link

Jetty version
v9.4.41
v11.0.3 (does not get stuck, but exhibits unexpected behavior, more on that below)

Java version/vendor (use: java -version)
11.0.7+8-LTS

OS type/version
OSX 10.15.4

Description
I'm using HttpClient to send multiple HEAD requests with multiple threads with followRedirects=true and a timeout of 5 seconds for each request. I've observed that with sufficient time between requests (~200ms or more), some of the requests will never timeout and their respective onComplete listeners never fire. This causes our application to wait indefinitely on FuturePromise.get() calls for these requests to return.

This behavior did not happen in version 9.4.11. We've recently upgraded to 9.4.11+ (first tried 9.4.40 and then 9.4.41) in order to get official TLS 1.3 support.

In version 11.0.3, the onComplete listeners fire, but there seems to be a 20-25 second delay after the requests get "stuck" before the on complete listeners fire. Obviously this is better than the requests never firing the on complete listeners at all, but still problematic.

One thing I've noticed is that the test URL goes through a series of redirects. There is a second test URL at the top of the test classes (attached below) that does not run into the same issue but it doesn't go through any redirects.

With debug logging enabled in Jetty 9.4.41 for the TimeoutCompleteListener, I noticed that the requests get "stuck" when no call to TimeoutCompleteListener.schedule() is made for some of them. Example, I'd test 20 requests and see some number of TimeoutCompleteListener.schedule() calls that was less than 20 even though all of the requests should have been scheduled to time out. I'm not super familiar with the Jetty codebase, so this might not be on the right trail, but hope that helps.

Attached are two test classes for reproducing the behaviors in v9.4.41 and v11.0.3.
test-classes.zip

Thanks!

@sbordet sbordet self-assigned this May 26, 2021
@sbordet
Copy link
Contributor

sbordet commented May 26, 2021

@mike-strauch I can reproduce the issue, investigating.

sbordet added a commit that referenced this issue May 28, 2021
…ls onComplete()

* Reworked the total timeout handling.
* Now a CyclicTimeouts handles the exchanges in each HttpDestination,
and a CyclicTimeouts handles the exchanges in each HttpConnection
(rather than in HttpChannel).
* Now adjusting the total timeout for copied requests generated by
redirects and authentication.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
@sbordet
Copy link
Contributor

sbordet commented May 28, 2021

@mike-strauch do you have the chance to try branch jetty-9.4.x-6323-review-httpclient-timeout to confirm that your problem is fixed?

@mike-strauch
Copy link
Author

I may not get to it until next week; but yes, I will try it out. Thanks!

sbordet added a commit that referenced this issue May 30, 2021
…ls onComplete()

Moved IteratorWrapper in HttpConnectionOverFCGI.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
@sbordet sbordet added this to To do in Jetty 9.4.42 FROZEN via automation May 30, 2021
@mike-strauch
Copy link
Author

@sbordet things are looking better in that requests are no longer getting stuck. I did notice one inconsistency and that is the error message when a timeout occurs looks like Total timeout 4552 ms elapsed as opposed to Total timeout 5000 ms elapsed where the 4552 value seems to fluctuate rather than reporting the timeout that was specified on the request.

I'm also seeing an increase in the number of 503 responses from the foreign host, and I'm not sure why those would be occurring now. It's possible that there was a temporary issue with the server we're connecting to. This occurred while testing in our app and not with the test class. The actual error message is Server responded to GET request with HTTP code: 503.

In our app, we first send a HEAD request to verify a link is alive and if for some reason the HEAD request fails we send a follow up GET request. I can run the same test again in the app and see if the 503s resolve, but do you have any insight as to why that might be happening now and not in v9.4.11?

@sbordet
Copy link
Contributor

sbordet commented Jun 1, 2021

I did notice one inconsistency and that is the error message when a timeout occurs looks like Total timeout 4552 ms elapsed as opposed to Total timeout 5000 ms elapsed where the 4552 value seems to fluctuate rather than reporting the timeout that was specified on the request.

Yes this is because the many redirects you have shave off the original request timeout.
I agree that the original timeout should be reported though.

As for the 503, seems like a server error, not sure what happens there.
Do you have DEBUG logs?
Or access to the server logs to see why it fails?

@mike-strauch
Copy link
Author

Or access to the server logs to see why it fails?

Nah, it's not our server. I ran the test again with the new Jetty build and did see some 503s go by while debugging, but the re-check using a GET request succeeded so no 503s were reported in our link check report because 1 of the 2 requests succeeded. My guess is I'm overwhelming the server with too many requests to try and get things to fail. It was just odd that I didn't see any 503s on older Jetty, but I'd only run the test once. I can try again on old Jetty with debugging enabled and see if any 503s go by.

@mike-strauch
Copy link
Author

I just confirmed that I'm seeing 503s go by in Jetty 9.4.11, so I think we're good there.

@sbordet
Copy link
Contributor

sbordet commented Jun 3, 2021

@mike-strauch thanks for confirming.

sbordet added a commit that referenced this issue Jun 3, 2021
#6334)

Fixes #6323 - HttpClient requests with redirects gets stuck/never calls onComplete()

* Reworked the total timeout handling.
* Now a CyclicTimeouts handles the exchanges in each HttpDestination,
and a CyclicTimeouts handles the exchanges in each HttpConnection
(rather than in HttpChannel).
* Now adjusting the total timeout for copied requests generated by
redirects and authentication.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
@sbordet sbordet added this to To do in Jetty 10.0.4/11.0.4 FROZEN via automation Jun 3, 2021
sbordet added a commit that referenced this issue Jun 3, 2021
…ls onComplete()

* Reworked the total timeout handling.
* Now a CyclicTimeouts handles the exchanges in each HttpDestination,
and a CyclicTimeouts handles the exchanges in each HttpConnection
(rather than in HttpChannel).
* Now adjusting the total timeout for copied requests generated by
redirects and authentication.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
(cherry picked from commit 2e7d174)
@mike-strauch
Copy link
Author

Is there a formal release schedule I can keep an eye on for this fix? I could use my own dev build for this fix but would prefer to use an official release. Thanks!

@sbordet
Copy link
Contributor

sbordet commented Jun 3, 2021

We are planning for next week.

Jetty 10.0.4/11.0.4 FROZEN automation moved this from To do to Done Jun 4, 2021
Jetty 9.4.42 FROZEN automation moved this from To do to Done Jun 4, 2021
sbordet added a commit that referenced this issue Jun 4, 2021
…ls onComplete()

* Reworked the total timeout handling.
* Now a CyclicTimeouts handles the exchanges in each HttpDestination,
and a CyclicTimeouts handles the exchanges in each HttpConnection
(rather than in HttpChannel).
* Now adjusting the total timeout for copied requests generated by
redirects and authentication.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
(cherry picked from commit 2e7d174)
@mike-strauch
Copy link
Author

Hey @sbordet was this addressed or are you just leaving it as is for now?

Yes this is because the many redirects you have shave off the original request timeout.
I agree that the original timeout should be reported though.

@sbordet
Copy link
Contributor

sbordet commented Jun 9, 2021

@mike-strauch it was addressed in 21aba4a.

Jetty 9.4.42 was released with the fix for this issue.

@mike-strauch
Copy link
Author

Hm, I built the project from the jetty-9.4.42.v20210604 tag and I'm still seeing messages like:

Total timeout 3808 ms elapsed

This is when calling result.getFailure().getMessage() in the onComplete() handler like in the test classes attached to this issue around line 158.

@sbordet
Copy link
Contributor

sbordet commented Jun 9, 2021

@mike-strauch I can reproduce, please open a new issue, as it is just about improving the exception message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
2 participants