
Spring WebFlux & Jetty, large responses are cut off due to prematurely closed connection #24128

Closed
scottjohnson opened this issue Dec 3, 2019 · 7 comments
Labels
status: duplicate A duplicate of another issue

Comments


scottjohnson commented Dec 3, 2019

See demonstration repository here: https://github.com/opentable/webflux-large-response-spring

I. Description

Given a Spring WebFlux application running on Jetty server using Spring 5.1.6 / Spring Boot 2.1.5 and above:

When returning a response from the server that is sufficiently large*, the connection is closed prematurely and the response is cut off part-way through.

The client will only receive the bytes of the response prior to the cut off point. The server will emit a Jetty error, java.io.IOException: Closed while Pending/Unready.

2019-12-03 20:28:13.862  WARN 1 --- [ttp@6bb4dd34-15] org.eclipse.jetty.server.HttpOutput      : java.io.IOException: Closed while Pending/Unready

* What constitutes a "sufficiently large" response to trigger the cut-off depends on various factors detailed below.

This repository (https://github.com/opentable/webflux-large-response-spring) demonstrates the issue and can be used to reproduce it when running locally from an IDE, in a local Docker container, in a Docker container deployed to Heroku, or on some other container resource cluster (e.g. Mesos).

See usage details in the Background & Analysis section below. In short:

$ curl -v localhost:8080/just?size=8000000
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /just?size=8000000 HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Tue, 03 Dec 2019 21:56:03 GMT
< Content-Type: text/plain;charset=utf-8
< Content-Length: 8000000
< 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

(etc...)

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa* transfer closed with 7197652 bytes remaining to read
* stopped the pause stream!
* Closing connection 0
curl: (18) transfer closed with 7197652 bytes remaining to read

II. Affected Versions

  • Spring 5.1.6 / Spring Boot 2.1.5 and above (tested up to Spring 5.2.1/Boot 2.2.1)
  • Jetty 9.4.15.v20190215 and above (tested up to Jetty 9.4.24.v20191120)
  • Java 8 and above (tested up to Java 11)

Note that whether the issue occurs depends only on the Spring/Spring Boot version, not on the Jetty or Java version.

We tested various combinations of Spring, Spring Boot, Jetty, and Java versions and determined that the issue is introduced only with an update to Spring 5.1.6/Spring Boot 2.1.5, regardless of which Jetty or Java version is used.

The issue is not exhibited on Spring 5.1.5/Spring Boot 2.1.4, regardless of which Jetty or Java version is used.

Tests with different Spring and Jetty versions

| Spring Boot | Jetty | Does the issue occur? |
| ----------- | ----- | --------------------- |
| 2.1.4 | 9.4.15.v20190215 | No (default pairing for Spring Boot 2.1.4) |
| 2.1.4 | 9.4.16.v20190411 | No |
| 2.1.4 | 9.4.17.v20190418 | No |
| 2.1.4 | 9.4.18.v20190429 | No |
| 2.1.4 | 9.4.19.v20190610 | No |
| 2.1.4 | 9.4.24.v20191120 | No |
| 2.1.5 | 9.4.15.v20190215 | Yes |
| 2.1.5 | 9.4.16.v20190411 | Yes |
| 2.1.5 | 9.4.17.v20190418 | Yes |
| 2.1.5 | 9.4.18.v20190429 | Yes (default pairing for Spring Boot 2.1.5) |
| 2.1.5 | 9.4.19.v20190610 | Yes |
| 2.1.5 | 9.4.24.v20191120 | Yes |
| 2.2.1 | 9.4.22.v20191022 | Yes (default pairing for Spring Boot 2.2.1) |
| 2.2.1 | 9.4.24.v20191120 | Yes |

Conclusion: The specific version of Jetty doesn't matter. The issue occurs consistently with Spring 5.1.6/Boot 2.1.5 and up, regardless of the Jetty version used.

Tests with different servers (Jetty, Netty, Tomcat, Undertow)

| Server | Does the issue occur? |
| ------ | --------------------- |
| Jetty | Yes |
| Reactor Netty | No |
| Tomcat | No |
| Undertow | No |

Conclusion: The issue is produced only by the combination of Spring 5.1.6/Boot 2.1.5 (and above) with the Jetty server. We were able to request significantly larger amounts of data using, e.g., the Reactor Netty server without ever encountering the error.
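
For reference, the server under test was presumably swapped by changing which spring-boot-starter is on the classpath; the embedded reactive server can also be pinned programmatically, as in the sketch below. This is an illustration under the assumption that the Jetty reactive server classes are present, not the demo's actual configuration.

```java
import org.springframework.boot.web.embedded.jetty.JettyReactiveWebServerFactory;
import org.springframework.boot.web.reactive.server.ReactiveWebServerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ServerConfiguration {

    // Declaring a ReactiveWebServerFactory bean overrides Spring Boot's
    // auto-configured choice, so the application runs on embedded Jetty
    // instead of the default Reactor Netty server.
    @Bean
    public ReactiveWebServerFactory reactiveWebServerFactory() {
        return new JettyReactiveWebServerFactory();
    }
}
```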

III. Background & Analysis

Q: Is it a problem with WebClient? A: No

We originally encountered this issue with an API gateway proxy server built on Spring WebFlux that uses the WebFlux WebClient to proxy requests and responses from downstream services. Web apps consuming these APIs saw large JSON responses returned by the API gateway being cut off part-way through.

Our initial assumption was that the issue had to do with the WebClient client connector being used, but testing disproved this hypothesis. Ultimately, what matters is that the response from the server is large, even if no downstream WebClient call is made.

The demo app contains an API /webclient?path="foo" which proxies a request to the URI foo and returns the result as an un-parsed String. When curling this API with a downstream URI that returns a very large response, the demo app attempts to return that response to the caller; the server closes the connection prematurely and the client receives only a partial response.
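
A handler along these lines is enough to exercise the proxy path. This is a minimal sketch assuming an annotated controller and a default WebClient; the class and method names are illustrative, not taken from the demonstration repository.

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@RestController
public class WebClientProxyController {

    private final WebClient webClient = WebClient.create();

    @GetMapping("/webclient")
    public Mono<String> proxy(@RequestParam("path") String path) {
        // Fetch the downstream URI and hand the body back to the caller as a
        // single un-parsed String; the response size is determined entirely
        // by the downstream service.
        return webClient.get()
                .uri(path)
                .retrieve()
                .bodyToMono(String.class);
    }
}
```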

Testing different WebClient client connectors

| Client connector | Does the issue occur? |
| ---------------- | --------------------- |
| Reactor-netty client | Yes |
| Jetty-reactive-httpclient | Yes |

Conclusion: The client connector doesn't matter. The issue occurs consistently with either client connector, as long as the server component is Jetty.

Removing WebClient from the mix...

The demo app also contains an API /just?size=bar which simply generates a static string of size characters and returns it in the server response using Mono.just. When curling this API with a sufficiently large size, the server closes the connection prematurely and the client receives only a partial response.
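
A minimal sketch of such an endpoint, assuming an annotated controller; the class and parameter names are illustrative, not the exact code from the demonstration repository.

```java
import java.util.Arrays;

import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
public class JustController {

    @GetMapping(value = "/just", produces = MediaType.TEXT_PLAIN_VALUE)
    public Mono<String> just(@RequestParam("size") int size) {
        // Build a static payload of `size` characters and emit it as a single
        // item, so the only variable in the test is the size of the response.
        char[] payload = new char[size];
        Arrays.fill(payload, 'a');
        return Mono.just(new String(payload));
    }
}
```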

Conclusion: The use of WebClient as part of the call stack doesn't matter. All that matters is returning a large response from the server.

What is a "sufficiently large" response?

We initially had a difficult time reproducing this issue because what counts as a "sufficiently large" response to trigger the premature connection close can vary by several orders of magnitude depending on the operating environment where the code is running.

We were able to reproduce the issue using this demo app in the following operating environments:

  • Running locally in the IDE (IntelliJ)
  • Running in a local Docker container in Docker Desktop
  • Running in a Docker container in Heroku
  • Running in a Docker container in a Mesos cluster in our organization's remote data center

The number of bytes of the response that are necessary to trigger the cutoff decreases as you go down this list.

In other words, when running in our Mesos cluster we can consistently reproduce the issue when requesting a response of 78200 characters or more. When running in Heroku it takes up to 10x that number of bytes, and when running locally (in local Docker or in the IDE) it can take up to 100x that number of bytes to trigger the issue.

Something about our Mesos cluster or our organizational network lowers the threshold significantly, but it is still possible to trigger the issue in these other environments by making the response size sufficiently large.

One hypothesis is that the varying threshold has something to do with the network itself (perhaps network latency?) over which the request is made, but we have not yet been able to test this empirically.

Incidentally the threshold to trigger the issue in our Mesos cluster is always exactly 78200 characters.

What about container resource provisioning?

Given that we know the exact threshold to trigger the issue in a particular operating environment (78200 characters in Mesos), we were able to test whether container resource provisioning has any impact on triggering the issue.

Test container resource sizing

| Container resources | Does the issue occur? |
| ------------------- | --------------------- |
| 512m native memory, 0.1 CPUs, 256m max heap | Yes, after 78200 characters |
| 2048m native memory, 2.0 CPUs, 1024m max heap | Yes, after 78200 characters |

Conclusion: Container resource sizing does not appear to have any impact on whether the issue is triggered, nor any effect on the triggering threshold.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Dec 3, 2019

scottjohnson commented Dec 4, 2019

I believe this is fixed in #24050. We did a quick test with 5.2.2.BUILD-SNAPSHOT and the problem disappears. Going to do more thorough testing tomorrow. This is a case where the issue manifests when using HTTP/1.1 with large writes.

@bclozel bclozel added the status: waiting-for-feedback We need additional information before we can continue label Dec 4, 2019

rstoyanchev commented Dec 4, 2019

Thanks for the detailed report. This does sound like the same issue as #24050, although it is news to me that it did not manifest prior to 5.1.6. It's possible something else changed that altered the timing, which in turn made it easier to demonstrate.

I am going to close this for now as duplicate but please do comment further. Also note that 5.2.2 and 5.1.12 were released yesterday so you can test with the released versions.

@rstoyanchev rstoyanchev added status: duplicate A duplicate of another issue and removed status: waiting-for-feedback We need additional information before we can continue status: waiting-for-triage An issue we've not yet triaged or decided on labels Dec 4, 2019
@scottjohnson

Thanks @rstoyanchev. We tested with the 5.1.12 and 5.2.2 releases and it looks good.

@rstoyanchev

Great, thanks for confirming.

@vanniktech

I'm getting the same when running Spring Boot 2.3.4 with embedded Tomcat. Should this have been fixed as part of this issue as well?


bclozel commented Oct 10, 2020

@vanniktech This sounds like a different problem. If you've got a sample project we can clone/download that reproduces the problem, please create a new issue.

@vanniktech

After some digging, it turned out to be a configuration issue in NGINX that was truncating larger responses.
