
Getting Premature close errors unexpectedly but very reliably in specific circumstances #1576

Open
jmurty opened this issue Jun 8, 2022 · 6 comments · May be fixed by #1687

@jmurty

jmurty commented Jun 8, 2022

Hi,

Thanks for this great library!

We have been getting Premature close errors on our production server when calling API endpoints.

These errors are entirely repeatable for the same API request with a given input and output – for example looking up details for a specific user – while API requests to the same endpoint with different inputs/outputs work as expected.

The errors occur only on our production Linux server (of course) and are not reproducible on our development machines (macOS), at least not for the specific users that trigger the error on the server.

I have not dug deeply enough to be sure, but it smells to me like faulty handling of chunked responses that happen to trigger a boundary condition due to the packet size or chunk read size on our server.

I am suspicious of the code here

node-fetch/src/index.js

Lines 391 to 399 in 043a5fc

properLastChunkReceived = Buffer.compare(buf.slice(-5), LAST_CHUNK) === 0;
// Sometimes final 0-length chunk and end of message code are in separate packets
if (!properLastChunkReceived && previousChunk) {
	properLastChunkReceived = (
		Buffer.compare(previousChunk.slice(-3), LAST_CHUNK.slice(0, 3)) === 0 &&
		Buffer.compare(buf.slice(-2), LAST_CHUNK.slice(3)) === 0
	);
}
This code recognises the terminating bytes of a chunked response, but it seems to assume that the final 5 bytes will always either be contained in a single chunk or be split at a chunk boundary exactly 3 bytes into the sequence. Should/could this check instead look at the last 5 bytes received in all cases, no matter which chunk contains those bytes?
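To illustrate the alternative I have in mind, a boundary-agnostic check could keep a rolling tail of the last 5 bytes seen and compare that against the "0\r\n\r\n" terminator. This is only a sketch of the idea, not a tested patch:

const LAST_CHUNK = Buffer.from('0\r\n\r\n');

let tail = Buffer.alloc(0);
let properLastChunkReceived = false;

const onData = buf => {
	// Keep at most the final 5 bytes seen so far, no matter how the
	// terminator is split across packets.
	tail = Buffer.concat([tail, buf]).slice(-LAST_CHUNK.length);
	properLastChunkReceived = Buffer.compare(tail, LAST_CHUNK) === 0;
};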

The same API requests that trigger Premature close with node-fetch on the production server succeed if we use the Axios HTTP client library instead, and if we make the requests on the command line with curl.

Reproduction

Steps to reproduce the behavior:

  1. Run on a Linux server (Ubuntu 20.04 LTS)
  2. Make a GET request to an API endpoint that works in 99% of cases
  3. Get a response that happens to trigger the issue, e.g. in our case with a Content-Length of 9171 or 12816
  4. The same request/response data then fails every time with the Premature close error (a minimal script along these lines is sketched below)
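
A minimal script along these lines (hypothetical; the URL is a placeholder for one of the affected endpoints, run as an ESM module on Node 16):

import fetch from 'node-fetch';

const res = await fetch('https://api.example.com/users/12345');
const body = await res.text(); // fails here with "Premature close" for the affected responses
console.log(res.status, body.length);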

Expected behavior

Complete the request and return the response body.

Your Environment

Software          Version
node-fetch        Confirmed issue with versions 3.2.1 and 3.2.5
node              16.14.0 and 16.15.1
npm               8.3.1 and 8.11.0
Operating System  Ubuntu 20.04.4 LTS

Additional context

Requests to the same API endpoints with the same request/response data work correctly on dev machines (macOS) and with alternative HTTP clients (Axios, curl).

@jmurty jmurty added the bug label Jun 8, 2022
@cyruscollier

I believe this edge case is also happening for us (see the discussion I started in #1614). It occurs on less than 1% of requests, but when it happens it is usually reproducible for a period of time, until either the inputs change or the data in the response changes (due to a back-end data source change). In our environment:

  • All of the API responses are gzipped.
  • I'm pretty sure these errors only happen with GET requests.
  • Although there are no specific endpoints that trigger these errors, the Premature Close error does seem to happen more with requests that return larger response bodies, i.e. 25-100KB gzipped. It also might happen more when making several requests in a row and the API server is under slightly higher load (although still responding normally from what I see).

@jmurty Your theory about specific content lengths triggering this makes sense. Are your API responses gzipped as well? I'd like to rule that out as a culprit. We are going to push a production update soon that adds a keep-alive agent and a backoff retry when this error occurs, to see if that helps (roughly sketched below). I don't know enough about network sockets and packets to know whether keep-alive, or waiting a second or two before trying again, has any effect on the content length or on how the packets are processed on the next request.
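
The shape of the mitigation we have in mind is roughly the following; the agent setup, retry count, and backoff values are illustrative, not our exact production code:

import http from 'node:http';
import https from 'node:https';
import fetch from 'node-fetch';

// Reuse connections via keep-alive agents; node-fetch accepts an `agent`
// function that picks one based on the request URL.
const httpAgent = new http.Agent({ keepAlive: true });
const httpsAgent = new https.Agent({ keepAlive: true });
const agent = url => (url.protocol === 'https:' ? httpsAgent : httpAgent);

async function fetchJsonWithRetry(url, retries = 3) {
	for (let attempt = 0; ; attempt++) {
		try {
			const res = await fetch(url, { agent });
			// "Premature close" surfaces while the body is being read, so
			// the read has to be inside the try block as well.
			return await res.json();
		} catch (error) {
			// Only retry on what looks like this issue's error.
			const premature = /premature close/i.test(String(error.message));
			if (!premature || attempt >= retries) {
				throw error;
			}
			// Wait a second or two before trying again.
			await new Promise(resolve => setTimeout(resolve, 1000 * (attempt + 1)));
		}
	}
}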

@jmurty

jmurty commented Aug 8, 2022

Hi @cyruscollier, I don't have any more insight into this issue since we "solved" it by switching to the Axios client around the time I opened this issue, but I can confirm:

  • We got the unexpected Premature close errors for GET requests. We never saw the error for other HTTP methods, though our workload was very GET-heavy so that doesn't mean much.
  • Gzip was not a factor for us; our server responses were not gzipped.
  • Load might have been a factor: the problem occurred when our service was under higher than usual load.

For what it's worth, request retries didn't help in our case. We saw the problem when fetching slow-changing data for specific users, and for these problem requests we would get exactly the same error no matter how many times we retried or when. I could even reproduce the problem by running the same GET request manually in a small script on the server.

@cyruscollier

Thanks for the additional info, @jmurty. It certainly sounds like the same problem. I hope we won't have to swap Node Fetch for Axios for connecting to our API, but that'll be our next course of action for sure.

@Treverix

Treverix commented Dec 5, 2022

I have one example where the 3/2 split assumption does not hold. On a recent request, the LAST_CHUNK marker was split like so:

previousChunk = [ (...), 0, 13, 10, 48, 13]
lastChunk = [ 10, 13, 10]

So both tests used to calculate properLastChunkReceived fail (the marker is split 2/3 here rather than 3/2) and I get a Premature close, even though all data was received.

I patched my version and this does the trick. If the last buffer is smaller than 5 bytes, some bytes of the marker must be in the previous chunk, so we build a small buffer from the last bytes of the previous chunk concatenated with the current buffer and run the comparison against that.

const onData = buf => {
	properLastChunkReceived = Buffer.compare(buf.slice(-5), LAST_CHUNK) === 0;

	// Sometimes final 0-length chunk and end of message code are in separate packets
	if (!properLastChunkReceived && previousChunk) {
		if (buf.length < 5) {
			properLastChunkReceived = Buffer.compare(Buffer.from([...previousChunk.slice(-5), ...buf]).slice(-5), LAST_CHUNK) === 0;
		}
	}

	previousChunk = buf;
};

Treverix added a commit to Treverix/node-fetch that referenced this issue Dec 5, 2022
@Treverix Treverix linked a pull request Dec 5, 2022 that will close this issue
@xfournet

We also have the same problem: for a specific piece of content (maybe due to its length?), roughly 1% of the GET requests fail, even though the request itself completes successfully (as we can see from the load balancer logs).

We tested the @Treverix patch and it's working well now, thanks!

@steveluscher

If you folks are using an http.Agent with your connections in keepAlive mode with a fixed number of maxSockets, #1767 is the problem.
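
In other words, a setup along these lines (the maxSockets value and URL are only illustrative):

import https from 'node:https';
import fetch from 'node-fetch';

// keepAlive plus a capped socket pool -- the configuration #1767 describes
// as triggering "Premature close" under parallel load.
const agent = new https.Agent({ keepAlive: true, maxSockets: 10 });

const res = await fetch('https://api.example.com/resource', { agent });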

chungwu added a commit to plasmicapp/unfetch that referenced this issue Sep 30, 2023
For some fetches of certain lengths, after a certain amount of
parallelism, we see "Premature close" errors as documented in this
issue: node-fetch/node-fetch#1576

We are applying the suggested fix from this PR:
Treverix/node-fetch@625fd38?diff=split

We do so by using patch-package to patch our node-fetch-cjs, and then
inlining a copy of node-fetch-cjs into our built files.