Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Socket Hung up and ECONNRESET issue between two Node.js processes #199

Open
1 task done
pcoulso1 opened this issue Jan 6, 2023 · 0 comments
Open
1 task done
Labels
Needs Triage needs an initial review

Comments

@pcoulso1
Copy link

pcoulso1 commented Jan 6, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I am seeing a Socket Hung up issues occurring between two Node.js processes, one acting as the server and the other as the client. It appears to be a race condition but is happening frequently enough to cause issues. The client either reports that the “Socket Hung up” or that an ECONNRESET occurred.

On investigation the issues appears to be similar to the one reported in https://medium.com/ssense-tech/reduce-networking-errors-in-nodejs-23b4eb9f2d83, where there is a race condition between the client and the server closing the connection.

It seems to be related to the "freeSocketTimeout" implemented within the 'agentkeepalive' agent (which is used by 'make-fetch-happen'). In the 'make-fetch-happen', freeSocketTimeout is hard coded to 15000, (see https://github.com/npm/make-fetch-happen/blob/main/lib/agent.js#L80) but by default the server side socket keep alive timeout is set to just 5000 (see https://nodejs.org/docs/latest-v16.x/api/http.html#serverkeepalivetimeout). So there is a window of opportunity for the server to close the socket but the client still consider it as a usable socket.

Expected Behavior

From reading the post on the subject, I think the solution is to make the freeSocketTimeout (on the client) less than the keep alive timeout (on the server). If you do this then the client should close the connection before the server does and therefore avoid this race condition.

This is difficult to do on the client side as currently the freeSocketTimeout has been hard coded.

Should better defaults be considered for the freeSocketTimeout ? Or should they be able to be configurable via the fetcher default config, so that we can make use of the agent functionality within make-fetch-happen ?

Steps To Reproduce

  1. I’ve modified the example app (in https://medium.com/ssense-tech/reduce-networking-errors-in-nodejs-23b4eb9f2d83) to use make-fetch-happen. This app is both the client and the server process. (I've also added a workaround in the comments to override the agent used, but I think a better solution would be to make the freeSocketTimeout configurable).
import _ from 'lodash';
import restify from 'restify';
import fetcher from 'make-fetch-happen';
 
// import HttpAgent from 'agentkeepalive';
 
var fetch = fetcher.defaults({
    maxSockets: Infinity,
    retry: false,
 
    // Workaround : Generate your own agent with a freeSocketTimeout
    // which is less than the default server.keepAliveTimeout in
    // Node.js (https://nodejs.org/api/http.html#serverkeepalivetimeout)
    //agent: new HttpAgent({freeSocketTimeout: 4500}),
  });
 
const server = restify.createServer();
 
server.get('/stuff', async (req, res) => {
    await res.send('some stuff');
});
 
server.listen(8080, startSendingRequests);
 
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
 
async function startSendingRequests() {
    let successes = 0;
    let failures = {};
 
    for (const i of _.times(30)) {
        await wait(4999);
        console.log(i)
        await fetch('http://localhost:8080/stuff')
            .then((r) => {
                successes++;
                console.log("successes")
            })
            .catch((e) => {
                failures[e.message] = (failures[e.message] || 0) + 1;
                console.log(e.message)
            });
    }
 
    console.log({ successes, failures });
}
  1. Run the app and as you can see about half the calls fail.
{
  successes: 16,
  failures: {
    'request to http://localhost:8080/stuff failed, reason: read ECONNRESET': 14
  }
}

Environment

  • npm: 8.19.13
  • Node: 16.19.0
  • OS: Windows 10
  • platform: Windows 10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Triage needs an initial review
Projects
None yet
Development

No branches or pull requests

2 participants
@pcoulso1 and others