socket hang up (ECONNRESET) - Web3js #27859

Closed
dancamarg0 opened this issue Sep 17, 2022 · 8 comments · Fixed by #29130
Assignees: steveluscher
Labels: community (Community contribution), javascript (Pull requests that update Javascript code), web3.js (Related to the JavaScript client)

Comments


dancamarg0 commented Sep 17, 2022

Problem

We at Triton One have seen many developers hit an error like this when using web3.js:

FetchError: request to https://client.rpcpool.com/ failed, reason: socket hang up
    at ClientRequest.<anonymous> (/home/ec2-user/processes/dex-webserver-mainnet-multi/node_modules/node-fetch/lib/index.js:1491:11)
    at ClientRequest.emit (node:events:527:28)
    at ClientRequest.emit (node:domain:475:12)
    at TLSSocket.socketOnEnd (node:_http_client:478:9)
    at TLSSocket.emit (node:events:539:35)
    at TLSSocket.emit (node:domain:475:12)
    at endReadableNT (node:internal/streams/readable:1345:12)
    at processTicksAndRejections (node:internal/process/task_queues:83:21) {
  type: 'system',
  errno: 'ECONNRESET',
  code: 'ECONNRESET'
}

I've been collecting tcpdump captures from our servers, and in the vast majority of cases this is caused by an RST packet sent by our HAProxy load balancer, which abruptly closes the connection on the client side. See this screenshot as an example.
[screenshot: packet capture showing the FIN/PSH/RST exchange between the load balancer and the client]

IP: 204.16.246.170 (Load Balancer managed by Triton)
IP: 18.237.101.162 (Client)

  1. Notice the Load Balancer first sends a FIN flag indicating to the client the socket will close.
  2. Shortly afterwards the client attempts to PUSH data to a read/write-closed socket.
  3. The server responds with a TCP RST flag.
  4. Node.js surfaces this abrupt disconnection as the error shown above.

This seems to be a common issue across many Node.js applications when I search stackoverflow.com. While the client can simply swallow the error and retry (sketched below), this draws complaints from our customers, who expect to extract maximum read performance from our servers.
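
Here's a rough sketch of that retry-on-reset workaround. The endpoint is the one from the log above; the retry count, backoff, and error matching are illustrative assumptions, not anything web3.js does for you.

```ts
import { Connection } from "@solana/web3.js";

const connection = new Connection("https://client.rpcpool.com");

// Retry a call when the socket is reset mid-request ("socket hang up" /
// ECONNRESET). Retry count and backoff are arbitrary example values.
async function getSlotWithRetry(retries = 2): Promise<number> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await connection.getSlot();
    } catch (err: any) {
      const isReset =
        err?.code === "ECONNRESET" || /socket hang up/i.test(String(err?.message));
      if (!isReset || attempt >= retries) throw err;
      // Back off briefly; the next attempt will open a fresh socket.
      await new Promise((resolve) => setTimeout(resolve, 250 * (attempt + 1)));
    }
  }
}
```

This "works", but it only papers over the underlying keep-alive mismatch, which is why we'd prefer a change in the library defaults.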

Proposed Solution

Here are a few proposed solutions:

  1. Remove HTTP keep-alive functionality completely from web3.js so it closes sockets as soon as the client gets a response.

  2. Enforce client-side timeouts in the HTTP keep-alive settings, e.g. https://github.com/solana-labs/solana-web3.js/blob/master/src/agent-manager.ts#L13 could be set to {keepAlive: true, maxSockets: 25, timeout: 30000} (30s), or shorter; see the sketch after this list.

Note: 2) likely won't solve the issue completely, but it should reduce the error rate. The errors don't happen at a fixed interval; it varies per application — some customers see it every X minutes while others see it pretty much every few seconds. So here I'm proposing that the client close the socket before the LB does, so the connection is never abruptly reset from the server side.

  3. Destroy the socket right after sending a new request; see this interesting discussion in the Node.js repo: Make it possible to forcibly RST a net.Socket nodejs/node#27428. At the bottom, folks created a PR in May attempting to fix this; it may be good for Node developers to have a look.
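
As a sketch of what 2) could look like, here's a keep-alive agent with the values proposed above. The numbers are the ones suggested in this issue, not the library's current defaults, and whether the agent is built here or inside agent-manager.ts is an implementation detail.

```ts
import https from "node:https";

// Keep sockets alive for reuse, but cap the pool and set a socket timeout so
// idle connections are torn down on the client side before the load balancer
// resets them.
const agent = new https.Agent({
  keepAlive: true,
  maxSockets: 25,
  timeout: 30_000, // 30s; could be shorter, depending on the load balancer
});
```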
@dancamarg0 dancamarg0 added the community Community contribution label Sep 17, 2022
@steveluscher steveluscher self-assigned this Sep 17, 2022
@steveluscher steveluscher added the javascript Pull requests that update Javascript code label Sep 17, 2022
@0xCactus

Bump on this as Solend often sees this error

@steveluscher steveluscher added the web3.js Related to the JavaScript client label Dec 2, 2022

y2kappa commented Dec 5, 2022

Bump also, Hubble and Kamino have oracle staleness issues due to this.

@steveluscher (Contributor)

Love it. I'll dig into this this week.

@steveluscher (Contributor)

K, here's what I think I've learned from this excellent article on tuning keep-alive.

  • The underlying HTTP library that the Solana RPC uses (hyper) has a default keep-alive timeout of 20s.
  • Typical Node.js servers have a default keep-alive timeout of 5s.
  • When the RPC is behind a load balancer, a higher ‘free socket timeout’ in the load balancer can result in the RPC closing the socket, but the load balancer (ergo, the client) thinking that it's still open. The next request will fail.
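
To make the mismatch concrete, here's a sketch of the server side of that rule of thumb: whichever hop is closer to the client should give up on an idle socket first, so a Node.js server sitting behind a load balancer wants its keep-alive timeout above the load balancer's idle timeout. The 60s load-balancer idle timeout below is an assumed example, not a number from this thread.

```ts
import http from "node:http";

const LB_IDLE_TIMEOUT_MS = 60_000; // assumed load-balancer idle timeout

const server = http.createServer((req, res) => {
  res.end("ok");
});

// Node's default keepAliveTimeout is 5s; raise it past the load balancer's
// idle timeout so the load balancer always closes the idle connection first.
server.keepAliveTimeout = LB_IDLE_TIMEOUT_MS + 1_000;
// headersTimeout should stay above keepAliveTimeout to avoid premature resets
// between requests.
server.headersTimeout = server.keepAliveTimeout + 1_000;

server.listen(8080);
```

The client-side version of the same rule is what the fixes below do: the hop closest to the client (the web3.js agent) times out first.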


I believe the solutions to be as follows:

  1. Let people supply their own agents, or disable agents altogether, should they like to do some tuning (feat: you can now supply your own HTTP agent to a web3.js Connection #29125). See the sketch below this list.
  2. Reduce the timeout of our default agent to the Solana RPC's timeout minus one second (20s - 1s = 19s) (fix: reduce Connection keep-alive timeout to 1 second fewer than the Solana RPC's keep-alive timeout #29130).
  3. RPC providers should maybe do the same – setting their load balancer timeouts to 1 second less than the Solana RPC's timeout (20s - 1s = 19s).

Let's discuss over at #29130.
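
For anyone who wants to try 1), here's a rough sketch of supplying a tuned agent, assuming the `httpAgent` connection option described in #29125; the exact option name and the ability to pass `false` to disable agents entirely are taken from that PR's description, so double-check against the released API.

```ts
import https from "node:https";
import { Connection } from "@solana/web3.js";

// An agent whose idle-socket timeout is shorter than every upstream hop
// (load balancer and RPC), following the 20s - 1s = 19s rule of thumb above.
const httpAgent = new https.Agent({ keepAlive: true, timeout: 19_000 });

const connection = new Connection("https://client.rpcpool.com", { httpAgent });

// Or opt out of connection reuse entirely (trading latency for simplicity):
// const connection = new Connection("https://client.rpcpool.com", { httpAgent: false });
```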

@gallynaut

@steveluscher Switchboard is seeing better performance with this version of web3.js.

Thanks for getting this fixed. Will report back if anything changes.

@steveluscher (Contributor)

Rad. What exactly does “better” look like in your case, @gallynaut?

@gallynaut

We monitor event-loop health for our oracles. With the ECONNRESET issue, the oracles would be blocked for anywhere from 1s to 2min, which caused some feeds to go stale. With this patch we no longer see the event-loop-blocked warnings.
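
For context, the health check is essentially event-loop delay monitoring; a minimal sketch of that kind of check (threshold and interval here are illustrative, not our production values):

```ts
import { monitorEventLoopDelay } from "node:perf_hooks";

// Sample event-loop delay and warn when the loop has been blocked for too long.
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

setInterval(() => {
  const maxDelayMs = histogram.max / 1e6; // histogram records nanoseconds
  if (maxDelayMs > 1_000) {
    console.warn(`event loop blocked for up to ${maxDelayMs.toFixed(0)}ms`);
  }
  histogram.reset();
}, 10_000);
```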

@steveluscher (Contributor)

Yaaaas. This is great news. @gallynaut, can you check out this discussion from another team that's having some success with this patch? I'm curious to know how your setup is structured, and what the keep-alive timeouts are configured to at every step in the network (the client is now 19s, your load balancer is ???, and presumably your RPC endpoint is the Solana official RPC which is set to 20s).
