Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix broken timeout behavior in Node.js environment #2874

Closed
wants to merge 2 commits into from

Conversation

mooyoul
Copy link

@mooyoul mooyoul commented Apr 4, 2020

Background

Please see #2710 (comment) for details.

Currently, timeout behavior is different between Browser (XHR) and Node.js environment.
It was caused by PR #1752, Affected axios versions are v0.19.1 and v0.19.2 (latest).

What's changed

It reverts changes of #1752, and adds additional tests for various timeout scenarios.

Copy link
Collaborator

@chinesedfan chinesedfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mooyoul Thanks for your solid works. I left some comments to share with you.


(Saw your comments in #2710 and updated) Yes, keep idiomatic timeout behavior, but just add some document to tell users how to handle problems that fixed by #1752.

var success = false, failure = false;
var error;

axios.get('http://10.255.255.1/', {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this ip have special meaning? Can we use a non-existed host or something else?

Copy link
Author

@mooyoul mooyoul Apr 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It's a one of well-known private address, which can be used for non-routable address.

In general, Dropping TCP SYN packet is common way to simulate connect timeout artificially.
Using http://google.com:81 as request URL will result same effect because Google's Firewall drops TCP SYN packet, but i thought using Google's URL is not reliable because Google can change their firewall policy in someday, which is unpredictable. So that's why i use non-routable address as request url.

Please see https://stackoverflow.com/questions/100841/artificially-create-a-connection-timeout-error for further details.

I'll add some comments to test code for explanation.

reject(createError('timeout of ' + config.timeout + 'ms exceeded', config, 'ECONNABORTED', req));
});

reject(createError('timeout of ' + config.timeout + 'ms exceeded', config, 'ETIMEDOUT', req));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ECONNABORTED is changed as ETIMEDOUT. Do you share the same idea with #1543?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ETIMEOUT on server side different with browser. the http request on nodejs while be block the js until the poll finished (successed or failed),. so the settimeout can not be executed when connection failed.

you can read the detail from here : https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/

Copy link
Author

@mooyoul mooyoul Apr 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it was a unintended change. Sorry about that.
Anyway, Yes. I think it should be a ETIMEDOUT.

As a axios consumer, There's no reliable ways to know given request was timed out or not. Current error does not help consumers to detect timeouts easily. Currently, It only can be done by matching error message like below:

const isTimeoutError = /timed?out/i.test(error.message);
// or
const isTimeoutError = error.message.includes("timeout");

I think it is not reliable way to match timeout errors.

Another reason why we should change error code is for clarity. ETIMEDOUT error code is more clearer than ECONNABORTED. because timeout was really happened so that's why we aborted request. I also think ECONNABORTED is not actual root cause of given error.

Final reason why we should change error code is for consistency. Node.js built-in modules use ETIMEDOUT error code for timeout errors.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we think this would be a breaking change tho? Current users of axios might expect it to be ECONNABORTED and might rely on that for their own implementations.

Copy link
Author

@mooyoul mooyoul Apr 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, It would be a breaking change. If user relied to ECONNABORTED error code only. it would break their applications.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chinesedfan Any plans to start using semver based versioning?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fishcharlie Yes, next version should be 0.20 at least, which means breaking change because it is less than 1.0. And 0.19.1 is not a standard version.

// At this time, if we have a large number of request, nodejs will hang up some socket on background. and the number will up and up.
// And then these socket which be hang up will devoring CPU little by little.
// ClientRequest.setTimeout will be fired on the specify milliseconds, and can make sure that abort() will be fired after connect.
req.setTimeout(config.timeout, function handleRequestTimeout() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If dropping connect timeout, users will complain about server problems that fixed by #1752. Maybe we should consider #1739 together, and got#timeout already gives a good example.

To keep things simple, we can only support partial kinds of timeout at the beginning.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, HTTPRequest#setTimeout cannot detect connect timeout. I'm not sure it's intended or just a bug of Node.js core. Current HTTPRequest#setTimeout implementation won't call Socket#setTimeout internally if connect timeout was happened. Please refer to https://github.com/nodejs/node/blob/v13.12.0/lib/_http_client.js#L808-L812 . You'll notice Socket#setTimeout will not be called until connection establish.

Copy link
Author

@mooyoul mooyoul Apr 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a proof that HTTPRequest#setTimeout cannot detect connect timeout due to described reason:

스크린샷 2020-04-06 오전 5 57 29

스크린샷 2020-04-06 오전 5 58 59

As you can see. timeout event wasn't emitted, elapsed 1 minutes either. (macOS system default connect timeout)

It results to axios too. latest axios release (which is #1752 changes applied) cannot detect connect timeout either:

스크린샷 2020-04-06 오전 5 52 22

As you can see. axios v0.19.2 cannot detect connect timeout and elapsed 1 minutes (macOS system default connect timeout)

Copy link

@fishcharlie fishcharlie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this time, I do not support this PR. I'm not confident that the problems brought up in #1752 are sufficiently solved in this PR.

The PR specifically mentions a situation where with a large number of requests, it has the potential to choke the CPU. I do not see any tests or proof that this PR fixes that issue. Simply reverting the changes that PR makes, doesn't address the core issue.

Even after reading @mooyoul's comment on the main issue, although it discusses timeouts, nothing addressed the issue brought up by the original PR, and no indication was given that this fixes that issue.

I believe @ryouaki should give some input here and on #2710 if possible. It would have been nice if #1752 had some type of unit test or something that we could test against to attempt different solutions. That way we could have had a reproducible example of the issue @ryouaki was running into with more information. Maybe that is something to consider making a requirement before merging a PR in the future @chinesedfan?

@ryouaki
Copy link
Contributor

ryouaki commented Apr 5, 2020

At this time, I do not support this PR. I'm not confident that the problems brought up in #1752 are sufficiently solved in this PR.

The PR specifically mentions a situation where with a large number of requests, it has the potential to choke the CPU. I do not see any tests or proof that this PR fixes that issue. Simply reverting the changes that PR makes, doesn't address the core issue.

Even after reading @mooyoul's comment on the main issue, although it discusses timeouts, nothing addressed the issue brought up by the original PR, and no indication was given that this fixes that issue.

I believe @ryouaki should give some input here and on #2710 if possible. It would have been nice if #1752 had some type of unit test or something that we could test against to attempt different solutions. That way we could have had a reproducible example of the issue @ryouaki was running into with more information. Maybe that is something to consider making a requirement before merging a PR in the future @chinesedfan?

Thank you for you replay. we should know that the event loop is different from nodejs and browser.

on nodejs side, the connection event will poll until event finished and block js . so setTimeout can not be invoked on nodejs side. you can read the detail from https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/

About the test case.... I do not know how to write them .becasue only when the remote server is block (it can be connected and not failed, but do not response you)

@mooyoul
Copy link
Author

mooyoul commented Apr 5, 2020

@chinesedfan @fishcharlie

Hello all, I've spent time to investigate concerns about @fishcharlie pointed out.

According to @ryouak (who proposed PR #1752) changed axios to use HTTPRequest#setTImeout instead of creating timers not to block event loop.

I was curious how HTTPRequest#setTimeout could resolve the issue that @ryouaki described.

So, first step was understanding how HTTPRequest.setTimeout works.
In summary, HTTPRequest#setTimeout calls Socket#setTimeout internally.

Here's a code snippet from Node.js core (or see https://github.com/nodejs/node/blob/v13.12.0/lib/_http_client.js#L789-L815)

ClientRequest.prototype.setTimeout = function setTimeout(msecs, callback) {
  if (this._ended) {
    return this;
  }

  listenSocketTimeout(this);
  msecs = getTimerDuration(msecs, 'msecs');
  if (callback) this.once('timeout', callback);

  if (this.socket) {
    setSocketTimeout(this.socket, msecs);
  } else {
    this.once('socket', (sock) => setSocketTimeout(sock, msecs));
  }

  return this;
};

function setSocketTimeout(sock, msecs) {
  if (sock.connecting) {
    sock.once('connect', function() {
      sock.setTimeout(msecs);
    });
  } else {
    sock.setTimeout(msecs);
  }
}

As you can see. HTTPRequest#setTimeout is thin wrapper.
Following task will be happen if HTTPRequest#setTimeout called:

  1. Setup timeout event listener to Socket to propagate timeout event.
  2. Setup timeout event listener to HTTPRequest (= ClientRequest) if callback was supplied
  3. If socket is not assigned yet, Setup socket event listener to HTTPRequest to continue step to (4)
  4. Call Socket#setTimeout

That's all. Then how Socket#setTimeout works?
Socket#setTimeout is just reference of setStreamTimeout function of internals.

Here's a code snippet from Node.js core (or see https://github.com/nodejs/node/blob/v13.12.0/lib/internal/stream_base_commons.js#L224-L254)

function setStreamTimeout(msecs, callback) {
  if (this.destroyed)
    return;

  this.timeout = msecs;

  // Type checking identical to timers.enroll()
  msecs = getTimerDuration(msecs, 'msecs');

  // Attempt to clear an existing timer in both cases -
  //  even if it will be rescheduled we don't want to leak an existing timer.
  clearTimeout(this[kTimeout]);

  if (msecs === 0) {
    if (callback !== undefined) {
      if (typeof callback !== 'function')
        throw new ERR_INVALID_CALLBACK(callback);
      this.removeListener('timeout', callback);
    }
  } else {
    this[kTimeout] = setUnrefTimeout(this._onTimeout.bind(this), msecs);
    if (this[kSession]) this[kSession][kUpdateTimer]();

    if (callback !== undefined) {
      if (typeof callback !== 'function')
        throw new ERR_INVALID_CALLBACK(callback);
      this.once('timeout', callback);
    }
  }
  return this;
}

As you can see, It creates a timer and update timer at every stream read/write completion.
Yeah, and Node.js core also uses timer to detect timeouts.
So, As of my understanding, HTTPRequest#setTimeout can be affected by blocked event loop. It looks there are no reasonable performance differences between creating timers and calling HTTPRequest#setTimeout.

Anyway, It's still not clear how HTTPRequest#setTimeout could resolve the issue. right?
Please read #2710 (comment) if you haven't read previous comment.

As you can see. HTTPRequest#setTimeout calls Socket#setTimeout right after connection establish. If socket connection was not established yet, connect event listener will be setup and waits for connection establish. and then, calls Socket#setTimeout. Socket#setTimeout won't be called until socket successfully connected.

As i earlier mentioned, Socket#setTimeout only measures "socket inactivity". So this change will change timeout behavior also. Previously, axios's timeout worked as "Call Timeout", but it changed timeout behavior as "Socket Idle timeout (Read Timeout + Write Timeout)". That's why PR #1752 could resolve @ryouaki's issue.

As of my understanding, libuv does not block TCP connect at all. So calling connect will not block event loop either.

Also, I've read the link which was @ryouaki actually faced:
vuejs/vue-router#1706 (comment)

From mentioned issue, @ryouaki mentioned ECONNRESET error code, which is not related to timeouts. because we've been using ECONNABORTED as error code to indicate timeout.

Also, According to his blog article (which was mentioned in upper linked issue):
https://juejin.im/post/5b8d31d3f265da43594407e2

(truncated, translated from Chinese to English using Google Translate)
however. . . . . . I was wrong again.

One day I forgot to turn off the computer, the local pressure test environment was still running, and the next day I was surprised to find that all the socket resources that were suspended were released. But the memory and CPU are still not recycled. On this point, I verified the operation and maintenance colleagues, indeed the operating system will automatically handle these long inactive links. Although I solved the problem by modifying the axios source code, it seems that the essential cause of the problem is not correct.

So, my conclusion are:

  • There are no reasonable performance differences
  • Actually, It was not actual issue of axios. nor it wasn't issue of Node.js. Leak was caused by SSR
  • It just changed timeout behavior. It didn't fix timeout issue.
  • We need to revert our timeout behavior as this PR proposed
  • We don't need to add document to tell users how to handle problems that fixed by Fixing socket hang up error on node side for slow response. #1752

@mooyoul mooyoul requested a review from chinesedfan April 5, 2020 22:52
@mooyoul mooyoul force-pushed the read-timeout branch 2 times, most recently from d3d272c to 05bb253 Compare April 5, 2020 22:58
@ryouaki
Copy link
Contributor

ryouaki commented Apr 6, 2020

thank you for your comment

As you can see. HTTPRequest#setTimeout calls Socket#setTimeout right after connection establish. If socket connection was not established yet, connect event listener will be setup and waits for connection establish. and then, calls Socket#setTimeout. Socket#setTimeout won't be called until socket successfully connected.

this is because connection is system's behavior and is sync. it will block the current thread unless you use another thread.

So js only has one thread, so connection will block js. this is the flow below from nodejs.org

   ┌───────────────────────────┐
┌─>│           timers          │
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
│  │     pending callbacks     │
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
│  │       idle, prepare       │
│  └─────────────┬─────────────┘      ┌───────────────┐
│  ┌─────────────┴─────────────┐      │   incoming:   │
│  │           poll            │<─────┤  connections, │
│  └─────────────┬─────────────┘      │   data, etc.  │
│  ┌─────────────┴─────────────┐      └───────────────┘
│  │           check           │
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
└──┤      close callbacks      │
   └───────────────────────────┘

The poll phase has two main functions:

Calculating how long it should block and poll for I/O, then
Processing events in the poll queue.

If the poll queue is not empty, the event loop will iterate through its queue of callbacks executing them synchronously until either the queue has been exhausted, or the system-dependent hard limit is reached.

so this is why setTimeout for axios about timeout on server side can not work well.

@mooyoul
Copy link
Author

mooyoul commented Apr 6, 2020

thank you for your comment

As you can see. HTTPRequest#setTimeout calls Socket#setTimeout right after connection establish. If socket connection was not established yet, connect event listener will be setup and waits for connection establish. and then, calls Socket#setTimeout. Socket#setTimeout won't be called until socket successfully connected.

this is because connection is system's behavior and is sync. it will block the current thread unless you use another thread.

Have you read all of my comment? I know how Node's Event Loops works. I am not sure why are you mentioning about "system's behavior" and "sync". But to clarify, I would like to tell you some facts.

The difference behavior between HTTPRequest#setTimeout and Socket#setTimeout should not be a reason of "blocking". As you can see, Node.js internals also use timers. HTTPRequest#setTimeout and Socket#setTimeout methods are currently relies on timers. Also, connect() operation is performed in async & non-blocking manner by libuv. All network operations except DNS lookups (e.g. getaddrinfo) are non-blocking. (See http://docs.libuv.org/en/v1.x/guide/networking.html)

The poll phase processes I/O callbacks. It does not do actual I/O operations. It just calls I/O callbacks - Just like fs.readFile callback. Once again, All network operations are performed by libuv, in non-blocking manner.

connect() operation does not block event loop either. If connect() blocks event loop, below result is impossible:

스크린샷 2020-04-07 오전 12 39 49

스크린샷 2020-04-07 오전 12 42 18

So js only has one thread, so connection will block js. this is the flow below from nodejs.org

   ┌───────────────────────────┐
┌─>│           timers          │
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
│  │     pending callbacks     │
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
│  │       idle, prepare       │
│  └─────────────┬─────────────┘      ┌───────────────┐
│  ┌─────────────┴─────────────┐      │   incoming:   │
│  │           poll            │<─────┤  connections, │
│  └─────────────┬─────────────┘      │   data, etc.  │
│  ┌─────────────┴─────────────┐      └───────────────┘
│  │           check           │
│  └─────────────┬─────────────┘
│  ┌─────────────┴─────────────┐
└──┤      close callbacks      │
   └───────────────────────────┘

The poll phase has two main functions:

Calculating how long it should block and poll for I/O, then
Processing events in the poll queue.

If the poll queue is not empty, the event loop will iterate through its queue of callbacks executing them synchronously until either the queue has been exhausted, or the system-dependent hard limit is reached.

so this is why setTimeout for axios about timeout on server side can not work well.

So, It is still not clear why we must use HTTPRequest#setTimeout instead of using setTimeout.
Please leave facts or proofs if you believe we must keep HTTPRequest#setTimeout.

@ryouaki
Copy link
Contributor

ryouaki commented Apr 8, 2020

》》The poll phase processes I/O callbacks. It does not do actual I/O operations. It just calls I/O callbacks - Just like fs.readFile callback. Once again, All network operations are performed by libuv, in non-blocking manner.

are you sure?

The connection will block thread unless you do some thing with another thread. because TCP is blocked.

For my case:

  1. My server will received more than 100000 request in an hour.
  2. About 3% will timeout from my test report.
  3. most of them which timeout can be close by system and HTTPRequest#setTimeout work will. but some of them blocked and HTTPRequest#setTimeout not be invoked. I can find them from logs.
  • I add log on request start .and add logs with Promise.then and Promise.catch.
    -compare the report logs which record last 5 days, logs only saved 5days for my system.
  • if the number of received request is the same as the summation that failed and success. then HTTPRequest#setTimeout is ok.

**but for my system is not. some request failed but blocked. I found that some timer did not execute and still hang up, because the number of [success] and [failed] is less than the number we [received] **

After I change HTTPRequest#setTimeout to Socket#setTimeout and analysis the logs which we report after change., the number of [success] and [failed] is always the same as number we [received]

So I think for xhr HTTPRequest#setTimeout is ok, but for http. I do not think so.

@chinesedfan
Copy link
Collaborator

Thanks for your long discussions. After reading codes of sindresorhus/got, which provides lots for timeouts, I prefer @mooyoul's decision currently. And @ryouaki, would you mind explaining following two questions?

After I change HTTPRequest#setTimeout to Socket#setTimeout

I think we are comparing setTimeout with HTTPRequest#setTimeout, and HTTPRequest#setTimeout calls Socket#setTimeout internally. Or did you call setTimeout as HTTPRequest#setTimeout?

Although I solved the problem by modifying the axios source code, it seems that the essential cause of the problem is not correct.

It was auto-translated from your post. Did that mean your server problem was caused by vue-router SSR?

@ryouaki
Copy link
Contributor

ryouaki commented Apr 13, 2020

@chinesedfan @mooyoul @fishcharlie
vue-router ssr has the problem because HTTPRequest#setTimeout on server side.

yesterday, I got the problem again with my new project.

  1. I send request to report api by axios.post and timeout with 10000ms,the axios version is 0.17.1
  2. the report api has a limit that 10 request on hour. if you send the request more than 10/hour. the api will no response.
  3. for a new case, my team member debug a new feature and send more request to report api and then the script hang up.
  4. then I change to use http of nodejs and use Socket#setTimeout fix this problem.

So I think this is the same reason that the timer is blocked by TCP connection when no response from api which you requested.

@AceMood
Copy link

AceMood commented May 2, 2020

Have you ever considered using ClientRequest's constructor's timeout option to implement the so called Call Timeout? @mooyoul
In my test, it's different from using req.setTimeout such as axios library doing now. More source code I have dived into: https://acemood.github.io/2020/05/02/node%E4%B8%AD%E8%AF%B7%E6%B1%82%E8%B6%85%E6%97%B6%E7%9A%84%E4%B8%80%E4%BA%9B%E5%9D%91/

pieterjan84 added a commit to stellarbeat/js-stellarbeat-backend that referenced this pull request May 6, 2020
@github-actions
Copy link
Contributor

Hello! 👋 \n\nThis pull request is being automatically marked as stale because it has not been updated in a while. Please confirm that the issue is still present and reproducible. If no updates or new comments are received the pull request will be closed in a few days. \n\nThanks

@jasonsaayman
Copy link
Member

Please can you fix the conflicts in this branch, update it to the latest master, and let me know so I can review it fully.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants