
Per-Message Deflate Memory Leak #1617

Closed
yosiat opened this issue Aug 11, 2019 · 26 comments

yosiat commented Aug 11, 2019

Sorry for not following the issue template; this way is easier to explain the issue. I'll be happy to elaborate more.

Description

On a production server, after running for a long time and a mass disconnection, I noticed a memory leak related to zlib & websockets.

This happens on all servers, but was most extreme on one of our servers, which had a mass disconnect from 2.5k connected users down to 50. A memory dump on this server showed:

  • We have 87k instances of the WebSocket class
  • The AsyncLimiter queue has thousands of items

I managed to reproduce locally via:

  • socket.io server (this is my setup) with per-message deflate enabled
  • Connected 200 users simultaneously and disconnected them all at once. Memory stayed the same; I triggered a manual garbage collection, waited two minutes, and a memory dump still showed WebSocket classes in memory.
  • Connected again with 20 users and now I can't connect: no responses from the server; the server is stuck.

Root cause

This is because AsyncLimiter has 10 pending jobs that never finish, which happens as the result of a race: calling close on inflate/deflate means any further flush callbacks are never invoked.

Example:

inflate.write(...);
inflate.close(); // happened async
inflate.flush(() => /* won't be called */);

I made some patches in the code to make sure the callback is called, but I didn't like the solution of attaching a callback to the inflate/deflate objects, since an object can handle multiple operations at the same time and we can hit further races.

In my opinion, we should decouple the inflate/deflate from per-message deflate and create a class that has an internal queue and calls into inflate/deflate; on cleanup it would flush the queue. This way we can solve the issues described above.

I can submit a pull request for the above approach.

Reproducible in:

  • ws version: 6.4.1
  • socket.io version: 2.2.0
  • Node.js version(s): v10.15.3
  • OS version(s): Mac (local) / Linux (production)

Memory dump screenshot

[screenshot: Screen Shot 2019-08-11 at 16 12 25]

The picture above is a memory dump from the server that had the mass disconnect from 2.5k to 50 sockets:

  • 88,992 instances of PerMessageDeflate
  • 178,012 instances of WebSocket
  • Array / Queue is the AsyncLimiter queue

lpinca commented Aug 11, 2019

async-limiter was added in #1204 to work around this issue nodejs/node#8871, specifically nodejs/node#8871 (comment).

If I understand correctly, the issue now is that it keeps jobs in its queue indefinitely because they never complete (the flush() callback might never be called), right?


lpinca commented Aug 11, 2019

See also the discussion in #1369.


yosiat commented Aug 11, 2019

@lpinca hi!

I read both issues and the history behind them. Yes, basically the issue is that jobs never complete, so we don't clean up properly upon disconnection.
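To illustrate the failure mode, here is a toy sketch (the `Limiter` class below is a hypothetical stand-in that only mimics async-limiter's push/done API, not its actual code): a job that never calls `done()` holds its concurrency slot forever, so every later job queues indefinitely.

```javascript
// Hypothetical minimal limiter, modeled loosely on async-limiter's API.
class Limiter {
  constructor(concurrency) {
    this.concurrency = concurrency;
    this.jobs = [];    // queued jobs waiting for a slot
    this.pending = 0;  // jobs currently running
  }
  push(job) {
    this.jobs.push(job);
    this._run();
  }
  _run() {
    while (this.pending < this.concurrency && this.jobs.length > 0) {
      const job = this.jobs.shift();
      this.pending++;
      job(() => {      // `done`: releases the slot
        this.pending--;
        this._run();
      });
    }
  }
}

const limiter = new Limiter(1);
let ran = 0;
// Like a flush() whose callback never fires: done() is never called.
limiter.push((done) => { ran++; });
// This job is stuck behind the leaked slot and never runs.
limiter.push((done) => { ran++; done(); });
// ran === 1 and limiter.jobs.length === 1: the queue is wedged for good
```

With a real concurrency limit of 10, ten leaked slots are enough to stall every subsequent compress/decompress job on the server.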


lpinca commented Aug 11, 2019

Are you sure this happens with inflate streams and not deflate? Asking because, due to how the WebSocket is closed, close() should never be called on the inflate stream while data is being decompressed.

For deflate streams instead close() can also be called while data is being compressed.


yosiat commented Aug 11, 2019

@lpinca this happens with deflate, sorry.


lpinca commented Aug 11, 2019

OK, it makes sense, and I can see why jobs might never complete. This also made things worse:

if (!this._deflate) {
//
// This `if` statement is only needed for Node.js < 10.0.0 because as of
// commit https://github.com/nodejs/node/commit/5e3f5164, the flush
// callback is no longer called if the deflate stream is closed while
// data is being processed.
//
return;
}

but reverting it would cause other problems. I'm not sure how to properly fix this.


yosiat commented Aug 11, 2019

@lpinca In my tests the flush callback is never called after close() is called. I suspected this code, but it wasn't relevant.


lpinca commented Aug 11, 2019

If you are using Node.js 10 that is expected; it was called before. Anyway, we have to store the callback somewhere and call it when the final close() completes, without actually calling this other callback

callback(err, result);

if _deflate was nulled by the final close.


lpinca commented Aug 11, 2019

Thank you for the report. If you want to try to fix it please do. I'm not sure when I can do it.


yosiat commented Aug 11, 2019

@lpinca I have a fix currently, and I don't like it.

It puts the callback on the inflate as well, and on PerMessageDeflate#cleanup calls those callbacks. But this is problematic IMO because we are setting a callback on a shared resource.

A better fix IMO is to have a class that does the deflate/inflate and holds in its queue the data (to compress) and a promise; this way we don't keep a direct reference to the WebSocket.

I'll play with this approach and update you


lpinca commented Aug 11, 2019

It puts the callback on the inflate as well, and on PerMessageDeflate#cleanup calls those callbacks. But this is problematic IMO because we are setting a callback on a shared resource.

It is not a big problem. There is one PerMessageDeflate instance per WebSocket and one inflate/deflate stream per PerMessageDeflate instance. Furthermore, there can only be a single write/compress at a time, so we only have to save/call the latest callback.


yosiat commented Aug 11, 2019

@lpinca how do you guarantee that there is a single write at a time?


lpinca commented Aug 11, 2019

It's done in https://github.com/websockets/ws/blob/master/lib/sender.js; we queue writes.

ws/lib/sender.js

Lines 266 to 267 in fa99173

if (this._deflating) {
this.enqueue([this.dispatch, buf, this._compress, opts, cb]);
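In sketch form, the queueing works like this (a hypothetical condensation of the sender.js pattern; `MiniSender`, `fakeCompress`, and `pendingJobs` are illustration scaffolding, not ws code): while a compression job is in flight, later writes wait in a FIFO and are dispatched one at a time when the current job's callback fires.

```javascript
class MiniSender {
  constructor() {
    this._deflating = false; // true while a compression job is in flight
    this._queue = [];        // writes that arrived during compression
  }
  dispatch(data, cb) {
    if (this._deflating) {
      this._queue.push([data, cb]); // wait for the current job to finish
      return;
    }
    this._deflating = true;
    fakeCompress(data, (err, result) => {
      this._deflating = false;
      cb(err, result);
      this.dequeue(); // start the next queued write, if any
    });
  }
  dequeue() {
    if (!this._deflating && this._queue.length > 0) {
      const [data, cb] = this._queue.shift();
      this.dispatch(data, cb);
    }
  }
}

// Test scaffolding: "compression" completes only when pendingJobs is drained,
// so we can observe the second write sitting in the queue.
const pendingJobs = [];
function fakeCompress(data, cb) {
  pendingJobs.push(() => cb(null, data));
}

const sender = new MiniSender();
const order = [];
sender.dispatch('a', (err, result) => order.push(result));
sender.dispatch('b', (err, result) => order.push(result)); // queued: 'a' in flight
while (pendingJobs.length) pendingJobs.shift()();
// order is now ['a', 'b']: writes completed one at a time, in order
```

This single-writer guarantee is why only the latest deflate callback ever needs to be saved.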


yosiat commented Aug 11, 2019

And what about reads (inflate)?


lpinca commented Aug 11, 2019

They should not have the issue described here because close() should never be called on inflate streams while data is being decompressed due to how the WebSocket is closed.


yosiat commented Aug 11, 2019

Cool @lpinca! I will make a PR.

I started re-making my old patch. The question is what I should pass as arguments to the callback: passing data there seems incorrect, so I think passing an Error makes sense?


lpinca commented Aug 11, 2019

I would just call it without arguments. This callback

callback(err, result);

must not be called after PerMessageDeflate#cleanup() is called, or we will end up writing to a closed socket. I think we need a guard for this.

We only have to ensure the limiter's done() callback is called to signal that the job completed.


yosiat commented Aug 11, 2019

Umm, I actually thought about adding code in PerMessageDeflate#cleanup that calls the callback.

I am not sure how I can call done() there without calling the callback. Or am I not understanding you correctly?


lpinca commented Aug 11, 2019

Yes, do it in PerMessageDeflate#cleanup(), but also add a guard before calling the callback so that the outer callback on line 351 is not called. Makes sense?


yosiat commented Aug 11, 2019

Oh, you want it here -

something like:

    zlibLimiter.push((done) => {
      this._compress(data, fin, (err, result) => {
        done();
        if (result) { // call only if we got a result
          callback(err, result);
        }
      });
    });

and in cleanup, call the callback like callback()?


lpinca commented Aug 11, 2019

Something like this:

diff --git a/lib/permessage-deflate.js b/lib/permessage-deflate.js
index 29881b3..96fcf8b 100644
--- a/lib/permessage-deflate.js
+++ b/lib/permessage-deflate.js
@@ -134,6 +134,10 @@ class PerMessageDeflate {
 
     if (this._deflate) {
       this._deflate.close();
+      if (this._deflate.cb) {
+        this._doNotCallOuterCallback = true;
+        this._deflate.cb();
+      }
       this._deflate = null;
     }
   }
@@ -312,6 +316,8 @@ class PerMessageDeflate {
     zlibLimiter.push((done) => {
       this._compress(data, fin, (err, result) => {
         done();
+        if (this._doNotCallOuterCallback) return;
+
         callback(err, result);
       });
     });

with this._deflate.cb set where needed, and possibly cleaner :) Hope it makes sense.


yosiat commented Aug 11, 2019

Makes sense, thanks. I'll PR later!


lpinca commented Aug 11, 2019

Here is something cleaner without using an additional flag:

diff --git a/lib/permessage-deflate.js b/lib/permessage-deflate.js
index 29881b3..204816e 100644
--- a/lib/permessage-deflate.js
+++ b/lib/permessage-deflate.js
@@ -134,6 +134,7 @@ class PerMessageDeflate {
 
     if (this._deflate) {
       this._deflate.close();
+      if (this._deflate[kCallback] !== null) this._deflate[kCallback]();
       this._deflate = null;
     }
   }
@@ -312,7 +313,7 @@ class PerMessageDeflate {
     zlibLimiter.push((done) => {
       this._compress(data, fin, (err, result) => {
         done();
-        callback(err, result);
+        if (err || result) callback(err, result);
       });
     });
   }
@@ -419,6 +420,8 @@ class PerMessageDeflate {
       this._deflate.on('data', deflateOnData);
     }
 
+    this._deflate[kCallback] = callback;
+
     this._deflate.write(data);
     this._deflate.flush(zlib.Z_SYNC_FLUSH, () => {
       if (!this._deflate) {
@@ -431,6 +434,8 @@ class PerMessageDeflate {
         return;
       }
 
+      this._deflate[kCallback] = null;
+
       let data = bufferUtil.concat(
         this._deflate[kBuffers],
         this._deflate[kTotalLength]
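Condensed, the fix behaves like this (a hypothetical simplification of the diff above; the stub stream and the `alive` flag stand in for the real zlib deflate stream and `this._deflate = null`, so the race can be shown deterministically):

```javascript
const kCallback = Symbol('callback');

// Stub deflate stream: flush callbacks are held until fired manually,
// emulating a close() that lands before the flush completes.
function makeStubDeflate() {
  return {
    alive: true,
    held: [],
    write() {},
    flush(cb) { this.held.push(cb); },
  };
}

// Mirrors the patched compress path: remember the in-flight callback,
// clear it on a successful flush, bail out if the stream was closed.
function compress(deflate, data, callback) {
  deflate[kCallback] = callback;
  deflate.write(data);
  deflate.flush(() => {
    if (!deflate.alive) return; // closed mid-flush; cleanup() handled it
    deflate[kCallback] = null;
    callback(null, data);
  });
}

// Mirrors the patched cleanup: closing the stream also fires the pending
// callback (with no arguments) so the limiter job can complete.
function cleanup(deflate) {
  deflate.alive = false; // stands in for `this._deflate = null`
  if (deflate[kCallback] != null) deflate[kCallback]();
}

// The race: cleanup runs before the flush callback ever fires.
const d = makeStubDeflate();
let calls = 0;
compress(d, 'payload', () => { calls++; });
cleanup(d);                   // pending callback fired once here
d.held.forEach((cb) => cb()); // the late flush callback is now a no-op
// calls === 1: the job completed exactly once and nothing leaked
```

The `if (err || result)` guard in the real diff plays the same role as the `alive` check here: the outer callback fires with a result on the normal path, or bare from cleanup, but never twice.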


yosiat commented Aug 11, 2019

@lpinca I liked your approach; removing the flag and updating the PR.

@lpinca lpinca closed this as completed in 3a695e8 Aug 12, 2019

phiresky commented Aug 21, 2019

I'm not sure if this is related, but I'm having an issue where, after the ws server has been running for a few days, it suddenly starts ignoring all messages from the client (except for the connection message). After 4 hours of debugging, I finally found that the messages disappear in the call to decompress: they are pushed into the zlibLimiter queue, but the queue is never consumed, so the callback is never called.


Now, in general, it seems insane to me to do decompression of ~20-byte segments asynchronously, but ignoring that, I have no idea why this happens. Is this something that might be fixed by this patch? I'm using ws@6.1.4, which does not contain it.


yosiat commented Aug 21, 2019

@phiresky the scenario I wrote in the PR is the same as yours; please upgrade to the latest version and report back.
