Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reset compressor/decompressor instead of re-initialize #1840

Merged
merged 2 commits into from Feb 2, 2021
Merged

reset compressor/decompressor instead of re-initialize #1840

merged 2 commits into from Feb 2, 2021

Conversation

cTn-dev
Copy link
Contributor

@cTn-dev cTn-dev commented Feb 2, 2021

Use zlib.reset() to (as per nodejs documentation Reset the compressor/decompressor to factory defaults. Only applicable to the inflate and deflate algorithms.)

Recently we have introduced/enabled server_no_context_takeover on one of our canary production servers which i monitor with nodejs with the help of this ws library (approx 250 connections on a single nodejs process connecting to the server).

With the introduction of server_no_context_takeover i have noticed significantly increased CPU usage on the client of approx 2x (8% -> 16%) to what it was before the parameter was added).

I have traced it down to significantly higher GC activity, which went from ~100ms total time/minute to ~500ms.
When it comes to actual GC counts, it went from ~25 GC/minute to ~150.
Minor GCs almost stopped and majority of the time was spent in the major GCs and incremental GCs.

The root cause of this overhead appears to be the constant destruction/re-initialization of the zlib.createInflateRaw.
The attached changes eliminate this overhead, and brings the performance back in line to where it was before the server_no_context_takeover parameter was introduced.
The changes are holding up fine in production/i haven't encountered any issues so far.

I don't know if there are any potential side-effects of the below changes, the changes do pass all the tests currently available for the library (although i am not sure the current tests would catch compressor/decompressor "corruption" resulting from the changes in this PR, at least i wasn't able to run into such case so far).

Running the available speed.js points towards a very minor throughput improvement for the 5000 roundtrips part of the test.
Each benchmark run fluctuates quite a bit on my work machine so i opted for not posting the results here.

I am attaching a set of graphs showing the performance

  1. before server_no_context_takeover was introduced on the server side
  2. with the current WS library version (v7.4.2)
  3. and the performance after the patch below was pushed

Node version: LTS v14.15.4
Platform: Linux x64

image
image

@lpinca
Copy link
Member

lpinca commented Feb 2, 2021

I don't know if there are any potential side-effects of the below changes, the changes do pass all the tests currently available for the library (although i am not sure the current tests would catch compressor/decompressor "corruption" resulting from the changes in this PR, at least i wasn't able to run into such case so far).

I don't know either and no, I don't think existing tests cover this condition but the patch seems good. We should just verify that resetting has the same behavior of re-instantiating.

Comment on lines 383 to 384
this._inflate[kTotalLength] = 0;
this._inflate[kBuffers] = [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can you move this before the if statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 454 to 455
this._deflate[kTotalLength] = 0;
this._deflate[kBuffers] = [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and done

@lpinca lpinca merged commit 4e9607b into websockets:master Feb 2, 2021
@lpinca
Copy link
Member

lpinca commented Feb 2, 2021

Thank you. I'll cut a new release later and eventually revert if side effects will be reported.

@cTn-dev
Copy link
Contributor Author

cTn-dev commented Feb 2, 2021

Thank you as well, ill keep an eye on the issues list once the new build is out.
I don't expect any issues but i am indeed curious if there is some edge case i haven't thought of while testing this manually (will see).

@cTn-dev cTn-dev deleted the no_context_takeover_reset branch February 2, 2021 18:38
@schoudhary21
Copy link

ws_issue
with the latest version "7.4.3" i am getting this issue, it is just crashing.

@cTn-dev
Copy link
Contributor Author

cTn-dev commented Feb 15, 2021

@schoudhary21 I see you are running into this while using nw.js (it might be worth listing the version for it as well), are you able to reproduce the same crash in a regular node.js ?
Are you connecting to a specific url/host or have some specific repro case that you could share?

@schoudhary21
Copy link

schoudhary21 commented Feb 15, 2021

@cTn-dev i am using https://dl.nwjs.io/v0.51.1/nwjs-sdk-v0.51.1-win-x64.zip version of nw.js, for connecting localhost websocket server using 127.0.0.1 ip address.

I never had an issue with ws module. after recent update of ws, it stopped working.
Now i forced lower version using 'npm install ws@7.4.1' and this is working perfectly fine for me.

it will be great if it can be fixed asap; other npm modules as 'engine.io' etc are using it so; many components with different versions of ws are going to give more problem.

for ws init; options i am using:

options.perMessageDeflate = {
zlibDeflateOptions: {
level: 9
},
threshold: 20
}

for sending i am using options:
const options = { compress: true, mask: true, fin: true }

thank you :)

@lpinca
Copy link
Member

lpinca commented Feb 15, 2021

@schoudhary21 are you able to consistently reproduce the issue and write a test case?

@schoudhary21
Copy link

schoudhary21 commented Feb 15, 2021

yes, every time.
for testing let me know what type of data is needed i will try to provide that.

@lpinca
Copy link
Member

lpinca commented Feb 15, 2021

Ok, would you be so kind to write some code here, using only ws, that reproduce the issue? Thank you.

@schoudhary21
Copy link

i did another test;
i installed the latest ws 7.4.3 and the app started crashing then i went to the ws package location and under lib folder in the 'permessage-deflate.js' i change following line ...

**> if (fin && this.params[${endpoint}_no_context_takeover]) {

    this._inflate.reset();
  }

**

with this ....

**> if (fin && this.params[${endpoint}_no_context_takeover]) {

   this._inflate.close();
    this._inflate = null;
  }**

and now the app started working fine without any issue.

@lpinca
Copy link
Member

lpinca commented Feb 15, 2021

@schoudhary21 it would be great if you could write something like this:

const WebSocket = require('ws');

const server = new WebSocket.Server(
  {
    perMessageDeflate: true,
    port: 0
  },
  function () {
    const ws = new WebSocket(`ws://localhost:${server.address().port}`, {
      perMessageDeflate: {
        clientNoContextTakeover: true,
        threshold: 0
      }
    });

    ws.on('open', function () {
      ws.send('A'.repeat(50));
    });
  }
);

server.on('connection', function (ws) {
  ws.on('message', console.log);
  ws.on('error', console.error);
});

with the missing bits that make it crash.

@schoudhary21
Copy link

ok i will try for that;

another key point it connects to my websocket server and don't crash and stay connected for long time but as soon as i try to send msg which is compression enabled then it crashes.

@lpinca
Copy link
Member

lpinca commented Mar 5, 2021

@jeffrson can you please log the handshake response headers and the data received? You can do that like this:

const ws = new WebSocket('ws://127.0.0.1:8888/svc');

ws.on('upgrade', function(res) {
  console.log(res.headers);
}); 

ws.onopen = function() {
  ws._socket.on('data', console.log);
};

I have some problems installing dotnet and compiling the server of https://github.com/jeffrson/node_zlib_binding_err_internal_assertion on this machine.

@jeffrson
Copy link

jeffrson commented Mar 5, 2021

No problem

<Buffer c1 0b 33 34 32 36 31 35 33 b7 00 00 00>
<Buffer c1 0b 33 34 32 36 31 35 33 b7 00 00 00>

@lpinca
Copy link
Member

lpinca commented Mar 5, 2021

What about the headers? I need those to see how the permessage-deflate extension is negotiated.

@lpinca
Copy link
Member

lpinca commented Mar 5, 2021

I used the wrong event in the above example, sorry. It's the 'upgrade' event not 'headers'.

@jeffrson
Copy link

jeffrson commented Mar 5, 2021

Yes, I copied the snippet from the email notification... never mind!

{
  server: 'websocket-sharp/1.0',
  upgrade: 'websocket',
  connection: 'Upgrade',
  'sec-websocket-accept': 'XNWJgYkWYdm162Df815xmcnJnos=',
  'sec-websocket-extensions': 'permessage-deflate; client_no_context_takeover; server_no_context_takeover'
}

@lpinca
Copy link
Member

lpinca commented Mar 5, 2021

Thank you. Will test it in a bit.

@jeffrson
Copy link

jeffrson commented Mar 5, 2021

BTW, if you need to test and have a Linux version of dotnet (https://docs.microsoft.com/de-de/dotnet/core/install/linux-debian), you may create a project like so:

dotnet new console
dotnet add package websocketsharp --prerelease
<replace Program.cs with the file from the repo>
dotnet run

This will work on Windows too with .Net SDK only (I strongly assume, but I have an installation of Visual Studio)

@lpinca
Copy link
Member

lpinca commented Mar 5, 2021

I can reproduce. Here is a test case:

const crypto = require('crypto');
const http = require('http');
const WebSocket = require('ws');

const GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

const data = Buffer.from('c10b33343236313533b7000000', 'hex');

const server = http.createServer();

server.on('upgrade', (req, socket) => {
  const key = crypto
    .createHash('sha1')
    .update(req.headers['sec-websocket-key'] + GUID)
    .digest('base64');

  socket.on('error', console.error);
  socket.on('data', function () {
    socket.write(data);
    socket.write(data);
  });

  socket.write(
    [
      'HTTP/1.1 101 Switching Protocols',
      'Upgrade: websocket',
      'Connection: Upgrade',
      `Sec-WebSocket-Accept: ${key}`,
      'Sec-Websocket-Extensions: permessage-deflate; ' +
        'client_no_context_takeover; server_no_context_takeover',
      '\r\n'
    ].join('\r\n')
  );
});

server.listen(function () {
  const ws = new WebSocket(`ws://localhost:${server.address().port}`);

  ws.on('open', function () {
    ws.send('foo');
  });

  ws.on('message', console.log);
});

One thing I noticed is that our server behaves slightly differently

const WebSocket = require('ws');

const server = new WebSocket.Server(
  {
    perMessageDeflate: {
      clientNoContextTakeover: true,
      serverNoContextTakeover: true,
      threshold: 0
    },
    port: 0
  },
  function () {
    const ws = new WebSocket(`ws://localhost:${server.address().port}`);

    ws.on('open', function () {
      ws._socket.on('data', console.log);
      ws.send('foo');
    });

    ws.on('message', console.log);
  }
);

server.on('connection', function (ws) {
  ws.on('error', console.error);
  ws.on('message', function (message) {
    console.log(message);

    ws.send('12345678');
    ws.send('12345678');
  });
});

The frame with ws server is c1 0a 32 34 32 36 31 35 33 b7 00 00 and not c1 0b 33 34 32 36 31 35 33 b7 00 00 00.

I will dig deeper later.

@lpinca
Copy link
Member

lpinca commented Mar 5, 2021

Here is a test case that reproduces the issue using only the Node.js zlib module:

const zlib = require('zlib');

const TRAILER = Buffer.from([0x00, 0x00, 0xff, 0xff]);
const data = Buffer.from([
  0x33,
  0x34,
  0x32,
  0x36,
  0x31,
  0x35,
  0x33,
  0xb7,
  0x00,
  0x00,
  0x00
]);

const inflate = zlib.createInflateRaw({
  windowBits: zlib.Z_DEFAULT_WINDOWBITS
});

inflate.resume();

inflate.on('error', console.error);

inflate.write(data);
inflate.write(TRAILER);
inflate.flush(function () {
  inflate.reset();

  inflate.write(data);
  inflate.write(TRAILER);
  inflate.flush(function() {
    inflate.reset();
  })
});

If the second inflate.reset() is removed the following error is emitted

Error [ERR_STREAM_PUSH_AFTER_EOF]: stream.push() after EOF
    at new NodeError (node:internal/errors:329:5)
    at readableAddChunk (node:internal/streams/readable:276:30)
    at InflateRaw.Readable.push (node:internal/streams/readable:227:10)
    at Zlib.processCallback (node:zlib:553:10) {
  code: 'ERR_STREAM_PUSH_AFTER_EOF'
}

so it seems the stream already ended when the first inflate.flush() callback is called.

This is a bug in ws. We should check if the stream ended before writing more data to it.

@lpinca
Copy link
Member

lpinca commented Mar 5, 2021

Something like this

diff --git a/lib/permessage-deflate.js b/lib/permessage-deflate.js
index 74bf14a..a8974b9 100644
--- a/lib/permessage-deflate.js
+++ b/lib/permessage-deflate.js
@@ -376,11 +376,16 @@ class PerMessageDeflate {
         this._inflate[kTotalLength]
       );
 
-      this._inflate[kTotalLength] = 0;
-      this._inflate[kBuffers] = [];
+      if (this._inflate._readableState.endEmitted) {
+        this._inflate.close();
+        this._inflate = null;
+      } else {
+        this._inflate[kTotalLength] = 0;
+        this._inflate[kBuffers] = [];
 
-      if (fin && this.params[`${endpoint}_no_context_takeover`]) {
-        this._inflate.reset();
+        if (fin && this.params[`${endpoint}_no_context_takeover`]) {
+          this._inflate.reset();
+        }
       }
 
       callback(null, data);

should fix the issue but I have to add a regression test and check if it works on all supported Node.js versions. I'll try to do that tomorrow.

@jeffrson
Copy link

jeffrson commented Mar 5, 2021

Oh - great work! I had only tested with some of zlib options of WebSocket.Server but could not reproduce.

@cTn-dev
Copy link
Contributor Author

cTn-dev commented Mar 6, 2021

@lpinca Is this an "expected" behavior of zlib triggered by the trailer shown above that some other WS libraries use ?
I assume RFC 7692 is "loose enough" to allow for a slightly different compressor/decompressor "reset" logic which this regression falls under?

@lpinca
Copy link
Member

lpinca commented Mar 6, 2021

@cTn-dev yes, see #1669 (comment). See also nodejs/node@28db96f and nodejs/node@0e89b64.

The test cases above (#1840 (comment) and #1840 (comment)) should work without errors on Node.js < 12.

@lpinca
Copy link
Member

lpinca commented Mar 6, 2021

@schoudhary21 @jeffrson the issue should be fixed in ws@7.4.4.

@jeffrson
Copy link

jeffrson commented Mar 7, 2021

Yes, it's fixed here. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants