Fix unicode chunking issue #108

rexxars · 2018-08-22T14:09:50Z

If you send a message with unicode characters and it ends up being split into two chunks on the receiving end, the text will be garbled because we are simply concatenating each chunk to a string.
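A minimal sketch of the failure, not the library's actual code. The chunk boundary is chosen by hand so that it falls in the middle of a two-byte character, and `Buffer.from` assumes a newer Node.js than the oldest versions discussed in this thread.

```javascript
// Hypothetical reproduction: '\u00f8' (ø) is two bytes in UTF-8 (0xC3 0xB8).
var message = Buffer.from('data: bj\u00f8rn\n\n', 'utf8')

// Split the payload between the two bytes of that character.
var splitAt = message.indexOf(0xc3) + 1
var first = message.slice(0, splitAt)
var second = message.slice(splitAt)

// Decoding each chunk on its own turns the partial bytes into U+FFFD.
var garbled = first.toString('utf8') + second.toString('utf8')

// Joining the raw bytes first and decoding once keeps the text intact.
var intact = Buffer.concat([first, second]).toString('utf8')

console.log(JSON.stringify(garbled))
console.log(JSON.stringify(intact))
```

The only difference between the two results is when decoding happens: per chunk, or once after the bytes have been joined.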

I had to implement a fix for this quickly and publish it to an internal branch, and couldn't (quickly) figure out how to modify the current code to be unicode safe, so I reimplemented the streaming/parsing using some higher-level modules that I find make stream handling easier (through, split and pump).

I realize this adds dependencies to an otherwise nearly dependency-free project, so I'd be happy if anyone would take a look at an alternative solution. Still raising this as a PR to show the issue in question, and I've also added some tests to reproduce the issue.

piranna · 2018-08-22T14:23:10Z

If that's the problem, then the solution would be as simple as working with Buffer objects instead of strings until the event can be detected as complete. Also, both through2 and pump are unnecessary now; you can use Node.js Transform streams and the pipeline function instead.
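One way the Buffer-based suggestion could look, as a rough sketch under simplifying assumptions: `createEventBuffer` is a hypothetical helper, events are assumed to end with a plain `\n\n`, and `Buffer.alloc` / `Buffer.from` require a reasonably recent Node.js. The same logic could be wrapped in a Transform stream.

```javascript
// Hypothetical helper: collects raw bytes and only decodes complete events.
function createEventBuffer (onEvent) {
  var pending = Buffer.alloc(0)
  var separator = Buffer.from('\n\n')

  return function write (chunk) {
    // Keep everything as bytes until an event boundary is found.
    pending = Buffer.concat([pending, chunk])
    var end = pending.indexOf(separator)
    while (end !== -1) {
      // A complete event can be decoded safely in one go.
      onEvent(pending.slice(0, end).toString('utf8'))
      pending = pending.slice(end + separator.length)
      end = pending.indexOf(separator)
    }
  }
}

var events = []
var write = createEventBuffer(function (text) { events.push(text) })
var bytes = Buffer.from('data: bj\u00f8rn\n\ndata: ok\n\n', 'utf8')

// Feed the bytes one at a time so every multi-byte character is split.
for (var i = 0; i < bytes.length; i++) {
  write(bytes.slice(i, i + 1))
}
console.log(events)
```

Because nothing is decoded before the separator is seen, the chunk boundaries no longer affect the resulting text.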

rexxars · 2018-08-22T18:35:29Z

I think we need to establish a node version support policy. Only supporting node 10 seems very premature to me. Node LTS seems logical to me. I'll see if I can fix this one in a cleaner way, though.

rexxars · 2018-08-23T07:36:59Z

OK, I've rewritten the PR - it now has minimal changes from the original source, basically just using buffers instead of strings where possible, and adding a check for the BOM mark at the start of the stream. No new dependencies (except in development), and maintains compatibility all the way down to node 0.12.

piranna · 2018-08-23T08:33:38Z

lib/eventsource.js

+var colon = 58
+var space = 32
+var lineFeed = 10
+var carriageReturn = 13


Instead of magic numbers, I would have used their escape codes... :-/

rexxars · 2018-08-23

Do you mean '\n'.charCodeAt(0), or actually comparing to \n in code?
If you mean the first, sure, simple change.
If you mean the second, that would require converting the buffer to a string repeatedly, which seems excessive and slow.

piranna · 2018-08-23

I was sure the second one would work, but the first one would be good too. This can be left for a future refactor.
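For illustration, the first option could be sketched like this. The variable names follow the diff above, while `indexOfLineEnd` is a hypothetical helper that is not part of the PR.

```javascript
// Derive the byte values from their escape sequences instead of hard-coding them.
var colon = ':'.charCodeAt(0)
var space = ' '.charCodeAt(0)
var lineFeed = '\n'.charCodeAt(0)
var carriageReturn = '\r'.charCodeAt(0)

// Hypothetical helper: find the first line ending without decoding the buffer.
function indexOfLineEnd (buf) {
  for (var i = 0; i < buf.length; i++) {
    if (buf[i] === lineFeed || buf[i] === carriageReturn) {
      return i
    }
  }
  return -1
}

console.log(indexOfLineEnd(Buffer.from('a: b\r\n')))
```

The comparison still happens on raw bytes, so the buffer never has to be converted to a string while scanning.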

piranna · 2018-08-23T08:34:22Z

lib/eventsource.js

+function hasBom (buf) {
+  return bom.every(function (charCode, index) {
+    return buf[index] === charCode
+  })


You could use the Buffer slice + compare functions instead.

rexxars · 2018-08-23

Constructing a buffer for the BOM (so we have something to compare to) would require one of the following:

Using Buffer.from() which isn't present in all node versions we currently support

Using new Buffer() which is deprecated in newer versions of node

Adding and using buffer-from module (introducing a dependency)

Replicating the logic from the buffer-from module, introducing node version checks in code

Performance-wise, we are only doing this for the start of the stream, so it shouldn't matter. If we bump the minimum version to node 6 for this module and bump the version to 2.x, we could introduce a bunch of similar changes, but for now I think this is an acceptable fix without breaking compatibility with older versions of node.
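For comparison, the slice + compare variant might look roughly like the sketch below once `Buffer.from` can be relied on (the first option in the list above). It is not the merged implementation, and it does not run on the oldest node versions mentioned in this thread.

```javascript
// UTF-8 byte order mark: EF BB BF.
// Assumption: Buffer.from is available and safe on the running Node.js version.
var bom = Buffer.from([239, 187, 191])

function hasBom (buf) {
  // Guard against buffers shorter than the BOM, then compare the leading bytes.
  return buf.length >= bom.length &&
    Buffer.compare(buf.slice(0, bom.length), bom) === 0
}

console.log(hasBom(Buffer.from([239, 187, 191, 100]))) // BOM followed by 'd'
console.log(hasBom(Buffer.from('data')))
```

Either approach only inspects the first three bytes of the stream, so the choice is about compatibility rather than speed.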

rexxars requested review from aslakhellesoy and piranna August 22, 2018 14:09

rexxars force-pushed the unicode-safe branch from aba2c2c to 1a91936 Compare August 23, 2018 07:25

Fix unicode chunking issue, ignore BOM mark

e28e668

rexxars force-pushed the unicode-safe branch from 1b247de to e28e668 Compare August 23, 2018 07:35

rexxars merged commit f073e55 into EventSource:master Aug 23, 2018

rexxars deleted the unicode-safe branch August 23, 2018 07:39

piranna reviewed Aug 23, 2018

aslakhellesoy mentioned this pull request Oct 9, 2018

Incomplete message parsing #113

Closed
