New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix unicode chunking issue #108
Conversation
If that's the problem, then the solution would be as simple as work with Buffer object instead of strings, just until it can be detected the event is complete. Also, both though2 and pump are useless now, you can use instead Node.js Transform streams and pipeline function. |
I think we need to establish a node version support policy. Only supporting node 10 seems very premature to me. Node LTS seems logical to me. I'll see if I can fix this one in a cleaner way, though. |
OK, I've rewritten the PR - it now has minimal changes from the original source, basically just using buffers instead of strings where possible, and adding a check for the BOM mark at the start of the stream. No new dependencies (except in development), and maintains compatibility all the way down to node 0.12. |
var colon = 58 | ||
var space = 32 | ||
var lineFeed = 10 | ||
var carriageReturn = 13 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of magic numbers, I would have used their escape codes... :-/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'\n'.charCodeAt(0)
, or do you mean actually comparing to \n
in code?
If you mean the first, sure, simple change.
If you mean the second, that would require converting the buffer to a string repeatedly, which seems excessive and slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was sure the second one would work, but first one would be good too. This can be left for a future refactor.
function hasBom (buf) { | ||
return bom.every(function (charCode, index) { | ||
return buf[index] === charCode | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slice + compare Buffer functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constructing a buffer for the BOM (so we have something to compare to) would require one of the following:
- Using
Buffer.from()
which isn't present in all node versions we currently support - Using
new Buffer()
which is deprecated in newer versions of node - Adding and using
buffer-from
module (introducing a dependency) - Replicate logic from
buffer-from
module, introducing node version checks in code
Performance-wise, we are only doing this for the start of the stream, so shouldn't matter. If we bump the minimum version to node 6 for this module and bump the version to 2.x, we could introduce a bunch of similar changes, but for now I think this is an acceptable fix without breaking compatibility with older versions of node.
If you send a message with unicode characters which ends up being split into two chunks on the receiving end, the text will be garbled because we are simply concatenating to a string.
I had to implement a fix for this quickly and publish it to an internal branch and couldn't (quickly) figure out how to modify the current code to be unicode safe, so I reimplemented the streaming/parsing using some higher-level modules that I find makes stream handling easier (through, split and pump).
I realize this adds dependencies to an otherwise nearly dependency-free project, so I'd be happy if anyone would take a look at an alternative solution. Still raising this as a PR to show the issue in question, and I've also added some tests to reproduce the issue.