Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unicode chunking issue #108

Merged
merged 1 commit into from Aug 23, 2018
Merged

Conversation

rexxars
Copy link
Member

@rexxars rexxars commented Aug 22, 2018

If you send a message with unicode characters which ends up being split into two chunks on the receiving end, the text will be garbled because we are simply concatenating to a string.

I had to implement a fix for this quickly and publish it to an internal branch and couldn't (quickly) figure out how to modify the current code to be unicode safe, so I reimplemented the streaming/parsing using some higher-level modules that I find makes stream handling easier (through, split and pump).

I realize this adds dependencies to an otherwise nearly dependency-free project, so I'd be happy if anyone would take a look at an alternative solution. Still raising this as a PR to show the issue in question, and I've also added some tests to reproduce the issue.

@piranna
Copy link
Member

piranna commented Aug 22, 2018

If that's the problem, then the solution would be as simple as work with Buffer object instead of strings, just until it can be detected the event is complete. Also, both though2 and pump are useless now, you can use instead Node.js Transform streams and pipeline function.

@rexxars
Copy link
Member Author

rexxars commented Aug 22, 2018

I think we need to establish a node version support policy. Only supporting node 10 seems very premature to me. Node LTS seems logical to me. I'll see if I can fix this one in a cleaner way, though.

@rexxars
Copy link
Member Author

rexxars commented Aug 23, 2018

OK, I've rewritten the PR - it now has minimal changes from the original source, basically just using buffers instead of strings where possible, and adding a check for the BOM mark at the start of the stream. No new dependencies (except in development), and maintains compatibility all the way down to node 0.12.

@rexxars rexxars merged commit f073e55 into EventSource:master Aug 23, 2018
@rexxars rexxars deleted the unicode-safe branch August 23, 2018 07:39
var colon = 58
var space = 32
var lineFeed = 10
var carriageReturn = 13
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of magic numbers, I would have used their escape codes... :-/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'\n'.charCodeAt(0), or do you mean actually comparing to \n in code?
If you mean the first, sure, simple change.
If you mean the second, that would require converting the buffer to a string repeatedly, which seems excessive and slow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was sure the second one would work, but first one would be good too. This can be left for a future refactor.

function hasBom (buf) {
return bom.every(function (charCode, index) {
return buf[index] === charCode
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slice + compare Buffer functions

Copy link
Member Author

@rexxars rexxars Aug 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constructing a buffer for the BOM (so we have something to compare to) would require one of the following:

  • Using Buffer.from() which isn't present in all node versions we currently support
  • Using new Buffer() which is deprecated in newer versions of node
  • Adding and using buffer-from module (introducing a dependency)
  • Replicate logic from buffer-from module, introducing node version checks in code

Performance-wise, we are only doing this for the start of the stream, so shouldn't matter. If we bump the minimum version to node 6 for this module and bump the version to 2.x, we could introduce a bunch of similar changes, but for now I think this is an acceptable fix without breaking compatibility with older versions of node.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants