fix: utf8 -> utf16 decoding bug on surrogate pairs #1486

This fixes protobufjs#1473.

The custom utf8 -> utf16 decoder appears to be subtly flawed. From my reading, the chunking mechanism doesn't account for surrogate pairs at the end of a chunk, which causes variable-size chunks. A larger chunk followed by a smaller chunk leaves behind garbage that gets included in the latter chunk.

It looks like the chunking mechanism was added to prevent stack overflows when calling `fromCharCode` with too many args. From some benchmarking, putting utf16 code units in an array and spreading that into `fromCharCode` wasn't helping performance much anyway, so I simplified the decoder significantly.

Here's a repro of the existing decoding bug in a fuzzing suite: https://repl.it/@turbio/oh-no-our-strings#decoder.js
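To illustrate the kind of simplification described above, here is a minimal sketch of a utf8 -> utf16 decoder that appends to the result string one code point at a time, so there is no chunk buffer that could carry stale code units from one flush into the next. It is not the code from this PR: the function name `utf8ReadSketch` and its `(buffer, start, end)` signature are assumptions made for illustration, and the sketch skips validation of malformed input.

```javascript
// Hypothetical sketch, not the patch itself: decode the UTF-8 bytes in
// buffer[start, end) into a JavaScript (UTF-16) string without chunking.
// Assumes well-formed input; malformed sequences are not validated.
function utf8ReadSketch(buffer, start, end) {
    var str = "";
    for (var i = start; i < end;) {
        var t = buffer[i++];
        if (t <= 0x7F) {
            // 1-byte sequence (ASCII)
            str += String.fromCharCode(t);
        } else if (t >= 0xC0 && t < 0xE0) {
            // 2-byte sequence
            str += String.fromCharCode((t & 0x1F) << 6 | buffer[i++] & 0x3F);
        } else if (t >= 0xE0 && t < 0xF0) {
            // 3-byte sequence
            str += String.fromCharCode(
                (t & 0x0F) << 12 | (buffer[i++] & 0x3F) << 6 | buffer[i++] & 0x3F
            );
        } else if (t >= 0xF0) {
            // 4-byte sequence: a code point above U+FFFF, which UTF-16
            // represents as a surrogate pair (two code units)
            var cp = ((t & 0x07) << 18 | (buffer[i++] & 0x3F) << 12 |
                      (buffer[i++] & 0x3F) << 6 | buffer[i++] & 0x3F) - 0x10000;
            str += String.fromCharCode(0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
        }
        // Stray continuation bytes (0x80..0xBF) are skipped in this sketch.
    }
    return str;
}
```

Because each code point is appended as soon as it is decoded, a surrogate pair can never straddle a chunk boundary, and `fromCharCode` is only ever called with one or two arguments, so the stack-overflow concern that motivated chunking does not arise.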

Commits on Sep 22, 2020

add test case for surrogate pair bug

turbio committed Sep 22, 2020

de742f2

Commits on Oct 9, 2020

Merge branch 'master' into patch-1

alexander-fenster committed Oct 9, 2020

8b58788


Commits on Sep 10, 2020
