Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: utf8 -> utf16 decoding bug on surrogate pairs #1486

Merged
merged 4 commits into from Oct 9, 2020

Commits on Sep 10, 2020

  1. fix utf8 -> utf16 decoding bug on surrogate pairs

    This fixes protobufjs#1473
    
    The custom utf8 -> utf16 decoder appears to be subtly flawed. From my reading it appears the chunking mechanism doesn't account for surrogate pairs at the end of a chunk causing variable size chunks. A larger chunk followed by a smaller chunk leaves behind garbage that'll be included in the latter chunk.
    
    It looks like the chunking mechanism was added to prevent stack overflows when calling `formCharCode` with too many args. From some benchmarking it appears putting utf16 code units in an array and spreading that into `fromCharCode` wasn't helping performance much anyway. I simplified it significantly.
    
    Here's a repro of the existing encoding bug in a fuzzing suite
    https://repl.it/@turbio/oh-no-our-strings#decoder.js
    turbio committed Sep 10, 2020
    Copy the full SHA
    e798c43 View commit details
    Browse the repository at this point in the history
  2. fix lint

    turbio committed Sep 10, 2020
    Copy the full SHA
    696acac View commit details
    Browse the repository at this point in the history

Commits on Sep 22, 2020

  1. Copy the full SHA
    de742f2 View commit details
    Browse the repository at this point in the history

Commits on Oct 9, 2020

  1. Copy the full SHA
    8b58788 View commit details
    Browse the repository at this point in the history