
Fix handling of surrogates on encoding #530

Merged: 3 commits, merged on Jun 1, 2022

Commits on May 30, 2022

  1. Fix handling of surrogates on encoding

    This allows surrogates anywhere in the input, compatible with the json module from the standard library.
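    The following sketch uses only the standard library's `json` module to show the behaviour this change aims to match. It does not call ujson itself, and the example strings are illustrative.

    ```python
    import json

    # A lone high surrogate: legal in a Python str, but not encodable as strict UTF-8.
    s = "\ud800"

    # With the default ensure_ascii=True, the stdlib escapes it as \ud800.
    escaped = json.dumps(s)

    # With ensure_ascii=False, the surrogate is passed through unchanged in the str.
    raw = json.dumps(s, ensure_ascii=False)

    # A surrogate in the middle of other text is handled the same way.
    mixed = json.dumps("a\udc00b")

    # Decoding the escaped form restores the original string exactly.
    roundtrip = json.loads(escaped)

    print(escaped)         # "\ud800"
    print(mixed)           # "a\udc00b"
    print(roundtrip == s)  # True
    ```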
    
    This also refactors two interfaces:
    - The `PyUnicode` to `char*` conversion is moved into its own function, separate from the `JSONTypeContext` handling, so that it can be reused in the future for strings that have no type context (e.g. the indentation and separators).
    - Converting the `char*` output to a Python string with surrogates intact requires the string length for `PyUnicode_Decode` & Co. While `strlen` could be used, the length is already known inside the encoder, so the encoder function now takes an extra `size_t` pointer argument through which it returns that length, and it no longer NUL-terminates the string. This also permits output that contains NUL bytes (even though that would be invalid JSON), e.g. if the return value of an object's `__json__` method contains them.
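    The length-based interface can be modelled in Python. The helper names `encode_buffer`, `to_str_with_length` and `to_str_with_strlen` are hypothetical, and the use of the `surrogatepass` error handler is an assumption about how surrogates are carried through the UTF-8 buffer. The sketch only illustrates why an explicit length keeps a NUL byte that a `strlen`-based conversion would cut off.

    ```python
    from typing import Tuple


    def encode_buffer(s: str) -> Tuple[bytes, int]:
        """Hypothetical encoder model: returns the UTF-8 buffer plus its length,
        instead of appending a terminating NUL byte. Surrogates are kept via the
        (assumed) "surrogatepass" error handler."""
        buf = s.encode("utf-8", "surrogatepass")
        return buf, len(buf)


    def to_str_with_length(buf: bytes, length: int) -> str:
        # Analogous to decoding with an explicit length: every byte is kept.
        return buf[:length].decode("utf-8", "surrogatepass")


    def to_str_with_strlen(buf: bytes) -> str:
        # Simulates strlen(): everything from the first NUL byte onward is lost.
        end = buf.find(b"\x00")
        if end == -1:
            end = len(buf)
        return buf[:end].decode("utf-8", "surrogatepass")


    # Output containing both a lone surrogate and a NUL byte (invalid JSON, but possible).
    payload = '"x\ud800\x00y"'
    buf, n = encode_buffer(payload)

    full = to_str_with_length(buf, n)   # equal to payload
    truncated = to_str_with_strlen(buf) # '"x\ud800' (cut at the NUL byte)
    ```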
    
    Fixes ultrajson#156
    Fixes ultrajson#447
    Fixes ultrajson#537
    Supersedes ultrajson#284
    JustAnotherArchivist committed May 30, 2022
    9b9af1a
  2. 98321fa
  3. 59aa3bf