Fix handling of surrogate pseudocharacters under Python 3. #284

Closed
Wants to merge 3 commits

Commits on Aug 29, 2017

  1. Fix handling of surrogate pseudocharacters under Python 3.

    This is a situation where we have a Python unicode string which doesn't
    consist entirely of genuine Unicode characters -- some of the codepoints
    in the string are surrogate codepoints, which occur in a UTF-16 encoding
    of a string and were also repurposed in PEP 383 for losslessly encoding
    arbitrary mostly-UTF-8 bytestrings (like Unix filenames) in Python
    strings.  Currently, on Python 3, we raise a UnicodeEncodeError if we
    try to encode such a string as JSON.
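For readers unfamiliar with PEP 383, here is a short sketch of how such strings arise on Python 3, using only the built-in codecs. The byte string `b"caf\xff"` is an arbitrary example standing in for an undecodable filename; it is not taken from this PR.

```python
# Sketch: PEP 383's "surrogateescape" error handler turns undecodable
# bytes into lone surrogate code points, and a strict UTF-8 encode of
# the resulting str then fails.
raw = b"caf\xff"  # not valid UTF-8: the byte 0xFF never appears in UTF-8

# Decoding with surrogateescape maps the bad byte 0xFF to U+DCFF.
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))  # 'caf\udcff'

# The mapping is lossless: encoding with the same handler restores the bytes.
assert name.encode("utf-8", "surrogateescape") == raw

# A strict UTF-8 encode, by contrast, rejects the lone surrogate.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print(exc.reason)  # surrogates not allowed
```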
    
    It's not 100% obvious what the right thing to do here is -- this
    situation seems like it must reflect a bug somewhere else in the
    program or its environment.  But
    
     * one way we can get such a string is by loading a JSON document
       (perhaps an invalid JSON document? Anyway, we load it without error):
    
       >>> ujson.dumps(ujson.loads('"\\udcff"'))
       Traceback (most recent call last):
         File "<stdin>", line 1, in <module>
       UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 0: surrogates not allowed
    
     * we already pass these strings through without complaint on Python 2;
    
     * as the included test shows, passing these through matches the
       behavior of the stdlib's `json` module.
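To illustrate the stdlib behavior mentioned in the last bullet, here is a sketch using only Python 3's built-in `json` module. It is not the test included in this PR, whose contents are not reproduced here.

```python
import json

# Parse a JSON string literal whose \u escape names a lone low surrogate.
s = json.loads('"\\udcff"')
assert s == "\udcff"  # loads accepts it; no error is raised

# Default ensure_ascii=True: the surrogate is written back as an escape,
# reproducing the original JSON text exactly.
out = json.dumps(s)
print(out)  # "\udcff"
assert out == '"\\udcff"'

# ensure_ascii=False: the surrogate is left as-is in the returned str.
# json itself does not complain, though encoding that str as strict
# UTF-8 later (e.g. when writing it to a file) would fail.
raw_out = json.dumps(s, ensure_ascii=False)
assert raw_out == '"\udcff"'
```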
    
    So it seems best to pass them through.
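One possible way to pass such strings through, assuming the encoder first converts each string to UTF-8 bytes (an assumption about the C implementation that this description does not spell out), is Python's `surrogatepass` error handler. The sketch below only illustrates that handler and the escaped form the stdlib emits; it is not the change made in this PR.

```python
# Hypothetical sketch, not code from this PR: the "surrogatepass" error
# handler encodes a lone surrogate as the 3-byte sequence its code point
# would occupy, instead of raising, so the value survives a round trip.
s = "\udcff"
b = s.encode("utf-8", "surrogatepass")
print(b)  # b'\xed\xb3\xbf'
assert b.decode("utf-8", "surrogatepass") == s

# Those bytes are not valid UTF-8, so a JSON writer should not emit them
# raw; writing the code point as a \uXXXX escape keeps the output ASCII,
# matching what the stdlib json module produces by default.
escaped = "\\u%04x" % ord(s)
print(escaped)  # \udcff
```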
    
    Fixes ultrajson#156.
    gnprice committed Aug 29, 2017
    7d5105e

Commits on Feb 25, 2020

  1. 35d8a7e
  2. Fix lint

    hugovk committed Feb 25, 2020
    d55f38c