[BUG] Wrong encoding detected for empty JSON response #58

Closed
tseaver opened this issue Jul 14, 2021 · 3 comments · Fixed by #59
Labels
bug Something isn't working help wanted Extra attention is needed

tseaver commented Jul 14, 2021

Describe the bug

requests 2.26.0 switched to using charset_normalizer by default under Python 3 (see: googleapis/python-cloud-core#117). After this change, a response constructed with an empty JSON body (b"{}") can no longer be unmarshalled to an empty dict by its json method.

To Reproduce

>>> import charset_normalizer
>>> empty_json_response = b"{}"
>>> detected = charset_normalizer.detect(empty_json_response)
....nox/unit-3-6/lib/python3.6/site-packages/charset_normalizer/api.py:95: UserWarning: Trying to detect encoding from a tiny portion of (2) byte(s).
  warn('Trying to detect encoding from a tiny portion of ({}) byte(s).'.format(length))
>>> detected
{'encoding': 'utf_16_be', 'language': '', 'confidence': 1.0}
>>> decoded = empty_json_response.decode(detected["encoding"])
>>> decoded
'筽'
>>> import json
>>> json.loads(decoded)
/opt/Python-3.6.10/lib/python3.6/json/__init__.py:354: in loads
    return _default_decoder.decode(s)
/opt/Python-3.6.10/lib/python3.6/json/decoder.py:339: in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <json.decoder.JSONDecoder object at 0x7fd0b5810860>, s = '筽', idx = 0

    def raw_decode(self, s, idx=0):
        """Decode a JSON document from ``s`` (a ``str`` beginning with
        a JSON document) and return a 2-tuple of the Python
        representation and the index in ``s`` where the document ended.
    
        This can be used to decode a JSON document from a string that may
        have extraneous data at the end.
    
        """
        try:
            obj, end = self.scan_once(s, idx)
        except StopIteration as err:
>           raise JSONDecodeError("Expecting value", s, err.value) from None
E           json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Expected behavior
I expect to be able to unmarshal b"{}" into an empty dict, {}.
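
For comparison, here is a minimal sketch using only the standard library (no charset_normalizer involved). It shows why the two bytes turn into a single CJK character when read as UTF-16 big-endian, and that `json.loads` produces the expected empty dict when given the raw bytes or an explicit UTF-8 decode. The variable names are illustrative only.

```python
import json

raw = b"{}"

# 0x7B ('{') and 0x7D ('}') read as ONE big-endian UTF-16 code unit give U+7B7D,
# which is the single character shown in the reproduction above.
misdecoded = raw.decode("utf_16_be")
assert misdecoded == "\u7b7d"

# json.loads accepts bytes directly (Python 3.6+) and infers UTF-8/16/32 itself.
parsed_from_bytes = json.loads(raw)

# Decoding explicitly as UTF-8 before parsing gives the same result.
parsed_from_text = json.loads(raw.decode("utf-8"))

print(parsed_from_bytes, parsed_from_text)
```

This does not change what charset_normalizer detects; it only illustrates the expected outcome for this payload.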

Desktop (please complete the following information):

  • OS: Linux
  • Python version: 3.6, 3.7, 3.8, 3.9
  • Package version: 2.0.1
@tseaver tseaver added bug Something isn't working help wanted Extra attention is needed labels Jul 14, 2021
@tseaver tseaver changed the title [BUG] Wrong encoding detected for empy JSON response [BUG] Wrong encoding detected for empty JSON response Jul 14, 2021
Owner

Ousret commented Jul 14, 2021

Hi @tseaver

Thanks for the report,
I was able to reproduce this easily. The misdetection happens in the plugin TooManySymbolOrPunctuationPlugin(MessDetectorPlugin).
This plugin needs to be adjusted.
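
As a rough illustration only, and not the library's actual code: assuming the plugin named above scores text by something like its proportion of punctuation and symbol characters, the toy function below (a hypothetical helper, `symbol_ratio`) shows why `{}` would look "messy" under UTF-8 while the UTF-16 big-endian reading looks clean.

```python
import unicodedata


def symbol_ratio(text: str) -> float:
    """Hypothetical helper: fraction of characters in a punctuation (P*)
    or symbol (S*) Unicode category. NOT charset_normalizer's implementation."""
    if not text:
        return 0.0
    flagged = sum(1 for ch in text if unicodedata.category(ch)[0] in ("P", "S"))
    return flagged / len(text)


raw = b"{}"

# Correct reading: two ASCII braces, both punctuation (categories Ps and Pe).
ratio_utf8 = symbol_ratio(raw.decode("utf-8"))

# Wrong reading: one CJK ideograph, which counts as a letter (category Lo).
ratio_utf16 = symbol_ratio(raw.decode("utf_16_be"))

print(ratio_utf8, ratio_utf16)
```

Under this assumed heuristic, a payload made entirely of braces is penalized in its correct encoding, which is consistent with the misdetection reported here.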

Owner

Ousret commented Jul 14, 2021

PR #59 should fix your issue.
If you have any further concerns, do not hesitate to raise them.

Owner

Ousret commented Jul 14, 2021
