Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit size of payload in JSONDecodeError #6036

Merged
merged 1 commit into from Jan 13, 2022

Conversation

chyzzqo2
Copy link
Contributor

@chyzzqo2 chyzzqo2 commented Jan 7, 2022

2.27 introduced a change in behavior where now the exception raised by parsing invalid data as json contains the full body of the invalid response. This gets included it's string representation. This can cause problems when the response is very large. This PR tries to limit the size of the response that we store this way, to what might be around the reported error position. But we could also just return to first n bytes or remove the response altogether and let users fetch it, if needed from the error.response object.

@sigmavirus24
Copy link
Contributor

Is the problem that it gets included in the string representation or that it's included? If it's the former, I'm far more inclined to just not show it there.

@chyzzqo2
Copy link
Contributor Author

chyzzqo2 commented Jan 7, 2022

Mostly the fact that it is in the string representation. We could just override that in the class. Or we could make the doc a kwarg that we pop like response. The actual args for IOError are (errno, strerror[, filename[, winerror[, filename2]]]). I don't think that doc fits very cleanly into any of those.

@chyzzqo2
Copy link
Contributor Author

chyzzqo2 commented Jan 7, 2022

Here's an alternative the just reverses the init order so that the message comes from the base json decode error rather than the ioerror

@nateprewitt
Copy link
Member

Thanks @chyzzqo2, let's see how the tests run. I think this is closer to the right approach. doc has always been included as a str in Python 3 for JSONDecodeError but it looks like we're passing it as the wrong argument in the new exception.

@nateprewitt
Copy link
Member

Alright tests look good and I did some quick spot checking with the script below. This seems to be relatively in-line with previous behavior (2.26.0). We're no longer mis-populating fields.

We do have a number of new unpopulated fields from the multi-inheritance but I don't think that's a deal breaker for now. We'll consider this change a bugfix. If there's enough demand, we may choose to write something to appropriately populate all of the exception metadata later based on which json module is used. We can do this as an enhancement though.

import platform
import requests

print(requests.__version__)
print(platform.python_version())

r = requests.get('https://www.python.org/psf/')
try:
    r.json()
except Exception as e:
    attributes = [i for i in dir(e) if not i.startswith('_')]
    print('Attributes: {}'.format(attributes))
    if hasattr(e, 'doc'):
        err = "Error Type: {}, Doc type: {}, Doc len: {}"
        print(err.format(type(e), type(e.doc), len(e.doc)))
    else:
        print("No doc attribute present.")

    print(e)

Requests 2.26.0

Python 3.6 (stdlib)

2.26.0
3.6.15
Attributes: ['args', 'colno', 'doc', 'lineno', 'msg', 'pos', 'with_traceback']
Error Type: <class 'json.decoder.JSONDecodeError'>, Doc type: <class 'str'>, Doc len: 35177
Expecting value: line 3 column 1 (char 2)

Python 3.6 (simplejson)

2.26.0
3.6.15
Attributes: ['args', 'colno', 'doc', 'end', 'endcolno', 'endlineno', 'lineno', 'msg', 'pos', 'with_traceback']
Error Type: <class 'simplejson.errors.JSONDecodeError'>, Doc type: <class 'str'>, Doc len: 35177
Expecting value: line 3 column 1 (char 2)

Python 3.10 (std lib)

2.26.0
3.10.1
Attributes: ['args', 'colno', 'doc', 'lineno', 'msg', 'pos', 'with_traceback']
Error Type: <class 'json.decoder.JSONDecodeError'>, Doc type: <class 'str'>, Doc len: 35177
Expecting value: line 3 column 1 (char 2)

Python 3.10 (simplejson)

2.26.0
3.10.1
Attributes: ['args', 'colno', 'doc', 'end', 'endcolno', 'endlineno', 'lineno', 'msg', 'pos', 'with_traceback']
Error Type: <class 'simplejson.errors.JSONDecodeError'>, Doc type: <class 'str'>, Doc len: 35177
Expecting value: line 3 column 1 (char 2)

Requests (PR #6036)

Python 3.6 (std lib)

2.27.1
3.6.15
Attributes: ['args', 'characters_written', 'colno', 'doc', 'errno', 'filename', 'filename2', 'lineno', 'msg', 'pos', 'request', 'response', 'strerror', 'with_traceback']
Error Type: <class 'requests.exceptions.JSONDecodeError'>, Doc type: <class 'str'>, Doc len: 35177
Expecting value: line 3 column 1 (char 2)

Python 3.6 (simplejson)

2.27.1
3.6.15
Attributes: ['args', 'characters_written', 'colno', 'doc', 'end', 'endcolno', 'endlineno', 'errno', 'filename', 'filename2', 'lineno', 'msg', 'pos', 'request', 'response', 'strerror', 'with_traceback']
Error Type: <class 'requests.exceptions.JSONDecodeError'>, Doc type: <class 'str'>, Doc len: 35177
Expecting value: line 3 column 1 (char 2)

Python 3.10 (std lib)

2.27.1
3.10.1
Attributes: ['args', 'characters_written', 'colno', 'doc', 'errno', 'filename', 'filename2', 'lineno', 'msg', 'pos', 'request', 'response', 'strerror', 'with_traceback']
Error Type: <class 'requests.exceptions.JSONDecodeError'>, Doc type: <class 'str'>, Doc len: 35177
Expecting value: line 3 column 1 (char 2)

Python 3.10 (simplejson)

2.27.1
3.10.1
Attributes: ['args', 'characters_written', 'colno', 'doc', 'end', 'endcolno', 'endlineno', 'errno', 'filename', 'filename2', 'lineno', 'msg', 'pos', 'request', 'response', 'strerror', 'with_traceback']
Error Type: <class 'requests.exceptions.JSONDecodeError'>, Doc type: <class 'str'>, Doc len: 35177
Expecting value: line 3 column 1 (char 2)

Copy link
Contributor

@sigmavirus24 sigmavirus24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any tests we can add to verify this doesn't revert back to containing the entirety of doc in the __repr__? It's also unclear how this prevents that from happening on a visual inspection

requests/exceptions.py Show resolved Hide resolved
requests/exceptions.py Show resolved Hide resolved
@chyzzqo2 chyzzqo2 force-pushed the json-err-size branch 2 times, most recently from 1a0b870 to 7f02d09 Compare January 8, 2022 21:48
Copy link
Member

@nateprewitt nateprewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few small notes but I think we're close to wrapping this up if @sigmavirus24 doesn't have more concerns.

requests/exceptions.py Outdated Show resolved Hide resolved
tests/test_requests.py Show resolved Hide resolved
assert isinstance(excinfo.value, RequestException)
assert isinstance(excinfo.value, JSONDecodeError)
assert data not in str(excinfo.value)
if is_py3:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also somewhat of a nit. It may be better to just make this its own test with a @unittest.skipIf(not is_py3) at the top.

e.g.

@unittest.skipIf(not is_py3)
def test_json_decode_persists_doc_attr(self, httpbin):
    r = requests.get(httpbin('bytes/20'))
    with pytest.raises(requests.exceptions.JSONDecodeError) as e:
        r.json()
    assert e.doc == r.text

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we want to test that the py2 object is also what we expect? Or if we don't can i can make it py3 only?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original test doesn't assert anything about Python 2 either. So we don't lose anything with this change. I suppose we could assert doc isn't on the Python 2 exception but I'm not sure how meaningful that is.

Edit: to be clear I'm proposing two tests. The one you have and moving the Python 3 specifics to its own test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i think i updated it with what you had in mind.

@chyzzqo2 chyzzqo2 force-pushed the json-err-size branch 2 times, most recently from 42b1864 to 045a8c5 Compare January 11, 2022 00:55
Copy link
Member

@nateprewitt nateprewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for the updates @chyzzqo2! I think there were a couple more things we discussed in the last review I left comments on. Let me know if you have any questions.

requests/exceptions.py Outdated Show resolved Hide resolved
tests/test_requests.py Outdated Show resolved Hide resolved
tests/test_requests.py Outdated Show resolved Hide resolved
That way we get the formated error message
Copy link
Member

@nateprewitt nateprewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks @chyzzqo2!

Copy link
Member

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me.

Copy link
Contributor

@sigmavirus24 sigmavirus24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not convinced this is maintainable or correct but I also understand that it's probably a tonne of work to get compatibility on python 2 and 3 that isn't a scary nightmare

@nateprewitt nateprewitt dismissed sigmavirus24’s stale review January 13, 2022 15:41

Sigmavirus24 rereviewed but GH is still blocking on original change request :/

@nateprewitt nateprewitt merged commit fa1b0a3 into psf:main Jan 13, 2022
nijel added a commit to WeblateOrg/weblate that referenced this pull request Feb 24, 2022
This will be fixed in requests as well, see
psf/requests#6036

Fixes WEBLATE-8XX
@nateprewitt nateprewitt added this to the 2.28.0 milestone Apr 7, 2022
@nateprewitt nateprewitt mentioned this pull request Jun 8, 2022
soxofaan added a commit to Open-EO/openeo-python-driver that referenced this pull request Jun 14, 2022
Also require "requests>=2.28.0" to avoid excessive JSONDecodeError message size (psf/requests#6036)
@mykolasolodukha
Copy link

Hello! I'm facing an issue now (after upgrading to requests==2.28.0) with this update.

Basically, I want to test the try/except requests.JSONDecodeError paths in my application using pytest. At some point in tests I do mock_request.return_value.json.side_effect = requests.JSONDecodeError() which is now unable to run because

self = JSONDecodeError(), args = (), kwargs = {}

    def __init__(self, *args, **kwargs):
        """
        Construct the JSONDecodeError instance first with all
        args. Then use it's args to construct the IOError so that
        the json specific args aren't used as IOError specific args
        and the error message from JSONDecodeError is preserved.
        """
>       CompatJSONDecodeError.__init__(self, *args)
E       TypeError: JSONDecodeError.__init__() missing 3 required positional arguments: 'msg', 'doc', and 'pos'

Anything we should be concerned about at this point, after the PR is released? Or is there something I do wrong?

@nateprewitt
Copy link
Member

Hi @mykolasolodukha,

Being able to create an empty JSONDecodeError wasn't originally intended when implementing this error. The function signature was kept ambiguous due to differences between Python2/3 and these exceptions generally aren't intended to be reused externally. We may be able to make the signature more explicit now but in the interim, you can instantiate the error similar to below:

requests.JSONDecodeError('','',0)

The stdlib JSONDecodeError takes argument types str, str, int for msg, doc, pos respectively.

@mykolasolodukha
Copy link

Thanks a lot @nateprewitt !

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants