
Fix test issue caused by change in Requests library #1762

Closed

Conversation

MaybeNetwork
Contributor

Feature Summary and Justification

Requests recently changed the library it uses to detect content encoding. Under Python 3, if chardet (the old detection library) is not present on your system, the new library charset_normalizer is used instead. Unfortunately, charset_normalizer has trouble detecting the content encoding when the content is small, which causes the test TestSubreddit.test_submit_image_large to fail. The solution offered here is to specify the content encoding explicitly in the affected test.

Here is an example of the failed test, which occurred in #1760. That PR just happened to be the first one opened after the test environment was updated; the change proposed in it was not responsible for the test's failure.

In the affected test,

reddit.subreddit("test").submit_image("test", tempfile.name)

is fed a mock response, constructed as

response = requests.Response()
response._content = mock_data.encode("utf-8")
response.status_code = 400

Subreddit.submit_image calls Subreddit._submit_media, which calls Subreddit._parse_xml_response, which takes response defined above and sets

xml = response.text

Since the character encoding in response wasn't specified, Requests's text property falls back to the apparent encoding. The test environment doesn't have chardet, so charset_normalizer is called to guess the encoding, and it incorrectly guesses UTF-16BE. The response bytes are then decoded with the wrong encoding, producing text that xml.etree.ElementTree.XML fails to parse.
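The garbling is mechanical: UTF-16BE consumes the byte stream two bytes at a time, so every pair of ASCII bytes fuses into a single (mostly CJK) code point. A stdlib-only sketch of the mis-decode, using the start of the mock payload:

```python
# The mock body is plain ASCII, encoded as UTF-8 (one byte per character).
payload = '<?xml version="1.0" encoding="UTF-8"?>'.encode("utf-8")

# Decoding it as UTF-16BE packs each byte pair into one 16-bit code unit:
# '<' (0x3C) + '?' (0x3F) become the single code point U+3C3F ('㰿').
garbled = payload.decode("utf_16_be")
print(garbled)  # starts with '㰿硭氠' — the same characters as in the failed test
```

This is why the test output shows CJK text: the bytes are untouched, only the decoder's framing is wrong.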

Here is a script that shows more details:

import chardet
import charset_normalizer

def main():
    mock_data = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Error>"
        "<Code>EntityTooLarge</Code>"
        "<Message>Your proposed upload exceeds the maximum allowed size</Message>"
        "<ProposedSize>20971528</ProposedSize>"
        "<MaxSizeAllowed>21971520</MaxSizeAllowed>"
        "<RequestId>23F056D6990D87E0</RequestId>"
        "<HostId>iYEVOuRfbLiKwMgHt2ewqQRIm0NWL79uiC2rPLj9P0PwW554MhjY2/O8d9JdKTf1iwzLjwWMnGQ=</HostId>"
        "</Error>"
    )
    chardet_detection = chardet.detect(mock_data.encode('utf-8'))
    charset_normalizer_detection = charset_normalizer.detect(mock_data.encode('utf-8'))
    print(chardet_detection)
    print(mock_data.encode('utf-8').decode(chardet_detection['encoding']))
    print(charset_normalizer_detection)
    print(mock_data.encode('utf-8').decode(charset_normalizer_detection['encoding']))

if __name__ == "__main__":
    main()

which produces this output:

{'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
<?xml version="1.0" encoding="UTF-8"?><Error><Code>EntityTooLarge</Code><Message>Your proposed upload exceeds the maximum allowed size</Message><ProposedSize>20971528</ProposedSize><MaxSizeAllowed>21971520</MaxSizeAllowed><RequestId>23F056D6990D87E0</RequestId><HostId>iYEVOuRfbLiKwMgHt2ewqQRIm0NWL79uiC2rPLj9P0PwW554MhjY2/O8d9JdKTf1iwzLjwWMnGQ=</HostId></Error>
{'encoding': 'utf_16_be', 'language': '', 'confidence': 0.867}
㰿硭氠癥牳楯渽∱⸰∠敮捯摩湧㴢啔䘭㠢㼾㱅牲潲㸼䍯摥㹅湴楴祔潯䱡牧攼⽃潤放㱍敳獡来㹙潵爠灲潰潳敤⁵灬潡搠數捥敤猠瑨攠浡硩浵洠慬汯睥搠獩穥㰯䵥獳慧放㱐牯灯獥摓楺放㈰㤷ㄵ㈸㰯偲潰潳敤卩穥㸼䵡硓楺敁汬潷敤㸲ㄹ㜱㔲〼⽍慸卩穥䅬汯睥搾㱒敱略獴䥤㸲㍆〵㙄㘹㤰䐸㝅〼⽒敱略獴䥤㸼䡯獴䥤㹩奅噏畒晢䱩䭷䵧䡴㉥睱兒䥭ぎ坌㜹畩䌲牐䱪㥐ぐ睗㔵㑍桪夲⽏㡤㥊摋呦ㅩ睺䱪睗䵮䝑㴼⽈潳瑉搾㰯䕲牯爾

You can see that the last line is the same garbled CJK string shown in the failed test.

References

@bboe
Member

bboe commented Jul 16, 2021

I see that this gets the test to pass. Does Reddit's actual response, if the request were issued, include the actual encoding?

@MaybeNetwork
Contributor Author

I see that this gets the test to pass. Does Reddit's actual response, if the request were issued, include the actual encoding?

No, it does not. Another option is to leave the test as it was before and force the response to be decoded as UTF-8 by changing

def _parse_xml_response(self, response: Response):
    """Parse the XML from a response and raise any errors found."""
    xml = response.text

to

def _parse_xml_response(self, response: Response):
    """Parse the XML from a response and raise any errors found."""
    response.encoding = "utf-8"
    xml = response.text

This also makes the test pass, and would help users that don't have chardet. There may be downsides to this, though, as I don't know if Reddit always returns that particular response in UTF-8.
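To illustrate why pinning the encoding sidesteps detection entirely, here is a minimal stand-in for the relevant slice of Requests's behavior. FakeResponse is a hypothetical class written for this sketch, not the real requests.Response; it simulates a detector that wrongly guesses UTF-16BE whenever encoding is unset:

```python
class FakeResponse:
    """Hypothetical stand-in for the relevant part of requests.Response."""

    def __init__(self, content: bytes):
        self._content = content
        self.encoding = None  # None when no charset header is present

    @property
    def text(self) -> str:
        # With encoding unset, fall back to the detector's guess; here we
        # simulate the bad guess of UTF-16BE described above.
        guessed = self.encoding or "utf_16_be"
        return self._content.decode(guessed, errors="replace")


resp = FakeResponse(b'<?xml version="1.0" encoding="UTF-8"?>')
print(resp.text)  # garbled CJK text from the wrong guess

resp.encoding = "utf-8"  # the proposed fix: pin the encoding before .text
print(resp.text)  # '<?xml version="1.0" encoding="UTF-8"?>'
```

Setting encoding before the first access to text is what the one-line change in _parse_xml_response does; detection is never consulted.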

@MaybeNetwork
Contributor Author

charset_normalizer was updated to version 2.0.3 several hours after I submitted this, and the test no longer fails.

There may be people out there who have versions <2.0.3, which, in addition to the troubles described above, have a bug that causes the library to fail to detect any encoding at all. I encountered that bug while running tests on an actual response returned by Reddit, so it's possible that others could run into it too.

I restored the test and added the one line that forces UTF-8 instead. If that feature isn't worth adding, this PR can be closed.

@github-actions

github-actions bot commented Aug 6, 2021

This PR is stale because it has been open for 20 days with no activity. Remove the Stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale Issue or pull request has been inactive for 20 days label Aug 6, 2021
@github-actions

This PR was closed because it has been stale for 10 days with no activity.

@github-actions github-actions bot added the Auto-closed - Stale Automatically closed due to being stale for too long label Aug 16, 2021
@github-actions github-actions bot closed this Aug 16, 2021