[BUG] JSON parsing of responses containing goofy Unicode characters fails #2011

Open
3 tasks done
strangelydim opened this issue Feb 1, 2024 · 3 comments
Labels
Status: Needs more info (Proceeding requires additional info from the author); Type: Bug (Errors or unexpected behavior)

Comments

@strangelydim

Checklist

  • I checked the FAQ section of the documentation
  • I looked for similar issues in the issue tracker
  • I am using the latest version of Schemathesis

Describe the bug

JSON parsing of responses containing funky Unicode characters fails.

To Reproduce

🚨 Mandatory 🚨: Steps to reproduce the behavior:

A (valid) JSON response that has some funky Unicode characters in it gets mangled by 'response.text' (from the Python requests library) and then fails to parse. Here's an example response that trips up the parsing:

{"detail":"multiple errors encountered: Error at "/grant/0/resource": Error at "/name": property "name" is missing\nSchema:\n {\n "properties": {\n "name": {\n "minLength": 1,\n "type": "string"\n }\n },\n "required": [\n "name"\n ],\n "type": "object"\n }\n\nValue:\n {\n "direct_permissions": [],\n "type": "queue",\n "��áå𑄉": null\n }\n","status":400,"title":"Bad Request"}

To GET that particular response real quick in case GitHub doesn't handle the characters well either:

https://echoserver.dev/server?response=N4IgFgpghgJhBOBnEAuA2mkBhA9gOwBcJCBaAFQE8AHCEAGhCiqoBsBLAYygLfwHoq8HACMWEALYBqAFaJ8IALoKGwnDAqpQBarRQgiADwL0QMblFQhgAHVMQCUNi1spb4gK4serCAAIEQkj+eBw47oQIEDAovgCi8IG+3L7WqSB8AObwUIR8AAx88BBy7vAcEKkucQk48EkEKWl8eFDiFWkxgjg08NqNti1tlSC+bIi+4mOIbHgZqXgAyhyQ4lCu1ni+vjYbW1vDXT08xcMxO5t7+2mD7VXnl5fDk3gAMsQZBGCnvgCMdPMPPbDbQ0b7DRAEeAzOZpAEPAC+cK28P+uyBaSKAEd3GwijBvmgkeiBq1biAicoicCdGC0iJpBAOARhnDERt5gA1KAsdwQdYXe7E0y4xkEAD6PUmiGm+EQBMpaKuthBZJiw2xEF5w1RFyVIAAgfqAIcAU8AIBuASF3vnhPM40Wy8LZ-iAIdx3HLUAAWPJ5Z08AhiFy2ABCsF8ACUIBqIbZ4SAUS6HAR3agAEw++FAA

Please include a minimal API schema causing this issue:

This particular example is just a standard Problem JSON response, à la: https://opensource.zalando.com/restful-api-guidelines/models/problem-1.0.1.yaml

Proposed Fix

Apologies for not just making a PR, just don't have the time and it's a very, very small fix... For me, everything works just fine if I change this line:

return json.loads(response.text)

to:

return json.loads(response.content)

instead, so Python requests doesn't try to do any character code conversion on the response before parsing it as JSON.
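To make the difference between the two calls concrete, here is a minimal standard-library sketch. It does not reproduce the reported failure and does not use requests or Schemathesis. It only shows one way parsing decoded text can diverge from parsing the raw bytes: if the text layer guesses the wrong charset and decodes leniently. The payload and the choice of cp1252 as the wrong guess are assumptions made for illustration.

```python
import json

# Hypothetical payload: a key containing U+0081 (a C1 control character) and
# other non-ASCII characters, loosely modelled on the response in this issue.
original = {"detail": "bad value", "\u0081\u00e1\u00e5": None, "status": 400}

# What a server would put on the wire: UTF-8 encoded JSON bytes.
raw = json.dumps(original, ensure_ascii=False).encode("utf-8")

# json.loads accepts bytes directly and detects UTF-8/UTF-16/UTF-32 itself.
from_bytes = json.loads(raw)

# Assumption: the text layer guessed the wrong charset (cp1252 here) and
# decoded with errors="replace", so undecodable bytes become U+FFFD.
mis_decoded = raw.decode("cp1252", errors="replace")
from_text = json.loads(mis_decoded)

print("bytes round-trip intact:", from_bytes == original)
print("text round-trip intact: ", from_text == original)
```

In this sketch the bytes path returns the original object, while the mis-decoded text still parses but carries a different key. Whether requests actually picks a wrong charset for the response in this issue depends on the Content-Type header and the installed charset detector, which the report does not show.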

@strangelydim added the "Status: Needs Triage" and "Type: Bug" labels on Feb 1, 2024
@Stranger6667
Member

Hi! Thank you for reporting and providing the context! I’ll take a look at it today or tomorrow

@Stranger6667
Member

What is your Python version?

And could you, please, post the exact error that happened inside Schemathesis?

From the response you shared, I see that the first two characters are the UTF-8 representation of the U+0081 Unicode codepoint, which is a control character with no printable representation. At the display level it is therefore common to see U+FFFD (�), which is exactly how it is rendered in your comment.

I assume that the .text call first decodes those bytes as UTF-8 and then takes the printable representation of the string (with �) rather than the actual string (with \u0081), whereas you expect to get the actual string, e.g. in checks. Is that roughly what is happening?
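The encoding facts mentioned above can be checked with the standard library alone. This is a small sketch, not code from Schemathesis: it shows that U+0081 is a two-byte sequence in UTF-8, that it is a non-printable control character, and that lenient decoding of an invalid byte yields U+FFFD.

```python
import unicodedata

ch = "\u0081"

# U+0081 takes two bytes in UTF-8.
encoded = ch.encode("utf-8")

# "Cc" is the Unicode general category for control characters.
category = unicodedata.category(ch)
printable = ch.isprintable()

# A lone continuation byte is invalid UTF-8; decoding with errors="replace"
# substitutes the replacement character U+FFFD instead of raising.
replaced = b"\x81".decode("utf-8", errors="replace")

print(encoded, category, printable, hex(ord(replaced)))
```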

At the moment I can't reproduce it:

In [20]: json.loads(r.content) == json.loads(r.text)
Out[20]: True

My requests version is 2.28.1 and urllib3 is 1.26.14
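For comparing environments, the versions asked about here can be collected in one go. The helper below is a hypothetical convenience, not part of Schemathesis; `collect_versions` and its default package list are names chosen for this sketch.

```python
import platform
from importlib import metadata


def collect_versions(packages=("requests", "urllib3", "schemathesis")):
    """Return the Python version plus installed versions of the given packages."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Pasting that output alongside the exact traceback would give both sides the same baseline to reproduce against.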

@Stranger6667
Member

Hi @strangelydim

Does my comment make sense? I'd be happy to dig deeper if I had some more info to help me reproduce the issue.

@Stranger6667 added the "Status: Needs more info" label and removed the "Status: Needs Triage" label on Apr 8, 2024