Relax parsing of invalid HTTP header names? #1363

Open
2 tasks
shimachao opened this issue Oct 15, 2020 · 10 comments
Labels
  • external: Root cause pending resolution in an external dependency
  • user-experience: Ensuring that users have a good experience using the library
  • wontfix

Comments

@shimachao

shimachao commented Oct 15, 2020

Checklist

  • The bug is reproducible against the latest release and/or master.
  • There are no similar issues or pull requests to fix it yet.

Describe the bug

When the server returns an invalid header, a RemoteProtocolError is raised.

To reproduce

import asyncio
import httpx

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,"
    "image/avif,image/webp,image/apng,*/*;q=0.8,application/"
    "signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/"
    "86.0.4240.75 Safari/537.36",
}

async def main():
    async with httpx.AsyncClient(headers=headers, timeout=120, verify=False) as client:
        # Raises httpx.RemoteProtocolError when the server sends the invalid header name
        await client.get("https://club.huawei.com/forum.html")

asyncio.run(main())

Expected behavior

Actual behavior

Debugging material

Environment

  • OS: Windows 10
  • Python version: 3.8.5
  • HTTPX version: 0.16.1
  • Async environment: asyncio
  • HTTP proxy: no
  • Custom certificates: no

Additional context

bad header:
get-ban-to-cache-result/forum.php: userdata not support

@tomchristie
Member

Sure, so here's the simplest reproduction...

>>> import httpx
>>> httpx.get("https://club.huawei.com/")

Which is occurring because the server is returning an illegal HTTP header name...

HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Security-Policy: base-uri
Content-Type: text/html; charset=utf-8
Date: Thu, 15 Oct 2020 13:19:33 GMT
Server: CloudWAF
Set-Cookie: HWWAFSESID=a74181602debc465809; path=/
Set-Cookie: HWWAFSESTIME=1602767969615; path=/
Set-Cookie: a3ps_2132_saltkey=yCXrVqdR06Nk5u2PrmLgs9eqlGIpQd9FogV2GL6bxGP3HH2XweRXIeCVny%2BrVDpoOYNLphTU9uVN1HP1%2Fav1bvV2Yrafq%2BXdJR%2BVAVPHizU92ISGAest0dKt7%2FIbdulNYXV0aGtleQ%3D%3D; path=/; secure; httponly
Set-Cookie: a3ps_2132_errorinfo=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; secure; httponly
Set-Cookie: a3ps_2132_errorcode=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; secure; httponly
Set-Cookie: a3ps_2132_auth=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; secure; httponly
Set-Cookie: a3ps_2132_lastvisit=1602764373; expires=Sat, 14-Nov-2020 13:19:33 GMT; Max-Age=2592000; path=/; secure; httponly
Set-Cookie: a3ps_2132_lastact=1602767973%09portal.php%09; expires=Fri, 16-Oct-2020 13:19:33 GMT; Max-Age=86400; path=/; secure; httponly
Set-Cookie: a3ps_2132_currentHwLoginUrl=http%3A%2F%2Fcn.club.vmall.com%2F; expires=Thu, 15-Oct-2020 15:19:33 GMT; Max-Age=7200; path=/; secure; httponly
Transfer-Encoding: chunked
X-XSS-Protection: 1; mode=block
banlist-ip: 0
banlist-uri: 0
get-ban-to-cache-result/portal.php: userdata not support
get-ban-to-cache-result62.31.28.214: userdata not support
result-ip: 0
result-uri: 0

That get-ban-to-cache-result/portal.php header isn't legal HTTP: "/" is not an allowed character in a header field name.

However it's possible that we'd like h11 to be more lax on the validation, so that we can accept invalid header names so long as they're still parsable.
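To make the point about the illegal name concrete, here is a minimal sketch that checks header names against the token grammar from RFC 7230 (section 3.2.6). The helper names `is_valid_header_name` and `invalid_header_names` are made up for illustration and are not part of httpx or h11.

```python
import re

# A header field name is a "token": one or more tchar characters.
# tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
#         "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
TOKEN_RE = re.compile(r"[-!#$%&'*+.^_`|~0-9a-zA-Z]+")


def is_valid_header_name(name: str) -> bool:
    """Return True if `name` consists only of tchar characters."""
    return TOKEN_RE.fullmatch(name) is not None


def invalid_header_names(names):
    """Return the names that are not valid tokens (hypothetical helper)."""
    return [n for n in names if not is_valid_header_name(n)]


if __name__ == "__main__":
    names = [
        "Server",
        "banlist-ip",
        "get-ban-to-cache-result/portal.php",
        "get-ban-to-cache-result62.31.28.214",
    ]
    print(invalid_header_names(names))
```

Of the names taken from the response above, only the one containing "/" fails the check; the name ending in an IP address is unusual but consists entirely of legal token characters.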

tomchristie added the external and user-experience labels Oct 15, 2020
tomchristie changed the title from "httpx.exceptions.RemoteProtocolError: malformed data" to "Relax parsing of invalid HTTP header names." Oct 15, 2020
tomchristie changed the title from "Relax parsing of invalid HTTP header names." to "Relax parsing of invalid HTTP header names?" Oct 15, 2020
@shimachao
Author

shimachao commented Oct 16, 2020

Hi, I found an imperfect but useful workaround.
Run the following code before using httpx:
h11.readers.header_field_re = re.compile(b"(?P<field_name>[-!#$%&'*+.^`/|~0-9a-zA-Z]+):[ \t](?P<field_value>([^\\x00\\s]+(?:[ \t]+[^\\x00\\s]+))?)[ \t]*")
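The pattern as displayed above may have lost a few characters to Markdown formatting (this is an assumption, based on the shape of a typical header-line grammar). A standalone sketch of a relaxed header-line pattern that additionally allows "/" in the field name could look like the following. It uses only the `re` module; `RELAXED_HEADER_FIELD_RE` and `parse_header_line` are illustrative names, and whether such a pattern can be assigned to a private h11 attribute depends on the h11 version and is not verified here.

```python
import re

# Relaxed header-line pattern (a sketch, not h11's own definition):
# - field name: RFC 7230 token characters plus "/"
# - optional whitespace after the colon
# - field value: runs of non-whitespace, non-NUL bytes separated by
#   spaces or tabs (may be empty)
# - optional trailing whitespace
RELAXED_HEADER_FIELD_RE = re.compile(
    rb"(?P<field_name>[-!#$%&'*+.^_`/|~0-9a-zA-Z]+)"
    rb":[ \t]*"
    rb"(?P<field_value>([^\x00\s]+(?:[ \t]+[^\x00\s]+)*)?)"
    rb"[ \t]*"
)


def parse_header_line(line: bytes):
    """Return (name, value) for a header line, or None if it does not match."""
    match = RELAXED_HEADER_FIELD_RE.fullmatch(line)
    if match is None:
        return None
    return match.group("field_name"), match.group("field_value")


if __name__ == "__main__":
    print(parse_header_line(b"get-ban-to-cache-result/portal.php: userdata not support"))
    print(parse_header_line(b"Connection: keep-alive"))
```

Note that this only widens the accepted character set by one character; a name containing "?" or other non-token characters would still be rejected.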

@tomchristie
Member

Opened python-hyper/h11#113 to discuss this on the h11 side.

@Hultner

Hultner commented Nov 24, 2021

Is it possible to disable this check for a single request?

I'm rewriting some sync code that uses requests as async httpx, but I currently can't proceed because a server outside of my control sends a "?" in one of the headers.

So right now I'm weighing my options: abandoning this entirely, picking another library to use alongside httpx for the problematic server, or replacing httpx entirely.

I looked at @shimachao's solution, but I feel a little uneasy about applying an untested patch across the board, especially considering that only one server misbehaves. Either way, that patch doesn't work for me verbatim, as it instead chokes on other "normal" headers. The pattern has also moved to h11._readers, which I suspect is a hint that they want to further discourage us from taking such measures.

Edit: I patched the library to add "?" to tchar for token in the ABNF instead. That can't be done at runtime though, so I think I need to hard fork h11 for this to work.

@stale

stale bot commented Feb 20, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Feb 20, 2022
@tomchristie
Member

I think this probably still needs tracking. Thanks tho, @Stale.

stale bot removed the wontfix label Feb 21, 2022
@Hultner

Hultner commented Feb 21, 2022

> I think this probably still needs tracking. Thanks tho, @Stale.

I’m still watching this :)
I currently have to proxy bad servers and drop headers.
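As a rough illustration of the "drop headers" idea (this is not Hultner's actual proxy, and the function name `drop_invalid_headers` is hypothetical), a filter over a raw HTTP/1.1 response head could remove any header line whose name is not a valid RFC 7230 token before the response reaches a strict parser:

```python
import re

# RFC 7230 token characters, as bytes, for matching raw header names.
TOKEN_RE = re.compile(rb"[-!#$%&'*+.^_`|~0-9a-zA-Z]+")


def drop_invalid_headers(head: bytes) -> bytes:
    """Return the response head with invalid header lines removed.

    `head` is the status line followed by header lines, separated by CRLF,
    without the terminating blank line. Lines without a colon (including
    obsolete folded continuation lines) are dropped as well.
    """
    status_line, *header_lines = head.split(b"\r\n")
    kept = [status_line]
    for line in header_lines:
        name, sep, _value = line.partition(b":")
        if sep and TOKEN_RE.fullmatch(name):
            kept.append(line)
    return b"\r\n".join(kept)


if __name__ == "__main__":
    head = (
        b"HTTP/1.1 200 OK\r\n"
        b"Server: CloudWAF\r\n"
        b"get-ban-to-cache-result/portal.php: userdata not support\r\n"
        b"result-ip: 0"
    )
    print(drop_invalid_headers(head).decode())
```

Dropping the line entirely is the simplest choice; a real proxy might prefer to log or rename such headers instead.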

@stale

stale bot commented Mar 25, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale

stale bot commented Oct 15, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Oct 15, 2022
@pich4ya

pich4ya commented Jul 17, 2023

This is a way web defenders can prevent scanning by feroxbuster: adding non-standard HTTP response headers/values, lol.
