Client should have more lenient behaviour wrt. spec violations. #767

Closed
fbexiga opened this issue Jan 15, 2020 · 17 comments
Labels
external (Root cause pending resolution in an external dependency), question (Further information is requested), wontfix

Comments

@fbexiga

fbexiga commented Jan 15, 2020

Consider, for example, this snippet of code:

import httpx
import trio

def run_sync(url):
    with httpx.Client(verify=False) as client:
        response = client.get(url)
        print(response)

async def run_async(url):
    async with httpx.AsyncClient(verify=False) as client:
        response = await client.get(url)
        print(response)
>>> run_sync('http://100.33.56.173')
<Response [200 OK]>
>>> trio.run(run_async, 'http://100.33.56.173')
httpx.exceptions.ProtocolError: multiple Content-Length headers
>>> run_sync('http://220.181.136.243')
<Response [600 ]>
>>> trio.run(run_async, 'http://220.181.136.243')
httpx.exceptions.ProtocolError: Response status_code should be in range [200, 600), not 600
>>> run_sync('http://122.147.128.158')
<Response [200 OK]>
>>> trio.run(run_async, 'http://122.147.128.158')
httpx.exceptions.ProtocolError: malformed data
>>> run_sync('http://217.86.148.177')
<Response [200 OK]>
>>> trio.run(run_async, 'http://217.86.148.177')
httpx.exceptions.ProtocolError: Receive buffer too long

These issues seem to come from the underlying h11. I understand that these are ill-configured servers, sometimes even violating the protocol spec, but requests to them should still work, since most HTTP clients don't impose these kinds of restrictions.
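
For anyone hitting the same thing at scale, here is a minimal sketch (not part of the original report; the scan() helper and URL list are hypothetical) of catching the protocol errors per URL so a run can keep going. On current httpx releases the error surfaces as httpx.RemoteProtocolError, a subclass of httpx.HTTPError, rather than the httpx.exceptions.ProtocolError shown in the tracebacks above.

import httpx

def scan(urls):
    # Record a status code per URL, or the error the client raised,
    # instead of letting one out-of-spec server abort the whole run.
    results = {}
    with httpx.Client(verify=False) as client:
        for url in urls:
            try:
                results[url] = client.get(url).status_code
            except httpx.HTTPError as exc:  # covers RemoteProtocolError and friends
                results[url] = repr(exc)
    return results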

@florimondmanca added the question label Jan 15, 2020
@florimondmanca
Member

Marked this as a question because I don’t know if there’s anything we can/should do here. The differences in behavior are expected, and come from us using urllib3 in the sync case, while the async case uses our own dispatcher implementation with h11/h2. This is planned to change for 1.0 when we will be using our own logic across sync and async.

In similar cases of faulty servers we turned it down as « it's a faulty server, we're not handling out-of-spec responses ». I don't know if at some point we'll want to start handling weird edge cases for the sake of allowing users to access data that's there, just presented in a slightly out-of-spec form...

@tomchristie
Member

We probably want to provide as-graceful-as-possible behaviour when protocol errors occur, so these are a really useful set of cases for us to consider, thanks.

There are a few different classes of error present here; it might be that we want to deal with some gracefully and others as hard errors, but we'll need to have a look over each case.

For context: how did you dig these cases out? Are you able to gauge how often you're seeing each case?

@fbexiga
Author

fbexiga commented Jan 15, 2020

I work as a security researcher, and these are cases that I happened to catch while using your (awesome) lib to perform some analysis at scale. From what I can immediately see, these happen a lot. Of course these are edge cases, but there are many, many servers with bad configurations out there. For instance, the response status error is very rare, but the others actually happen quite often.

@fbexiga
Author

fbexiga commented Jan 17, 2020

I tried to replicate all the above errors using only h11, since the errors appear to come from there, but the "malformed data" and "Receive buffer too long" ones didn't happen. I wonder if these depend on some configuration.
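
For reference, here is a rough sketch (my own reconstruction, not taken from the comment above) of driving h11 directly and feeding it a response with two conflicting Content-Length headers; the exact wording of the error depends on the h11 version, but it raises a subclass of h11.ProtocolError.

import h11

conn = h11.Connection(our_role=h11.CLIENT)
conn.send(h11.Request(method="GET", target="/", headers=[("Host", "example.org")]))
conn.send(h11.EndOfMessage())

# A response with two conflicting Content-Length values, similar to the
# first server in the original report.
conn.receive_data(
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 5\r\n"
    b"Content-Length: 12\r\n"
    b"\r\n"
)
try:
    conn.next_event()
except h11.ProtocolError as exc:
    print("h11 rejected the response:", exc)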

@yeraydiazdiaz
Contributor

The differences between async and sync should now be minimal starting from 0.13.0.dev1.

We now rely on h11 in both cases, so at the very least the errors should be consistent.

@tomchristie
Member

I broke these cases down in python-hyper/h11#96

  1. Multiple Content-Length headers.
  2. A status code outside of the valid range (the server returned a "600").
  3. A Server: header with no value.
  4. An oversized header.

They're all cases where h11 was following the spec correctly, but where we'd actually like more lenient behaviour.
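
To make cases 2-4 concrete, here are illustrative raw responses (my own reconstruction, not captured from the servers in the original report) of the kind of thing a lenient client would need to tolerate:

# Case 2: status code outside the [200, 600) range that h11 accepts for responses.
status_out_of_range = b"HTTP/1.1 600 \r\nContent-Length: 0\r\n\r\n"

# Case 3: a header with an empty value, which older h11 releases rejected
# as "malformed data".
empty_server_header = b"HTTP/1.1 200 OK\r\nServer:\r\nContent-Length: 0\r\n\r\n"

# Case 4: a header block larger than h11's default 16 KiB receive buffer,
# which triggers "Receive buffer too long".
oversized_header = (
    b"HTTP/1.1 200 OK\r\n"
    b"Set-Cookie: " + b"a" * 32768 + b"\r\n"
    b"Content-Length: 0\r\n\r\n"
)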

@tomchristie changed the title from "httpxAsyncClient fails on multiple targets where httpx.Client works" to "Client should have more lenient behaviour wrt. spec violations." May 7, 2020
@tomchristie
Member

I've retitled this to be more descriptive, and also because as of 0.13.* the "where httpx.Client works" part is no longer correct: we now have the same behaviour in both cases. (Although the URLLib3Dispatcher is also available.)

@tomchristie added the external label Jul 28, 2020
@acdha

acdha commented Nov 2, 2020

I've run into the "Receive buffer too long" error while testing what I believe is just a poorly-written web app. It occurs in the middle of a flurry of requests, so it's a little hard to confirm exactly what's causing it, because the httpx.RemoteProtocolError doesn't offer an obvious way to see what portion of the response had been retrieved by the time the error is triggered. The app uses somewhat large cookies, but they're normally not on the order of 16 KB.

Headers({'host': 'example.org', 'accept': '*/*', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'user-agent': 'python-httpx/0.16.1', 'range': 'bytes=5160960-5177344', 'cookie': '…, 'content-length': '163'})
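
For what it's worth, the "Receive buffer too long" limit comes from h11's max_incomplete_event_size parameter, which defaults to 16 KiB. httpx/httpcore set it internally, so the sketch below only shows the knob when driving h11 directly; it is not an httpx option, and the 256 KiB value is an arbitrary example.

import h11

# Allow a single response header block of up to 256 KiB before h11 gives up
# with "Receive buffer too long" (the default limit is 16 KiB).
conn = h11.Connection(
    our_role=h11.CLIENT,
    max_incomplete_event_size=256 * 1024,
)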

@tomchristie
Member

it's a little hard to confirm exactly what's causing it because the httpx.RemoteProtocolError doesn't have an obvious way to see what portion of the response had been retrieved by the time the error is triggered.

That's a good observation. Perhaps there's something we could do there to help, by ensuring that h11 is passing sufficient information up the chain, and that we're preserving that information in our own exceptions.
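
Until then, one workaround (my own, not an httpx feature; the URL and explain() helper are hypothetical) is to walk the standard exception chain: depending on the httpx and httpcore versions, the underlying h11 error may be attached as __cause__ or __context__ on the httpx.RemoteProtocolError.

import httpx

def explain(exc):
    # Print each exception in the chain, from httpx down to the parser.
    while exc is not None:
        print(f"{type(exc).__name__}: {exc}")
        exc = exc.__cause__ or exc.__context__

try:
    httpx.get("http://example.org")  # hypothetical URL that trips the parser
except httpx.RemoteProtocolError as exc:
    explain(exc)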

@nirvana-msu

nirvana-msu commented Nov 30, 2020

+1 for behaviour that is as lenient as possible (or at least make it possible via an optional flag or something). Like @fbexiga said, for security research (and e.g. bug bounty hunting) one needs to be able to interact with a service over HTTP whenever possible, disregarding most kinds of errors (SSL errors, malformed responses and what not). Simply put, a malicious actor wouldn't care whether the server is HTTP spec-compliant. If there's a way to interact with a server over HTTP and exploit it somehow, that's what they'll do, and that's what security researchers need to do as well.

I was attempting to migrate away from aiohttp (which follows the desired lenient behaviour) but currently seem to be hitting many similar issues. I think this is a real blocker for any security- or bug-bounty-related project adopting your (otherwise awesome!) library.

@tomchristie
Member

but currently seem to be hitting many similar issues.

Sure thing. If you're able to be specific, it's much more likely we'll be able to prioritise tweaking any relevant behaviours.

@stale

stale bot commented Feb 20, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Feb 20, 2022
@tomchristie
Member

Thanks @Stale - let's leave this open for now. I know that at least one or two of these are resolved; it'd be good to split the others out into clearer standalone issues. Let's keep this open to track that.

stale bot removed the wontfix label Feb 21, 2022
@stale

stale bot commented Mar 25, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Mar 25, 2022
@stale

stale bot commented Oct 15, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Oct 15, 2022
@tomchristie
Member

tomchristie commented Dec 29, 2022

The different cases that were raised in this ticket...

I think if we had a ticket on httpcore tracking the last item on this list (e.g. support for large cookie values on the order of a little under 100 kB), then we could close this issue off, as we'd have more specific tickets for each case instead.

@tomchristie
Member

tomchristie commented Jul 31, 2023

Okay, I've just updated the comment above. 3 of the 4 issues presented have been closed.

The other issue has a more specific ticket currently tracking it, so let's close this now in favour of #1363.
