Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow byte values in HTTPHeaderDict #3279

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

zawan-ila
Copy link
Contributor

Closes 3072

Overview

I've followed the same path that supports byte keys. Open to any other suggestions.

Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I realize we have a bounty for this issue, I'm not sure that blindly assuming latin-1 in 2024 is the right thing to do. In RFC 9110 (started in 2018 and published in 2022 and which applies to all HTTP versions), Section 5.5. Field Values states (emphasis mine):

Field values are usually constrained to the range of US-ASCII characters [USASCII]. Fields needing a greater range of characters can use an encoding, such as the one defined in [RFC8187]. Historically, HTTP allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. Specifications for newly defined fields SHOULD limit their values to visible US-ASCII octets (VCHAR), SP, and HTAB. A recipient SHOULD treat other allowed octets in field content (i.e., obs-text) as opaque data.

What we could do, if the user insists in sending bytes, is to join values using b", ", which I guess could work? Or we could refuse bytes altogether. (The current exception is not helpful and I'd like to fix that somehow.)

Also, would you mind checking what encoding is used when "Schönefeld/1.18.0" is passed as a string?

@zawan-ila
Copy link
Contributor Author

zawan-ila commented Jan 25, 2024

Also, would you mind checking what encoding is used when "Schönefeld/1.18.0" is passed as a string?

Do you mean passed as a header value? If yes, it is encoded as latin-1. Apparently courtesy of http.client.

What we could do, if the user insists in sending bytes, is to join values using b", ", which I guess could work? Or we could refuse bytes altogether.

What do you suggest here? I feel like It'd be better to be able to send bytes. However, it seems like it is not possible right now to access raw response headers? Because we rely on http.client for them, which appears to decode them as 'latin-1'. This state seems slightly odd. i.e to be able to send raw byte values but not be able to receive them.

[*] Edited for clarity

@pquentin
Copy link
Member

Yes, http.client limits us for HTTP/1.1, but we try to fix bugs when it's possible. We can't fix decoding, but can fix encoding. And we can do the right thing for HTTP/2, using the header_encoding h2 parameter, whatever the right thing is!

As pointed out in #2397 (comment), it makes sense to be able to send headers as bytes, because the user may not want to send latin-1. As mentioned in the HTTP spec referenced above, A recipient SHOULD treat other allowed octets in field content (i.e., [obs-text](https://www.rfc-editor.org/rfc/rfc9110#fields.values)) as opaque data. This means that if a user wants to use UTF-8, it's perfectly OK, and the receiving server should treat this as opaque data.

So my current thinking is that we should indeed join byte values using b", " and forbid mixed encoding and bytes (to be able to join). And not try to decode bytes to latin-1. @sethmlarson would know what's best here.

@sethmlarson By the way, did your opinion change since 2021? Should we explicitly allow bytes in the API as was done in this pull request?

@zawan-ila zawan-ila force-pushed the zawan-ila/allow-byte-header-vals branch from 04da19e to 4000221 Compare February 5, 2024 18:19
@zawan-ila
Copy link
Contributor Author

@pquentin I have updated the PR. What do you think now?

PS: There's some lint issues. Plus we need to update some type annotations. Do let me know if they are blockers for a review.

@pquentin
Copy link
Member

@zawan-ila Yes, type annotation failures usually block review, especially for a typing change and with so many errors:

nox > python -m pip install -r mypy-requirements.txt
nox > mypy --version
mypy 1.8.0 (compiled: yes)
nox > mypy -p dummyserver -m noxfile -p urllib3 -p test
src/urllib3/_collections.py:202: error: Argument 2 to "_has_value_for_header" of "HTTPHeaderDict" has incompatible type "str | bytes"; expected "str"  [arg-type]
src/urllib3/_collections.py:261: error: Argument 1 to "join" of "bytes" has incompatible type "list[str]"; expected "Iterable[Buffer]"  [arg-type]
src/urllib3/_collections.py:339: error: Unsupported operand types for + ("str" and "bytes")  [operator]
src/urllib3/_collections.py:454: error: Argument 1 to "join" of "bytes" has incompatible type "list[str]"; expected "Iterable[Buffer]"  [arg-type]
test/test_collections.py:221: error: Incompatible types in assignment (expression has type "bytes", target has type "str")  [assignment]
test/test_collections.py:222: error: Non-overlapping equality check (left operand type: "str", right operand type: "Literal[b'foo, bar']")  [comparison-overlap]
test/test_collections.py:295: error: Argument 2 to "add" of "HTTPHeaderDict" has incompatible type "bytes"; expected "str"  [arg-type]
test/test_collections.py:296: error: Argument 2 to "add" of "HTTPHeaderDict" has incompatible type "bytes"; expected "str"  [arg-type]
test/test_collections.py:337: error: Incompatible types in assignment (expression has type "bytes", target has type "str")  [assignment]
test/test_collections.py:338: error: Argument 2 to "add" of "HTTPHeaderDict" has incompatible type "bytes"; expected "str"  [arg-type]
test/test_collections.py:339: error: Argument "rawcookie" to "HTTPHeaderDict" has incompatible type "bytes"; expected "str"  [arg-type]
test/test_collections.py:340: error: Argument "rawcookie" to "NonMappingHeaderContainer" has incompatible type "bytes"; expected "str"  [arg-type]
test/test_collections.py:348: error: Incompatible types in assignment (expression has type "bytes", target has type "str")  [assignment]
test/test_collections.py:349: error: Argument 2 to "add" of "HTTPHeaderDict" has incompatible type "bytes"; expected "str"  [arg-type]
test/test_collections.py:350: error: Argument "rawcookie" to "HTTPHeaderDict" has incompatible type "bytes"; expected "str"  [arg-type]
test/test_collections.py:351: error: Argument "rawcookie" to "NonMappingHeaderContainer" has incompatible type "bytes"; expected "str"  [arg-type]
test/test_collections.py:399: error: Incompatible types in assignment (expression has type "bytes", target has type "str")  [assignment]
test/test_collections.py:400: error: Non-overlapping container check (element type: "tuple[str, bytes]", container item type: "tuple[str, str]")  [comparison-overlap]
test/test_collections.py:401: error: Non-overlapping container check (element type: "tuple[str, bytes]", container item type: "tuple[str, str]")  [comparison-overlap]
test/test_collections.py:480: error: Argument 2 to "add" of "HTTPHeaderDict" has incompatible type "bytes"; expected "str"  [arg-type]
test/with_dummyserver/test_socketlevel.py:1957: error: Argument 2 to "add" of "HTTPHeaderDict" has incompatible type "bytes"; expected "str"  [arg-type]
Found 21 errors in 3 files (checked 80 source files)
nox > Command mypy -p dummyserver -m noxfile -p urllib3 -p test failed with exit code 1
nox > Session lint failed.

That said, #3343 made this pull request much more important so I'll take a look anyway! Sorry for not answering before.

Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this work!

Can you please fix the mypy errors before I can review in more detail? I suspect this will involve explicitly typing self._container as Mapping[str, str | bytes].

You will also need to add a changelog entry.

And finally, we need to test with -bb to check for string/bytes comparisons, but the use of the h2 library makes that more difficult.

Comment on lines 332 to 334
assert type(vals[-1]) is type(
val
), "Can not mix strings and bytes in header values"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be an assert. asserts are reserved for logic error in our own code and can be eliminated in production. You should raise a TypeError here for mixed values.

sock.sendall(
b"HTTP/1.1 200 OK\r\n"
b"Server: example.com\r\n"
b"Content-Length: 0\r\n" + request_headers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please send request headers in the body to avoid the decoding that is done by httplib on responses.

Comment on lines 1961 to 1962
# Note that b"\xc2\x80" is Â\x80 when decoded as latin-1
assert r.headers["rawcookie"] == "Â\x80"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the above change, you should ultimately be able to write assert r.data == b"rawcookie: \xc2\x80"

@zawan-ila
Copy link
Contributor Author

zawan-ila commented Feb 17, 2024

@pquentin I've updated the types. However I am not too happy with my work in this regard. There are too many ignores. The main difficulties appear to me to be the following:-

  1. While request header values can be bytes, response header values are not.
  2. The values in self._container are lists such that the first value is the header string and the rest are the values. Hence one is forced to type them as list[str | bytes]
  3. This is my first time using mypy.

It would be really nice if you could have a look and provide some helpful suggestions.

[PS] I also now understand why type annotations ought to be a blocker for review :-)

Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While request header values can be bytes, response header values are not.

Saying that all header values can be bytes is not a problem in itself, but if it requires adding type ignores everywhere, then we could reconsider. I think this depends on what we want to do with HTTP/2 headers, which we haven't defined yet.

HTTP/2 headers passed to the h2 library have to be bytes, but we can ask configure h2 to decode our headers. If we do that, then we could use generics to have one HTTPHeaderDict for requests (support str | bytes) and one for responses (only supporting str), which would help.

@sethmlarson @illia-v What do you think?

The values in self._container are lists such that the first value is the header string and the rest are the values. Hence one is forced to type them as list[str | bytes]

Yeah, that's a hack to keep the original case of the key. But at least it's self-contained in HTTPHeaderDictView. We could solve this by introducing a second dict, but I would need to see microbenchmarks and a memory profile to see the performance impact.

This is my first time using mypy.

That's OK! We're here to help, and you're doing great anyway. I'm happy that you're also trying to find a clean solution. Also, sorry for the delay between reviews.

[PS] I also now understand why type annotations ought to be a blocker for review :-)

:-)

@@ -155,7 +155,7 @@ def keys(self) -> set[_KT]: # type: ignore[override]
return set(self._container.keys())


class HTTPHeaderDictItemView(typing.Set[typing.Tuple[str, str]]):
class HTTPHeaderDictItemView(typing.Set[typing.Tuple[str, typing.Union[str, bytes]]]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What error do you get when using str | bytes? Union types are supported by Python 3.10+ (https://docs.python.org/3/library/stdtypes.html#types-union), and I can see other places (line 240) where they appear to work just fine.

@@ -629,7 +629,7 @@ def test_request_with_json(self, headers: HTTPHeaderDict) -> None:
content_type = HTTPHeaderDict(old_headers).get(
"Content-Type", "application/json"
)
assert content_type in r.headers["Content-Type"].replace(" ", "").split(",")
assert content_type in r.headers["Content-Type"].replace(" ", "").split(",") # type: ignore[arg-type]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the mypy error here?

@@ -339,7 +341,7 @@ def __init__(
self.chunked = False
tr_enc = self.headers.get("transfer-encoding", "").lower()
# Don't incur the penalty of creating a list and then discarding it
encodings = (enc.strip() for enc in tr_enc.split(","))
encodings = (enc.strip() for enc in tr_enc.split(",")) # type: ignore[arg-type]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the mypy error here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HttpHeaderDict does not support bytes object
2 participants