Allow custom headers in multipart/form-data requests #1936

Merged: 15 commits, Jan 13, 2022
30 changes: 23 additions & 7 deletions httpx/_multipart.py
@@ -78,23 +78,39 @@ def __init__(self, name: str, value: FileTypes) -> None:

         fileobj: FileContent

+        headers: typing.Dict[str, str] = {}
+        content_type: typing.Optional[str] = None

         if isinstance(value, tuple):
             try:
-                filename, fileobj, content_type = value  # type: ignore
+                filename, fileobj, content_type, headers = value  # type: ignore
             except ValueError:
-                filename, fileobj = value  # type: ignore
-                content_type = guess_content_type(filename)
+                try:
+                    filename, fileobj, content_type = value  # type: ignore
+                except ValueError:
+                    filename, fileobj = value  # type: ignore
+            else:
+                headers = {k.title(): v for k, v in headers.items()}
Member

I don't think we should .title() case here.

Member

Ah... I see the comparison case. Huh. Fiddly.
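The comparison problem above can be handled without rewriting the caller's keys. A minimal sketch, assuming a hypothetical helper `has_header` (not part of httpx), which compares names case-insensitively and leaves the mapping untouched:

```python
from typing import Mapping


def has_header(headers: Mapping[str, str], name: str) -> bool:
    """Return True if `name` is a key of `headers`, ignoring case.

    Hypothetical helper for illustration only; it is not httpx API.
    """
    wanted = name.lower()
    return any(key.lower() == wanted for key in headers)


# The caller's original casing is preserved because nothing is rewritten.
user_headers = {"CONTENT-TYPE": "text/csv", "ETag": "abc"}
found = has_header(user_headers, "Content-Type")
```

One reason to prefer this over `.title()` is that title-casing changes the header names that are sent: `"ETag".title()` gives `"Etag"`.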

+                if "Content-Type" in headers:
+                    raise ValueError(
+                        "Content-Type cannot be included in multipart headers"
+                    )
Member

Perhaps we don't need to do this check.

It's odd behaviour for the developer to set the content_type and then override it with the actual value provided in the custom headers. But it's not broken.

My preference would be that we don't do the explicit check here. In the case of conflicts I'd probably have header values take precedence.

I'm not absolute on this one, but slight preference.
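The precedence suggested in this comment could look roughly like the sketch below. It is an illustration under assumptions, not the code in this PR: `resolve_part_headers` is a hypothetical name, and `mimetypes.guess_type` stands in for httpx's `guess_content_type`.

```python
import mimetypes
from typing import Dict, Mapping, Optional


def resolve_part_headers(
    filename: Optional[str],
    content_type: Optional[str],
    headers: Mapping[str, str],
) -> Dict[str, str]:
    """Merge an explicit content type into custom part headers.

    Hypothetical helper: a Content-Type already present in `headers`
    wins; otherwise the explicit value is used, then a filename guess.
    """
    resolved = dict(headers)  # copy so the caller's mapping is not mutated
    if any(key.lower() == "content-type" for key in resolved):
        return resolved  # header value takes precedence, no error raised
    if content_type is None and filename:
        # Stand-in for httpx's guess_content_type (an assumption).
        content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    if content_type is not None:
        resolved["Content-Type"] = content_type
    return resolved
```

With this rule, passing `"text/plain"` as the third tuple element together with `{"content-type": "text/csv"}` sends `text/csv`, while headers without a content type fall back to the explicit or guessed value.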

Member Author

@adriangb Jan 7, 2022

Makes sense. I thought it'd be a good idea to check what requests does here. It looks like it silently ignores a Content-Type passed in the headers dict. That is:

requests.post("http://example.com", files=[("test", ("test_filename", b"data", "text/plain", {"Content-Type": "text/csv"}))])

Gets sent as text/plain.

Digging into why this is the case, it seems like it's just an implementation detail in urllib3. It happens here.

I'm not sure what the right thing to do here is, but if you feel like it's best to go with no error and making header values take precedence, I'm happy to implement that.

Another alternative would be to have the 3rd parameter be either a string representing the content type or a headers dict. We can't really make the 3rd parameter always be a headers dict because that would be a breaking change for httpx.
This would eliminate the edge case, but deviates from requests' API. It seems pretty reasonable that if I'm specifying headers I'm doing advanced stuff and so specifying the content type in the headers directly would not be an issue.
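For illustration, the alternative described here (a third element that is either a content type string or a headers dict) might be normalised as in the sketch below. `split_third_element` is a hypothetical name; this shape was discussed but is not what the PR implements.

```python
from typing import Dict, Mapping, Optional, Tuple, Union


def split_third_element(
    third: Union[None, str, Mapping[str, str]],
) -> Tuple[Optional[str], Dict[str, str]]:
    """Interpret the third tuple element as a content type or headers.

    Hypothetical sketch of the alternative API; not httpx behaviour.
    """
    if third is None:
        return None, {}
    if isinstance(third, str):
        # Existing behaviour: a plain string is the content type.
        return third, {}
    headers = dict(third)
    # Any content type now comes from the headers themselves.
    content_type = next(
        (value for key, value in headers.items() if key.lower() == "content-type"),
        None,
    )
    return content_type, headers
```

Because a string keeps its current meaning, existing three-element tuples would continue to work, and there is only one place a content type can come from for any given call.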

Member

> I'm not sure what the right thing to do here is, but if you feel like it's best to go with no error and making header values take precedence, I'm happy to implement that.

I reckon let's do that, yeah.

> Another alternative would be to have the 3rd parameter be either a string representing the content type or a headers dict. We can't really make the 3rd parameter always be a headers dict because that would be a breaking change for httpx.

I actually quite like that yes, neat idea. The big-tuples API is... not helpful really. But let's probably just go with the path of least resistance here. Perhaps one day we'll want an httpx 2.0, where we gradually start deprecating the various big-tuples bits of API in favour of a neater style.

Member Author

@adriangb Jan 10, 2022

> I reckon let's do that, yeah.

👍 donzo

> > Another alternative would be to have the 3rd parameter be either a string representing the content type or a headers dict. We can't really make the 3rd parameter always be a headers dict because that would be a breaking change for httpx.
>
> I actually quite like that yes, neat idea. The big-tuples API is... not helpful really. But let's probably just go with the path of least resistance here. Perhaps one day we'll want an httpx 2.0, where we gradually start deprecating the various big-tuples bits of API in favour of a neater style.

Agreed! I added a comment in the code explaining the reasoning behind the big tuple API (inherited from requests) and how we might want to change it in the future.

         else:
             filename = Path(str(getattr(value, "name", "upload"))).name
             fileobj = value

+        if content_type is None:
             content_type = guess_content_type(filename)

+        if content_type is not None:
+            headers["Content-Type"] = content_type

         if isinstance(fileobj, (str, io.StringIO)):
             raise TypeError(f"Expected bytes or bytes-like object got: {type(fileobj)}")

         self.filename = filename
         self.file = fileobj
-        self.content_type = content_type
+        self.headers = headers
         self._consumed = False

     def get_length(self) -> int:
@@ -122,9 +138,9 @@ def render_headers(self) -> bytes:
             if self.filename:
                 filename = format_form_param("filename", self.filename)
                 parts.extend([b"; ", filename])
-            if self.content_type is not None:
-                content_type = self.content_type.encode()
-                parts.extend([b"\r\nContent-Type: ", content_type])
+            for header_name, header_value in self.headers.items():
+                key, val = f"\r\n{header_name}: ".encode(), header_value.encode()
+                parts.extend([key, val])
             parts.append(b"\r\n\r\n")
             self._headers = b"".join(parts)
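To see what the loop in this hunk produces on the wire, here is a standalone sketch of the header block for one part. `render_part_headers` is a hypothetical function written for illustration, and it skips the escaping that httpx's `format_form_param` applies to names and filenames.

```python
from typing import Mapping, Optional


def render_part_headers(
    name: str, filename: Optional[str], headers: Mapping[str, str]
) -> bytes:
    """Build the header block of a single multipart/form-data part.

    Simplified sketch: assumes `name` and `filename` need no escaping.
    """
    parts = [b"Content-Disposition: form-data; ", f'name="{name}"'.encode()]
    if filename:
        parts.extend([b"; ", f'filename="{filename}"'.encode()])
    # Each custom header goes on its own CRLF-separated line, in dict order.
    for header_name, header_value in headers.items():
        parts.extend([f"\r\n{header_name}: ".encode(), header_value.encode()])
    parts.append(b"\r\n\r\n")  # blank line ends the part headers
    return b"".join(parts)


rendered = render_part_headers(
    "file", "test.txt", {"Expires": "0", "Content-Type": "text/plain"}
)
```

Since headers are emitted in insertion order, a `Content-Type` that is added to the dict last appears after the custom headers, which is the order the new test expects.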

2 changes: 2 additions & 0 deletions httpx/_types.py
@@ -89,6 +89,8 @@
     Tuple[Optional[str], FileContent],
     # (filename, file (or bytes), content_type)
     Tuple[Optional[str], FileContent, Optional[str]],
+    # (filename, file (or bytes), content_type, headers)
+    Tuple[Optional[str], FileContent, Optional[str], Mapping[str, str]],
 ]
 RequestFiles = Union[Mapping[str, FileTypes], Sequence[Tuple[str, FileTypes]]]

47 changes: 47 additions & 0 deletions tests/test_multipart.py
@@ -94,6 +94,53 @@ def test_multipart_file_tuple():
     assert multipart["file"] == [b"<file content>"]


+@pytest.mark.parametrize("content_type", [None, "text/plain"])
+def test_multipart_file_tuple_headers(content_type: typing.Optional[str]):
+    file_name = "test.txt"
+    expected_content_type = "text/plain"
+    headers = {"Expires": "0"}

+    files = {"file": (file_name, io.BytesIO(b"<file content>"), content_type, headers)}
+    with mock.patch("os.urandom", return_value=os.urandom(16)):
+        boundary = os.urandom(16).hex()

+        headers, stream = encode_request(data={}, files=files)
+        assert isinstance(stream, typing.Iterable)

+        content = (
+            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
+            f'filename="{file_name}"\r\nExpires: 0\r\nContent-Type: '
+            f"{expected_content_type}\r\n\r\n<file content>\r\n--{boundary}--\r\n"
+            "".encode("ascii")
+        )
+        assert headers == {
+            "Content-Type": f"multipart/form-data; boundary={boundary}",
+            "Content-Length": str(len(content)),
+        }
+        assert content == b"".join(stream)


+@pytest.mark.parametrize("content_type", [None, "text/plain"])
+@pytest.mark.parametrize(
+    "headers",
+    [
+        {"content-type": "text/plain"},
+        {"Content-Type": "text/plain"},
+        {"CONTENT-TYPE": "text/plain"},
+    ],
+)
+def test_multipart_headers_include_content_type(
+    content_type: typing.Optional[str], headers: typing.Dict[str, str]
+) -> None:
+    """Including content-type in the multipart headers should not be allowed"""
+    client = httpx.Client(transport=httpx.MockTransport(echo_request_content))

+    files = {"file": ("test.txt", b"content", content_type, headers)}
+    pat = "Content-Type cannot be included in multipart headers"
+    with pytest.raises(ValueError, match=pat):
+        client.post("http://127.0.0.1:8000/", files=files)


 def test_multipart_encode(tmp_path: typing.Any) -> None:
     path = str(tmp_path / "name.txt")
     with open(path, "wb") as f: