Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add streaming multipart/form-data encoder #3361

Draft
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

alexwlchan
Copy link
Contributor

@alexwlchan alexwlchan commented Mar 11, 2024

This is an attempt to get the streaming multipart/form-data encoder landed again (for #51, #624). This is something that was previously worked on in #2331 and has stalled; I'd like to have a go at getting it across the line.

Status: still fairly experimental. I wanted to open a PR to get it running in the proper CI environment, but I'm still wrapping my head around the code – a bunch of the comments are me talking to myself saying "hmm, look at this further". But I think I'm gradually moving forward! 🥰


So far:

  • I've merged the latest main into sigmavirus24’s branch. This was mostly clean; the only issues were:

    • The project now uses import typing; typing.X rather than from typing import X; X, so I had to change all of those (but reading the comments on the original PR, I think that was the preference anyway?)
    • The definition of _TYPE_BODY has moved from connection.py to _base_connection.py

    Otherwise I haven't touched the original code.

  • Run nox -rs lint, nox -rs mypy and nox -rs docs to fix various style/linting issues. Again I don't think these are especially meaningful.

  • I've added some new tests to increase the coverage. I think I'm up to 100% for the decoder, but not the encoder.

  • I read the review comments on the original PR. I couldn't see anything outstanding, but if I've missed something I'm happy to incorporate those changes into this new branch.

  • This code comes from requests-toolbelt. I had a quick look at the code in the requests-toolbelt repo to see if there are any changes between the original PR and now that we should incorporate into this urllib3 copy, but there aren't any.

Still to do:

  • Increase the coverage to 100%. I don't think I'm quite there yet, but I want to get it running in CI and hopefully find what still needs doing
  • Any amendments from code review in 2024

Because the original code already had several rounds of review, I'll add some inline comments to highlight my new changes, to make it easier for reviewers.

sigmavirus24 and others added 14 commits January 2, 2022 10:05
I initially implemented this for Requests in the requests-toolbelt but
this was already pretty generic because Requests just passes file-like
objects (which is how the streaming encoder behaves) directly to
urllib3. All that needed to change was what we were relying on from the
requests namespace and imports and such.

This also adds the decoder in ther same breath because it's easier to
ensure that's all working together properly in one and it all fits
together nicely.

One thing we _could_ do is consolidate a bunch of the logic too and make
`encode_multipart_formdata` rely on the streaming encoder and call
`getall` instead so that we don't have 2 implementations of the same
logic.
Type annotations made it clear that a dangling comma was added by
mistake, converting a value into a tuple.
It's incorrect in the context of the streaming encoder changes.
…a-encoder-2024

# Conflicts:
#	docs/conf.py
#	src/urllib3/connection.py
#	src/urllib3/fields.py
#	src/urllib3/filepost.py
#	test/with_dummyserver/test_connectionpool.py
Previously this test was failing with a warning that was being raised
as an error:

_____________________________________________ TestMultipartEncoder.test_regression_1 ______________________________________________
Traceback (most recent call last):
  File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/pluggy/_callers.py", line 155, in _multicall
    teardown[0].send(outcome)
  File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/_pytest/unraisableexception.py", line 88, in pytest_runtest_call
    yield from unraisable_exception_runtest_hook()
  File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/_pytest/unraisableexception.py", line 78, in unraisable_exception_runtest_hook
    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))
pytest.PytestUnraisableExceptionWarning: Exception ignored in: <_io.FileIO [closed]>

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/unittest/case.py", line 589, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
ResourceWarning: unclosed file <_io.BufferedReader name='/Users/alexwlchan/repos/urllib3/test/multipart/test_encoder.py'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/_pytest/runner.py", line 341, in from_call
    result: Optional[TResult] = func()
                                ^^^^^^
  File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/_pytest/runner.py", line 262, in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/pluggy/_hooks.py", line 501, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/pluggy/_manager.py", line 119, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/pluggy/_callers.py", line 159, in _multicall
    _warn_teardown_exception(hook_name, teardown[1], e)
  File "/Users/alexwlchan/repos/urllib3/.nox/test-3-12/lib/python3.12/site-packages/pluggy/_callers.py", line 49, in _warn_teardown_exception
    warnings.warn(PluggyTeardownRaisedWarning(msg), stacklevel=5)
pluggy.PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown.
Plugin: unraisableexception, Hook: pytest_runtest_call
PytestUnraisableExceptionWarning: Exception ignored in: <_io.FileIO [closed]>

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/unittest/case.py", line 589, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
ResourceWarning: unclosed file <_io.BufferedReader name='/Users/alexwlchan/repos/urllib3/test/multipart/test_encoder.py'>
Comment on lines +149 to +157
def test_from_response_needs_content_type(self) -> None:
response = mock.NonCallableMagicMock(spec=urllib3.response.HTTPResponse)
response.headers = {}
response.data = b""

with pytest.raises(
ValueError, match="Cannot determine content-type header from response"
):
MultipartDecoder.from_response(response)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is new, to add coverage for this error.

Comment on lines +102 to +103
def test_content_length(self) -> None:
assert self.instance.content_length == str(len(self.instance))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is new, to add coverage for the content_length property.

@alexwlchan alexwlchan force-pushed the add-multipart-formdata-encoder-2024 branch from 8b6cbfe to 5dc22c2 Compare March 11, 2024 07:53
Comment on lines +242 to +260
def test_accepts_custom_content_type_as_bytes(self) -> None:
"""Verify that the Encoder handles custom content-types which
are bytes.

See https://github.com/requests/toolbelt/issues/52
"""
fields = [
(
b"test".decode("utf-8"),
(
b"filename".decode("utf-8"),
b"filecontent",
b"application/json",
),
)
]
m = MultipartEncoder(fields=fields)
output = m.read().decode("utf-8")
assert output.index("Content-Type: application/json\r\n") > 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is new, to add coverage for a couple of lines in the multipart encoder:

if isinstance(file_type, bytes):
    file_type = file_type.decode("utf-8")

@alexwlchan
Copy link
Contributor Author

I'm getting a CI failure for Python 3.13 on Windows, but I don't think it's related to this patch?

Failed to build cffi
ERROR: Could not build wheels for cffi, which is required to install pyproject.toml-based projects

I can see a similar failure in the latest build on main.

mypy-requirements.txt Outdated Show resolved Hide resolved
Comment on lines +80 to +81
elif isinstance(data, typing.BinaryIO):
body.write(data.read())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line isn't currently tested, and I need to think more about what the intention is here, because I haven't found a way to trigger this branch.

In particular neither an open file nor an io.BytesIO return True here, so I wonder if a different check is what's meant (if this branch is even needed at all?):

import io
import typing

with open(__file__, 'rb') as f:
    print(isinstance(f, typing.BinaryIO))  # False

bytes_io = io.BytesIO(b'Hello, World!')
print(isinstance(bytes_io, typing.BinaryIO))  # False

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe bytearray? I can't remember honestly

Comment on lines +343 to +344
print(resp.json()["form"])
# {"field": "value", "myfile": "..."}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another note to investigate: if I run this locally, I just get {"field": "value"} as the output; filename.txt is uploaded as a file rather than a form-encoded field. 🤔

@sigmavirus24
Copy link
Contributor

❤️ Thank you @alexwlchan ! I don't have the time with the kiddo and hopefully you can claim the bounty (since I couldn't even if I got the PR over the line)

@alexwlchan
Copy link
Contributor Author

(Having picked this up I've immediately stalled due to some stuff in my personal life, but I do want to continue this after this week is over!

@sigmavirus24
Copy link
Contributor

By the way, requests/toolbelt#316 is a reasonable addition to the enclosed but will complicate this a smidge

Copy link
Contributor Author

@alexwlchan alexwlchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a more thorough read of decoder.py and left some comments, which I'll push a commit to address shortly.

src/urllib3/multipart/decoder.py Outdated Show resolved Hide resolved
src/urllib3/multipart/decoder.py Outdated Show resolved Hide resolved
for item in ct_info[1:]:
attr, _, value = item.partition("=")
if attr.lower() == "boundary":
self.boundary = encode_with(value.strip('"'), self.encoding)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what happens if you receive a value which isn't properly quoted – e.g. if value starts with an open quote but there's no matching close quote.

from urllib3.multipart.encoder import MultipartEncoder, encode_with


class TestBodyPart(unittest.TestCase):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the test case in test_encoder.py are the only tests in the project that subclass unittest.TestCase.

I'm going to leave them as-is for now, but I wonder if they should be rewritten to use pytest fixtures or similar, to match the rest of the project.

@alexwlchan
Copy link
Contributor Author

I've made a couple more tweaks, and I think the decoder is now working. In particular I created a small Flask server that uses requests-toolbelt to send multipart-encoded responses, so I can test the code in my urllib3 fork:

from flask import Flask, Response
from requests_toolbelt import MultipartEncoder

app = Flask(__name__)

@app.route('/downloads')
def downloads():
    m = MultipartEncoder(
           fields={'field0': 'value', 'field1': 'value',
                   'field2': ('filename', open('file.py', 'rb'), 'text/plain')}
        )

    s = m.to_string()

    return Response(s, mimetype=m.content_type, headers={"content-type": m.content_type})

app.run(debug=True)

Next up is diving into the encoder code and seeing what else needs to be done there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants