Add multipart/form-data streaming encoder
I initially implemented this for Requests in the requests-toolbelt but
this was already pretty generic because Requests just passes file-like
objects (which is how the streaming encoder behaves) directly to
urllib3. All that needed to change was what we were relying on from the
requests namespace and imports and such.

This also adds the decoder in the same breath because it's easier to
ensure the two work together properly when they land in one commit,
and they fit together nicely.

One thing we _could_ do is consolidate a bunch of the logic too and make
`encode_multipart_formdata` rely on the streaming encoder and call
`getall` instead so that we don't have 2 implementations of the same
logic.
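
A minimal sketch of that consolidation idea, using only the standard library. `TinyStreamingEncoder` and `encode_all` are hypothetical names invented for illustration; they are not the classes added in this commit, and the sketch skips filenames and per-part content types. It only shows how a one-shot encoder can be a thin wrapper around a streaming `read()`:

```python
import io
from typing import BinaryIO, Iterator, List, Tuple, Union

FieldValue = Union[str, bytes, BinaryIO]


class TinyStreamingEncoder:
    """Hypothetical stand-in for a streaming multipart encoder."""

    def __init__(self, fields: List[Tuple[str, FieldValue]], boundary: str) -> None:
        self._chunks = self._iter_chunks(fields, boundary)
        self._buffer = b""

    @staticmethod
    def _iter_chunks(
        fields: List[Tuple[str, FieldValue]], boundary: str, block_size: int = 8192
    ) -> Iterator[bytes]:
        for name, value in fields:
            yield f"--{boundary}\r\n".encode("ascii")
            yield f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode("utf-8")
            if isinstance(value, str):
                yield value.encode("utf-8")
            elif isinstance(value, bytes):
                yield value
            else:
                # File-like value: pull it in blocks instead of all at once.
                while True:
                    block = value.read(block_size)
                    if not block:
                        break
                    yield block
            yield b"\r\n"
        yield f"--{boundary}--\r\n".encode("ascii")

    def read(self, size: int = -1) -> bytes:
        # Buffer chunks until the request can be satisfied or input runs out.
        while size < 0 or len(self._buffer) < size:
            chunk = next(self._chunks, None)
            if chunk is None:
                break
            self._buffer += chunk
        if size < 0:
            out, self._buffer = self._buffer, b""
        else:
            out, self._buffer = self._buffer[:size], self._buffer[size:]
        return out


def encode_all(fields: List[Tuple[str, FieldValue]], boundary: str) -> bytes:
    """One-shot encoding expressed in terms of the streaming reader."""
    return TinyStreamingEncoder(fields, boundary).read()
```

With this shape there is a single code path that builds the body, and the in-memory variant simply reads it to the end.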
sigmavirus24 committed Jan 2, 2022
1 parent c96cf40 commit 112bcb9
Showing 15 changed files with 1,388 additions and 12 deletions.
45 changes: 45 additions & 0 deletions changelog/624.feature.rst
@@ -0,0 +1,45 @@
Added support for reduced memory ``multipart/form-data`` uploads.

Previously, one would need to specify ``fields=...`` with
``encode_multipart=True`` and ``multipart_boundary=...`` to send a
``multipart/form-data`` request to a server via urllib3's ``PoolManager``.
This would encode all of the data into a buffer in memory to send it to the
server.

Now, one can use the ``urllib3.multipart.MultipartEncoder``. It accepts the
same format of values as the ``fields=`` argument and accepts an optional
``boundary=`` argument like ``PoolManager.urlopen`` and
``PoolManager.request``. This will not load the entirety of your data into
memory but will instead conservatively stream it by implementing a subset of
the file API for ``http.client`` to use.

Example:

.. code-block:: python

    import urllib3
    from urllib3 import multipart

    encoder = multipart.MultipartEncoder({
        "field_1": "value_1",
        "field_2": "value_2",
        "field_3": (
            "my_super_large_file.tar.gz",
            open("my_super_large_file.tar.gz", "rb"),
            "application/tar+gzip",
        ),
        "field_4": (
            "big_data_dump.psql.gz",
            open("big_data_dump.psql.gz", "rb"),
            "application/gzip",
        ),
    })
    response = urllib3.request(
        "POST",
        "https://httpbin.org/post",
        headers=encoder.headers,
        body=encoder,
    )

This also adds support for efficiently decoding ``multipart/form-data``
responses via ``urllib3.multipart.MultipartDecoder``.
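
To illustrate what decoding a ``multipart/form-data`` body involves, here is a rough sketch built on the standard library's ``email`` parser. It is not the ``MultipartDecoder`` from this change; ``decode_multipart`` is a hypothetical helper, and the sketch assumes simple text fields without a ``Content-Transfer-Encoding`` header:

```python
from email import message_from_bytes
from typing import List, Optional, Tuple


def decode_multipart(content_type: str, body: bytes) -> List[Tuple[Optional[str], bytes]]:
    """Return (field name, raw content) pairs from a multipart body."""
    # The email parser needs the Content-Type header to locate the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError(f"not a multipart body: {content_type!r}")
    parts = []
    for part in message.get_payload():
        # The field name lives in each part's Content-Disposition header.
        name = part.get_param("name", header="content-disposition")
        parts.append((name, part.get_payload(decode=True)))
    return parts
```

The boundary comes from the response's ``Content-Type`` header, which is why a decoder has to reject responses whose content type is not multipart.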
1 change: 1 addition & 0 deletions docs/reference/index.rst
@@ -10,5 +10,6 @@ API Reference
urllib3.exceptions
urllib3.response
urllib3.fields
urllib3.multipart
urllib3.util
contrib/index
26 changes: 26 additions & 0 deletions docs/reference/urllib3.multipart.rst
@@ -0,0 +1,26 @@
Multipart Encoders and Decoders
===============================

Streaming Request Encoding
--------------------------

.. autoclass:: urllib3.multipart.MultipartEncoder
    :members: content_type, content_length, headers

    .. automethod:: read


Response Decoding
-----------------

.. autoclass:: urllib3.multipart.MultipartDecoder
    :members: content_type, encoding, parts

    .. automethod:: from_response

.. autoclass:: urllib3.multipart.BodyPart
    :members: content, encoding, headers, text

.. autoexception:: urllib3.multipart.ImproperBodyPartContentError

.. autoexception:: urllib3.multipart.NonMultipartContentTypeError
26 changes: 25 additions & 1 deletion docs/user-guide.rst
@@ -310,10 +310,34 @@ dictionary in the ``fields`` argument provided to
        "https://httpbin.org/post",
        fields={"field": "value"}
    )

    print(resp.json()["form"])
    # {"field": "value"}

If you have a large amount of data, you may prefer to avoid loading all of
that into memory. To do so, urllib3 provides a ``MultipartEncoder`` that
handles generating the request data on demand without loading everything in
memory.

.. code-block:: python

    import urllib3
    import urllib3.multipart

    encoder = urllib3.multipart.MultipartEncoder({
        "field": "value",
        "myfile": ("filename.txt", open("filename.txt", "rb"), "text/plain"),
    })
    resp = urllib3.request(
        "POST",
        "https://httpbin.org/post",
        body=encoder,
        headers=encoder.headers,
    )

    print(resp.json()["form"])
    # {"field": "value", "myfile": "..."}

.. _json:

JSON
1 change: 1 addition & 0 deletions mypy-requirements.txt
@@ -6,4 +6,5 @@ pytest>=6.2
trustme==0.9.0
types-backports
types-requests
types-mock
nox
1 change: 1 addition & 0 deletions setup.py
@@ -101,6 +101,7 @@
"urllib3",
"urllib3.contrib",
"urllib3.contrib._securetransport",
"urllib3.multipart",
"urllib3.util",
],
package_data={
3 changes: 2 additions & 1 deletion src/urllib3/connection.py
@@ -24,6 +24,7 @@

if TYPE_CHECKING:
    from typing_extensions import Literal
    from .multipart import MultipartEncoder

from .util.proxy import create_proxy_ssl_context
from .util.timeout import _DEFAULT_TIMEOUT, _TYPE_TIMEOUT, Timeout
@@ -75,7 +76,7 @@ class BaseSSLError(BaseException): # type: ignore[no-redef]
_CONTAINS_CONTROL_CHAR_RE = re.compile(r"[^-!#$%&'*+.^_`|~0-9a-zA-Z]")


_TYPE_BODY = Union[bytes, IO[Any], Iterable[bytes], str]
_TYPE_BODY = Union[bytes, IO[Any], Iterable[bytes], str, "MultipartEncoder"]


class ProxyConfig(NamedTuple):
19 changes: 10 additions & 9 deletions src/urllib3/fields.py
@@ -1,6 +1,7 @@
import email.utils
import mimetypes
from typing import (
BinaryIO,
Callable,
Dict,
Iterable,
@@ -12,7 +13,7 @@
cast,
)

_TYPE_FIELD_VALUE = Union[str, bytes]
_TYPE_FIELD_VALUE = Union[str, bytes, BinaryIO]
_TYPE_FIELD_VALUE_TUPLE = Union[
_TYPE_FIELD_VALUE, Tuple[str, _TYPE_FIELD_VALUE], Tuple[str, _TYPE_FIELD_VALUE, str]
]
@@ -82,7 +83,7 @@ def format_header_param_rfc2231(name: str, value: Union[str, bytes]) -> str:
return value


def format_multipart_header_param(name: str, value: _TYPE_FIELD_VALUE) -> str:
def format_multipart_header_param(name: str, value: Union[str, bytes]) -> str:
"""
Format and quote a single multipart header parameter.
@@ -120,7 +121,7 @@ def format_multipart_header_param(name: str, value: _TYPE_FIELD_VALUE) -> str:
return f'{name}="{value}"'


def format_header_param_html5(name: str, value: _TYPE_FIELD_VALUE) -> str:
def format_header_param_html5(name: str, value: Union[str, bytes]) -> str:
"""
.. deprecated:: 2.0.0
Renamed to :func:`format_multipart_header_param`. Will be
@@ -138,7 +139,7 @@ def format_header_param_html5(name: str, value: _TYPE_FIELD_VALUE) -> str:
return format_multipart_header_param(name, value)


def format_header_param(name: str, value: _TYPE_FIELD_VALUE) -> str:
def format_header_param(name: str, value: Union[str, bytes]) -> str:
"""
.. deprecated:: 2.0.0
Renamed to :func:`format_multipart_header_param`. Will be
@@ -181,7 +182,7 @@ def __init__(
data: _TYPE_FIELD_VALUE,
filename: Optional[str] = None,
headers: Optional[Mapping[str, str]] = None,
header_formatter: Optional[Callable[[str, _TYPE_FIELD_VALUE], str]] = None,
header_formatter: Optional[Callable[[str, Union[str, bytes]], str]] = None,
):
self._name = name
self._filename = filename
@@ -209,7 +210,7 @@ def from_tuples(
cls,
fieldname: str,
value: _TYPE_FIELD_VALUE_TUPLE,
header_formatter: Optional[Callable[[str, _TYPE_FIELD_VALUE], str]] = None,
header_formatter: Optional[Callable[[str, Union[str, bytes]], str]] = None,
) -> "RequestField":
"""
A :class:`~urllib3.fields.RequestField` factory from old-style tuple parameters.
@@ -251,7 +252,7 @@ def from_tuples(

return request_param

def _render_part(self, name: str, value: _TYPE_FIELD_VALUE) -> str:
def _render_part(self, name: str, value: Union[str, bytes]) -> str:
"""
Override this method to change how each multipart header
parameter is formatted. By default, this calls
@@ -270,8 +271,8 @@ def _render_part(self, name: str, value: _TYPE_FIELD_VALUE) -> str:
def _render_parts(
self,
header_parts: Union[
Dict[str, Optional[_TYPE_FIELD_VALUE]],
Sequence[Tuple[str, Optional[_TYPE_FIELD_VALUE]]],
Dict[str, Optional[Union[str, bytes]]],
Sequence[Tuple[str, Optional[Union[str, bytes]]]],
],
) -> str:
"""
4 changes: 3 additions & 1 deletion src/urllib3/filepost.py
@@ -2,7 +2,7 @@
import codecs
import os
from io import BytesIO
from typing import Iterable, Mapping, Optional, Sequence, Tuple, Union
from typing import BinaryIO, Iterable, Mapping, Optional, Sequence, Tuple, Union

from .fields import _TYPE_FIELD_VALUE_TUPLE, RequestField

@@ -74,6 +74,8 @@ def encode_multipart_formdata(

if isinstance(data, str):
writer(body).write(data)
elif isinstance(data, BinaryIO):
body.write(data.read())
else:
body.write(data)
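
One caveat worth checking with a runtime test against `typing.BinaryIO`: that class is intended for annotations, and the concrete `io` classes do not inherit from it. The short sketch below is an illustration only (the variable names are made up here); it compares that check with two alternatives that do recognise real binary streams:

```python
import io
from typing import BinaryIO

buffer = io.BytesIO(b"payload")

# typing.BinaryIO is an annotation-only class; io.BytesIO does not
# subclass it, so this runtime check evaluates to False.
matches_typing_class = isinstance(buffer, BinaryIO)

# Two checks that do match real binary streams at runtime.
matches_io_base = isinstance(buffer, io.IOBase)
has_read = hasattr(buffer, "read")
```

Files returned by `open(..., "rb")` behave the same way as `io.BytesIO` here, so a duck-typed `hasattr(data, "read")` test may be the safer branch condition.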

16 changes: 16 additions & 0 deletions src/urllib3/multipart/__init__.py
@@ -0,0 +1,16 @@
"""Multipart support for urllib3."""

from .decoder import ImproperBodyPartContentError as ImproperBodyPartContentError
from .decoder import MultipartDecoder as MultipartDecoder
from .decoder import NonMultipartContentTypeError as NonMultipartContentTypeError
from .encoder import MultipartEncoder as MultipartEncoder

__authors__ = "Ian Stapleton Cordasco, Cory Benfield"
__copyright__ = "Copyright 2014 Ian Stapleton Cordasco, Cory Benfield"

__all__ = [
    "MultipartEncoder",
    "MultipartDecoder",
    "ImproperBodyPartContentError",
    "NonMultipartContentTypeError",
]
