Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the file argument to UploadFile required #1413

Merged
merged 44 commits into from Feb 4, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
b621cae
Make the file argument to UploadFile required
adriangb Jan 14, 2022
fab8d41
lint
adriangb Jan 14, 2022
d5d339f
add test for rolled uploadfile
adriangb Jan 14, 2022
ab97325
add tests for rolled/unrolled uploadfile
adriangb Jan 14, 2022
3bf4d15
close form
adriangb Jan 14, 2022
a7b8e05
close request
adriangb Jan 14, 2022
315b292
remove max_file_size arg in favor of class var
adriangb Jan 17, 2022
428ba3b
Merge branch 'master' into make-file-strict-argument
adriangb Jan 17, 2022
b509fa1
Merge branch 'master' into make-file-strict-argument
adriangb Jan 18, 2022
c012aa1
make file the only required argument
adriangb Jan 18, 2022
3a75692
Merge remote-tracking branch 'upstream/master' into make-file-strict-…
adriangb Jan 18, 2022
01c781c
Merge branch 'make-file-strict-argument' of https://github.com/adrian…
adriangb Jan 18, 2022
1b9568e
pull content-type from headers
adriangb Jan 19, 2022
655c4be
rename local from file_ to tempfile
adriangb Jan 19, 2022
e01ecef
Merge branch 'master' into make-file-strict-argument
adriangb Jan 19, 2022
4cbb66f
Merge branch 'master' into make-file-strict-argument
adriangb Jan 21, 2022
2a312a8
Merge branch 'master' into make-file-strict-argument
adriangb Jan 22, 2022
0982127
Merge branch 'master' into make-file-strict-argument
adriangb Jan 25, 2022
0bea64d
Merge branch 'master' into make-file-strict-argument
adriangb Jan 26, 2022
e5320e5
Merge branch 'master' into make-file-strict-argument
adriangb Jan 27, 2022
1c13abb
join tests into one via parametrization
adriangb Jan 27, 2022
6f6b3a0
clarify docstring and parametrization ids
adriangb Jan 27, 2022
6f608e1
Merge branch 'master' into make-file-strict-argument
adriangb Jan 31, 2022
1e51edd
lint
adriangb Jan 31, 2022
75ca723
wrap line
adriangb Jan 31, 2022
18213b5
Merge branch 'master' into make-file-strict-argument
adriangb Feb 3, 2022
419d8aa
Merge branch 'master' into make-file-strict-argument
adriangb Feb 8, 2022
7a39f30
Merge branch 'master' into make-file-strict-argument
adriangb Feb 8, 2022
1a8f11c
Merge branch 'master' into make-file-strict-argument
adriangb Feb 16, 2022
4a1325e
Merge branch 'master' into make-file-strict-argument
adriangb Mar 27, 2022
bece2af
Merge branch 'master' into make-file-strict-argument
adriangb May 24, 2022
2b12b51
Merge branch 'master' into make-file-strict-argument
adriangb May 24, 2022
9b2cb49
Merge branch 'master' into make-file-strict-argument
adriangb Jul 2, 2022
fa11ecb
Merge branch 'master' into make-file-strict-argument
adriangb Jul 20, 2022
76519cd
Merge branch 'master' into make-file-strict-argument
adriangb Oct 6, 2022
1baf994
Update test_formparsers.py
adriangb Oct 6, 2022
1fb8b08
Update datastructures.py
adriangb Oct 6, 2022
b959261
Merge branch 'master' into make-file-strict-argument
adriangb Oct 12, 2022
f053b8d
Merge branch 'master' into make-file-strict-argument
Kludex Dec 8, 2022
d84a7d5
Merge branch 'master' into make-file-strict-argument
adriangb Jan 23, 2023
d9cfaa2
Update starlette/datastructures.py
adriangb Feb 4, 2023
8babda3
Merge branch 'master' into make-file-strict-argument
Kludex Feb 4, 2023
5d796bb
Merge branch 'master' into make-file-strict-argument
adriangb Feb 4, 2023
9831c89
Use default None instead of empty string on UploadFile.content_type i…
Kludex Feb 4, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
21 changes: 8 additions & 13 deletions starlette/datastructures.py
@@ -1,4 +1,3 @@
import tempfile
import typing
from collections.abc import Sequence
from shlex import shlex
Expand Down Expand Up @@ -428,28 +427,24 @@ class UploadFile:
An uploaded file included as part of the request data.
"""

spool_max_size = 1024 * 1024
file: typing.BinaryIO
headers: "Headers"

def __init__(
self,
filename: str,
file: typing.Optional[typing.BinaryIO] = None,
content_type: str = "",
file: typing.BinaryIO,
Copy link
Member Author

@adriangb adriangb Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also make this strictly file: SpooledTemporaryFile since that's the only way it's ever used

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typing.BinaryIO makes more sense to me.
It's an implementation detail that we happen to use SpooledTemporaryFile.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typing-wise, isn't there a better type for this situation? Also... It's not an implementation detail, it's written on our docs that the file is a SpooledTemporaryFile.

Copy link
Member Author

@adriangb adriangb Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now there's three options:

  1. Use IO[bytes]
  2. Use SpooledTemporaryFile directly
  3. Write a protocol for just the methods we need

Source: python/typing#829 (comment)

I don't think we should choose (2) and I think we should change our docs. Let's not lock ourselves into concrete implementations if we don't need to.

I think (1) has issues (as mentioned in that discussion and others) but is widely used so it's pretty defensible if a user had to add a # type: ignore to that parameter if they pass in some less common file-like object.

(3) is the correct option and would look like:

class FileLike(Protocol):
    def write(self, __data: bytes) -> None:  ...
    def read(self, __n: int) -> bytes:  ...
    def seek(self, __pos: int) -> None:  ...
    def close(self) -> None:  ...

I think the main con here is not the LOC or complexity, the main con is that if we want to start using other methods/attributes of IO in the future it would technically be a breaking change. But maybe that's a good thing?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should change our docs

That's enough for me. 👍

*,
filename: typing.Optional[str] = None,
headers: "typing.Optional[Headers]" = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we're changing the (internal-ish) interface at this point, should we switch it around so that file is the only required argument?...

    def __init__(
         self,
         file: typing.Optional[typing.BinaryIO],
         *,
         filename: str = "",
         content_type: str = "",
         headers: "typing.Optional[Headers]" = None,
    ):
        self.file = file
        self.filename = filename or getattr(file, 'name', '')
        self.content_type = content_type
        self.headers = headers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep makes sense. How do you feel about:

    def __init__(
         self,
         file: typing.Optional[typing.BinaryIO],
         *,
         filename: typing.Optional[str] = None,
         content_type: typing.Optional[str] = None,
         headers: typing.Optional[Headers] = None,
    ):
        ...

That way you can differentiate between a filename or content-type that was an empty string or one that was missing altogether (I know "" may not be valid within the context of a file name on disk and such, but it's still possible in other contexts).

self.filename = filename or getattr(file, 'name', '')

Hmm about this. I feel like it could lead to confusion. I would rather UploadFile just blindly accept it as a parameter and assign it. I think users already have access to .file so if they want the filename of the actual file they should do UploadFile.file.name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UploadedFile.filename was sanitized, what guarantee we provide that file.name also is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it would be best to just make it a constructor parameter and not fall back to file.name

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, do we want filename and content_type at all here?

They're derived from the headers, so why not switch to...

class UploadFile:
  def __init__(
         self,
         file: typing.Optional[typing.BinaryIO],
         *,
         headers: typing.Optional[Headers] = None,
    ):
        ...

Copy link
Member Author

@adriangb adriangb Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote would be to make content_type a property and keep filename.

The main reason for this is a practical one: extracting the filename from the headers requires knowledge that the request was originally a multipart/form-data request (that is, as far as I know, the only way a client would include that in the content-disposition header). If we want UploadFile to be decoupled from form parsing, I don't think it should assume that the headers came from a multipart request.

Also not every implementer of BinaryIO has a .file attribute (e.g. BytesIO does not). So if we don't pass in that information at all then users might not be able to access it (I know in practice we always pass in a SpooledTemporaryFile, but the goal here is for that to be an implementation detail as much as possible).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay - that's not unreasonable.

The one other practical case where I could see UploadFile being used is for raw binary uploads.

Eg. PUT https://www.example.com/uploads/my-file-upload.txt

In that kinda use-case it's useful having filename be something that can be provided explicitly, so that it can be derived from the URL path, rather than the HTTP headers.

So yeah, sounds good to me. 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that is a use case I personally really want: https://github.com/adriangb/xpresso/blob/c4455b29d323df1c59808ccafb306b9e7bbd80d1/xpresso/binders/_extractors/body/file.py#L40

I'm glad you brought it up and not me, I didn't want to introduce my biases 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I already pushed these changes / implementation @tomchristie, so I think this is ready for one last round of review for any typos/nits and then maybe a ✅

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi all.

It would be nice if the release could include the removal of content_type from the UploadFile, as this is a breaking change.

) -> None:
self.filename = filename
self.content_type = content_type
if file is None:
self.file = tempfile.SpooledTemporaryFile(max_size=self.spool_max_size) # type: ignore[assignment] # noqa: E501
else:
self.file = file
self.file = file
self.headers = headers or Headers()

@property
def content_type(self) -> typing.Optional[str]:
return self.headers.get("content-type", None)
adriangb marked this conversation as resolved.
Show resolved Hide resolved

@property
def _in_memory(self) -> bool:
# check for SpooledTemporaryFile._rolled
rolled_to_disk = getattr(self.file, "_rolled", True)
return not rolled_to_disk

Expand Down
12 changes: 6 additions & 6 deletions starlette/formparsers.py
@@ -1,5 +1,6 @@
import typing
from enum import Enum
from tempfile import SpooledTemporaryFile
from urllib.parse import unquote_plus

from starlette.datastructures import FormData, Headers, UploadFile
Expand Down Expand Up @@ -116,6 +117,8 @@ async def parse(self) -> FormData:


class MultiPartParser:
max_file_size = 1024 * 1024

def __init__(
self, headers: Headers, stream: typing.AsyncGenerator[bytes, None]
) -> None:
Expand Down Expand Up @@ -160,7 +163,7 @@ def on_end(self) -> None:

async def parse(self) -> FormData:
# Parse the Content-Type header to get the multipart boundary.
content_type, params = parse_options_header(self.headers["Content-Type"])
_, params = parse_options_header(self.headers["Content-Type"])
charset = params.get(b"charset", "utf-8")
if type(charset) == bytes:
charset = charset.decode("latin-1")
Expand All @@ -186,7 +189,6 @@ async def parse(self) -> FormData:
header_field = b""
header_value = b""
content_disposition = None
content_type = b""
field_name = ""
data = b""
file: typing.Optional[UploadFile] = None
Expand All @@ -202,7 +204,6 @@ async def parse(self) -> FormData:
for message_type, message_bytes in messages:
if message_type == MultiPartMessage.PART_BEGIN:
content_disposition = None
content_type = b""
data = b""
item_headers = []
elif message_type == MultiPartMessage.HEADER_FIELD:
Expand All @@ -213,8 +214,6 @@ async def parse(self) -> FormData:
field = header_field.lower()
if field == b"content-disposition":
content_disposition = header_value
elif field == b"content-type":
content_type = header_value
item_headers.append((field, header_value))
header_field = b""
header_value = b""
Expand All @@ -229,9 +228,10 @@ async def parse(self) -> FormData:
)
if b"filename" in options:
filename = _user_safe_decode(options[b"filename"], charset)
tempfile = SpooledTemporaryFile(max_size=self.max_file_size)
file = UploadFile(
file=tempfile, # type: ignore[arg-type]
filename=filename,
content_type=content_type.decode("latin-1"),
headers=Headers(raw=item_headers),
)
else:
Expand Down
38 changes: 24 additions & 14 deletions tests/test_datastructures.py
@@ -1,4 +1,6 @@
import io
from tempfile import SpooledTemporaryFile
from typing import BinaryIO

import pytest

Expand Down Expand Up @@ -269,20 +271,6 @@ def test_queryparams():
assert QueryParams(q) == q


class BigUploadFile(UploadFile):
spool_max_size = 1024


@pytest.mark.anyio
async def test_upload_file():
big_file = BigUploadFile("big-file")
await big_file.write(b"big-data" * 512)
await big_file.write(b"big-data")
await big_file.seek(0)
assert await big_file.read(1024) == b"big-data" * 128
await big_file.close()


@pytest.mark.anyio
async def test_upload_file_file_input():
"""Test passing file/stream into the UploadFile constructor"""
Expand All @@ -295,6 +283,28 @@ async def test_upload_file_file_input():
assert await file.read() == b"data and more data!"


@pytest.mark.anyio
@pytest.mark.parametrize("max_size", [1, 1024], ids=["rolled", "unrolled"])
async def test_uploadfile_rolling(max_size: int) -> None:
"""Test that we can r/w to a SpooledTemporaryFile
managed by UploadFile before and after it rolls to disk
"""
stream: BinaryIO = SpooledTemporaryFile( # type: ignore[assignment]
max_size=max_size
)
file = UploadFile(filename="file", file=stream)
assert await file.read() == b""
await file.write(b"data")
assert await file.read() == b""
await file.seek(0)
assert await file.read() == b"data"
await file.write(b" more")
assert await file.read() == b""
await file.seek(0)
assert await file.read() == b"data more"
await file.close()


def test_formdata():
stream = io.BytesIO(b"data")
upload = UploadFile(filename="file", file=stream)
Expand Down