Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Add size attribute to UploadFile #1405

Merged
merged 6 commits into from Feb 6, 2023
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/requests.md
Expand Up @@ -123,6 +123,7 @@ multidict, containing both file uploads and text input. File upload items are re
* `content_type`: A `str` with the content type (MIME type / media type) (e.g. `image/jpeg`).
* `file`: A <a href="https://docs.python.org/3/library/tempfile.html#tempfile.SpooledTemporaryFile" target="_blank">`SpooledTemporaryFile`</a> (a <a href="https://docs.python.org/3/glossary.html#term-file-like-object" target="_blank">file-like</a> object). This is the actual Python file that you can pass directly to other functions or libraries that expect a "file-like" object.
* `headers`: A `Headers` object. Often this will only be the `Content-Type` header, but if additional headers were included in the multipart field they will be included here. Note that these headers have no relationship with the headers in `Request.headers`.
* `size`: An `int` with uploaded file's size in bytes. This value is calculated from request's contents, making it better choice to find uploaded file's size than `Content-Length` header. `None` if not set.

`UploadFile` has the following `async` methods. They all call the corresponding file methods underneath (using the internal `SpooledTemporaryFile`).

Expand Down
5 changes: 5 additions & 0 deletions starlette/datastructures.py
Expand Up @@ -431,11 +431,13 @@ def __init__(
self,
file: typing.BinaryIO,
*,
size: typing.Optional[int] = None,
filename: typing.Optional[str] = None,
headers: "typing.Optional[Headers]" = None,
) -> None:
self.filename = filename
self.file = file
self.size = size
self.headers = headers or Headers()

@property
Expand All @@ -449,6 +451,9 @@ def _in_memory(self) -> bool:
return not rolled_to_disk

async def write(self, data: bytes) -> None:
if self.size is not None:
self.size += len(data)
Comment on lines +461 to +462
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this is interesting. I know I just suggested making it a nullable parameter but I don't like how that creates complexity here. It makes me think that ultimately FormParser should not be relying on UploadFile to write the data; we should have a standalone async def write_in_thread(file: IO[bytes], data: bytes) -> None, there's just no reason to expose that in the public API and then have to deal with these sorts of weird knock-on effects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH if we make it default to 0, we should have other way to communicate to the user that size attribute is potentially not trustworthy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but I don’t think I really understood what you mean

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With current implementation size is None means "there's no calculated information about the size of uploaded file". If we replace default size from None to 0, then calls to write() will increase it, even if initial value 0 was incorrect, potentially giving people wrong (too small) impression about real size of uploaded file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree with that. So do you think what we have here is ok?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Extra complexity is minimal, but it carries value for people dealing with file uploads.


if self._in_memory:
self.file.write(data)
else:
Expand Down
1 change: 1 addition & 0 deletions starlette/formparsers.py
Expand Up @@ -231,6 +231,7 @@ async def parse(self) -> FormData:
tempfile = SpooledTemporaryFile(max_size=self.max_file_size)
file = UploadFile(
file=tempfile, # type: ignore[arg-type]
size=0,
filename=filename,
headers=Headers(raw=item_headers),
)
Expand Down
24 changes: 22 additions & 2 deletions tests/test_datastructures.py
Expand Up @@ -275,10 +275,26 @@ def test_queryparams():
async def test_upload_file_file_input():
"""Test passing file/stream into the UploadFile constructor"""
stream = io.BytesIO(b"data")
file = UploadFile(filename="file", file=stream, size=4)
assert await file.read() == b"data"
assert file.size == 4
await file.write(b" and more data!")
assert await file.read() == b""
assert file.size == 19
await file.seek(0)
assert await file.read() == b"data and more data!"


@pytest.mark.anyio
async def test_upload_file_without_size():
"""Test passing file/stream into the UploadFile constructor without size"""
stream = io.BytesIO(b"data")
file = UploadFile(filename="file", file=stream)
assert await file.read() == b"data"
assert file.size is None
await file.write(b" and more data!")
assert await file.read() == b""
assert file.size is None
await file.seek(0)
assert await file.read() == b"data and more data!"

Expand All @@ -292,22 +308,26 @@ async def test_uploadfile_rolling(max_size: int) -> None:
stream: BinaryIO = SpooledTemporaryFile( # type: ignore[assignment]
max_size=max_size
)
file = UploadFile(filename="file", file=stream)
file = UploadFile(filename="file", file=stream, size=0)
assert await file.read() == b""
assert file.size == 0
await file.write(b"data")
assert await file.read() == b""
assert file.size == 4
await file.seek(0)
assert await file.read() == b"data"
await file.write(b" more")
assert await file.read() == b""
assert file.size == 9
await file.seek(0)
assert await file.read() == b"data more"
assert file.size == 9
await file.close()


def test_formdata():
stream = io.BytesIO(b"data")
upload = UploadFile(filename="file", file=stream)
upload = UploadFile(filename="file", file=stream, size=4)
form = FormData([("a", "123"), ("a", "456"), ("b", upload)])
assert "a" in form
assert "A" not in form
Expand Down
13 changes: 13 additions & 0 deletions tests/test_formparsers.py
Expand Up @@ -29,6 +29,7 @@ async def app(scope, receive, send):
content = await value.read()
output[key] = {
"filename": value.filename,
"size": value.size,
"content": content.decode(),
"content_type": value.content_type,
}
Expand All @@ -51,6 +52,7 @@ async def multi_items_app(scope, receive, send):
output[key].append(
{
"filename": value.filename,
"size": value.size,
"content": content.decode(),
"content_type": value.content_type,
}
Expand All @@ -71,6 +73,7 @@ async def app_with_headers(scope, receive, send):
content = await value.read()
output[key] = {
"filename": value.filename,
"size": value.size,
"content": content.decode(),
"content_type": value.content_type,
"headers": list(value.headers.items()),
Expand Down Expand Up @@ -112,6 +115,7 @@ def test_multipart_request_files(tmpdir, test_client_factory):
assert response.json() == {
"test": {
"filename": "test.txt",
"size": 14,
"content": "<file content>",
"content_type": "text/plain",
}
Expand All @@ -129,6 +133,7 @@ def test_multipart_request_files_with_content_type(tmpdir, test_client_factory):
assert response.json() == {
"test": {
"filename": "test.txt",
"size": 14,
"content": "<file content>",
"content_type": "text/plain",
}
Expand All @@ -152,11 +157,13 @@ def test_multipart_request_multiple_files(tmpdir, test_client_factory):
assert response.json() == {
"test1": {
"filename": "test1.txt",
"size": 15,
"content": "<file1 content>",
"content_type": "text/plain",
},
"test2": {
"filename": "test2.txt",
"size": 15,
"content": "<file2 content>",
"content_type": "text/plain",
},
Expand Down Expand Up @@ -185,6 +192,7 @@ def test_multipart_request_multiple_files_with_headers(tmpdir, test_client_facto
"test1": "<file1 content>",
"test2": {
"filename": "test2.txt",
"size": 15,
"content": "<file2 content>",
"content_type": "text/plain",
"headers": [
Expand Down Expand Up @@ -220,11 +228,13 @@ def test_multi_items(tmpdir, test_client_factory):
"abc",
{
"filename": "test1.txt",
"size": 15,
"content": "<file1 content>",
"content_type": "text/plain",
},
{
"filename": "test2.txt",
"size": 15,
"content": "<file2 content>",
"content_type": "text/plain",
},
Expand Down Expand Up @@ -261,6 +271,7 @@ def test_multipart_request_mixed_files_and_data(tmpdir, test_client_factory):
assert response.json() == {
"file": {
"filename": "file.txt",
"size": 14,
"content": "<file content>",
"content_type": "text/plain",
},
Expand Down Expand Up @@ -291,6 +302,7 @@ def test_multipart_request_with_charset_for_filename(tmpdir, test_client_factory
assert response.json() == {
"file": {
"filename": "文書.txt",
"size": 14,
"content": "<file content>",
"content_type": "text/plain",
}
Expand Down Expand Up @@ -318,6 +330,7 @@ def test_multipart_request_without_charset_for_filename(tmpdir, test_client_fact
assert response.json() == {
"file": {
"filename": "画像.jpg",
"size": 14,
"content": "<file content>",
"content_type": "image/jpeg",
}
Expand Down