Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Add size attribute to UploadFile #1405

Merged
merged 6 commits into from Feb 6, 2023
Merged

Feature: Add size attribute to UploadFile #1405

merged 6 commits into from Feb 6, 2023

Conversation

rafalp
Copy link
Member

@rafalp rafalp commented Jan 11, 2022

This PR adds size attribute to UploadFile, enabling for easy retrieval of uploaded file's size in bytes without need for seeking file's end and telling its size. This is achieved by setting initial size = 0 attribute on file and increasing it within write via self.size += len(data).

UploadFile also supports file argument for bytes stream. This seems like its used in tests only so I've resolved to writing naive size resolver that seeks stream's end and telling its size. This implementation is unlikely to work in production though, so maybe we should have size argument on constructor or make async def size(self) -> int as getter instead?

@rafalp rafalp added the feature New feature or request label Jan 11, 2022
@rafalp rafalp self-assigned this Jan 11, 2022
@Kludex
Copy link
Sponsor Member

Kludex commented Jan 11, 2022

UploadFile also supports file argument for bytes stream. This seems like its used in tests only so I've resolved to writing naive size resolver that seeks stream's end and telling its size. This implementation is unlikely to work in production though, so maybe we should have size argument on constructor of make async def size(self) -> int as getter instead?

That's what I was thinking before reading the description (I've jumped into the code before reading the description). 🤔

@rafalp rafalp requested a review from a team January 11, 2022 21:07
@adriangb
Copy link
Member

adriangb commented Jan 11, 2022

Would there be any cons to making it a constructor parameter? Alternatively, do we maybe want to remove the unused file constructor parameter?

Edit: I don't think that would work. UploadFile is used to store streaming data in multipart requests. We don't know the size of the field until after we create the UploadFile.

What we could do is have:

class UploadFile:
    def __init__(
        self,
       file: Optional[BinaryIO] = None,
       file_size: Optional[int] = None,
    ) -> None:
         if file:
             self.size = file_size or 0
        else:
             self.size = 0

    async def write(self, data: bytes) -> None:
         self.size += len(data)
         ...

So for the edge case where the user provides a file like object, if they provide the size we use that as a starting point, otherwise we assume 0.

@rafalp
Copy link
Member Author

rafalp commented Jan 11, 2022

Seeing that size is calculated attribute, I would rather drop file and update those few tests that use it to always write() like how formparser does it anyway.

It shouldn't be a problem seeing how file is not documented anywhere as public API?

@adriangb
Copy link
Member

Agreed, I'm +1 for dropping the file parameter. But I think that should be its own PR, and should have input from Tom. It's not documented but it is part of the public constructor, so I think it would technically be a breaking change (low risk though)?

@rafalp
Copy link
Member Author

rafalp commented Jan 11, 2022

Only scenario for end users doing this that comes to my mind is people creating custom UploadFile instances in their app's tests to see how their upload handling logic is handling them, eg. tests for value validators.

I'm actually doing that but I've resorted to writeing test payload as file was not documented and known to me prior of working on this PR.

@adriangb
Copy link
Member

Yep that's exactly the only scenario I can think of. So it wouldn't even break their production, it would break their tests, if anything at all?

@adriangb
Copy link
Member

I opened #1406 to evaluate that as an option

@tomchristie
Copy link
Member

What's an example use-case?
What do other Python frameworks do here?
(I usually ask for Flask and Django as the two points of comparison)

@rafalp
Copy link
Member Author

rafalp commented Jan 12, 2022

@tomchristie

Other frameworks:

Django: UploadedFile has size attribute that contains uploaded file's size in bytes. ref
Flask: FileStorage has content_length attribute that was extracted from upload's header. ref

Use-case for having size on upload makes it easy to implement and enforce different uploaded file size limits within application's logic. Eg. internet message board may have different upload limit for user profile image and different one for message attachments. Those size limits may be changed further depending on user's trust level. I'm also storing that size as part of attachment's metadata in database for displaying in app's UI.

This logic is relatively painless to implement in Django forms, but requires some research in Starlette.

Naturally this isn't here to protect app from huge uploads. There's NGINX in front of app with 32M payload limit in front of app, but if user uploads 2.1M and portrait photo limit is 2M, I'll rather handle this gracefully displaying them an error in app instead of NGINX's error page.

@tomchristie
Copy link
Member

I suppose we want to defer this to after #1406 - deps. on #1406 (comment) for instance.

@Kludex Kludex mentioned this pull request Oct 3, 2022
11 tasks
@NeurlAP
Copy link

NeurlAP commented Jan 3, 2023

What's the conclusion?

@rafalp
Copy link
Member Author

rafalp commented Jan 18, 2023

@NeurlAP its on the agenda for next meeting, which should happen next week. I'll post an update when there is one.

@rafalp
Copy link
Member Author

rafalp commented Jan 23, 2023

@NeurlAP we've discussed the course of action for this issue and we've decided the following:

  1. Verify that Make the file argument to UploadFile required #1413 will not break the FastAPI
  2. Merge the Make the file argument to UploadFile required #1413
  3. Update this PR so we accept size as required __init__ argument next to the file - this size will come from the HTTP request parser or whoever which already knows it.
  4. We increase the size when you are writing to the file, to keep it in sync.

So if you want calculated file size you will do the upload.size, but if you want the declared file size from the uploader, you will do the upload.headers["content-length"]

@Kludex
Copy link
Sponsor Member

Kludex commented Feb 4, 2023

#1413 should be merged soon. Step 3 above needs to happen afterward, jfyk.

@adriangb
Copy link
Member

adriangb commented Feb 4, 2023

@rafalp I think you can work on this now :)

@rafalp
Copy link
Member Author

rafalp commented Feb 4, 2023

I've updated the PR. Not sure if in tests we want to set initial size to literal value, or do stream.getbuffer().nbytes. But I've went with literal.

@adriangb
Copy link
Member

adriangb commented Feb 4, 2023

Can we make size optional? It gives us a bit more flexibility as a toolkit.

@rafalp
Copy link
Member Author

rafalp commented Feb 5, 2023

I've updated UploadFile to make size optional. If initial value is not provided, then size is no longer calculated.

Copy link
Member

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one comment but I'm approving either way because we can always change Optional[int] to int without breaking anything.

Comment on lines +454 to +455
if self.size is not None:
self.size += len(data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this is interesting. I know I just suggested making it a nullable parameter but I don't like how that creates complexity here. It makes me think that ultimately FormParser should not be relying on UploadFile to write the data; we should have a standalone async def write_in_thread(file: IO[bytes], data: bytes) -> None, there's just no reason to expose that in the public API and then have to deal with these sorts of weird knock-on effects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH if we make it default to 0, we should have other way to communicate to the user that size attribute is potentially not trustworthy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but I don’t think I really understood what you mean

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With current implementation size is None means "there's no calculated information about the size of uploaded file". If we replace default size from None to 0, then calls to write() will increase it, even if initial value 0 was incorrect, potentially giving people wrong (too small) impression about real size of uploaded file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree with that. So do you think what we have here is ok?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Extra complexity is minimal, but it carries value for people dealing with file uploads.

@Kludex Kludex added this to the Version 0.24.0 milestone Feb 5, 2023
@Kludex
Copy link
Sponsor Member

Kludex commented Feb 6, 2023

I'm going to wait for this PR to release 0.24.0, jfyk.

@rafalp
Copy link
Member Author

rafalp commented Feb 6, 2023

@Kludex ok. I guess only thing missing before we can merge this is rebase on master. Unless we want to wait for more reviews?

@Kludex Kludex enabled auto-merge (squash) February 6, 2023 12:00
@Kludex
Copy link
Sponsor Member

Kludex commented Feb 6, 2023

I've set the auto-merge - once the pipeline passes, it will be merged.

I'm going to release Starlette 0.24.0 tonight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants