Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add whence parameter to UploadFile.seek() #1084

Closed
wants to merge 7 commits into from

Conversation

dvarrazzo
Copy link

The second parameter of the seek() method is missing. See https://docs.python.org/3/library/io.html#io.IOBase.seek

This makes UploadFile not compatible with e.g. zipfile.is_zipfile():

  if not is_zipfile(cast(BinaryIO, payload)):
File "/usr/local/lib/python3.8/zipfile.py", line 206, in is_zipfile
  result = _check_zipfile(fp=filename)
File "/usr/local/lib/python3.8/zipfile.py", line 192, in _check_zipfile
  if _EndRecData(fp):
File "/usr/local/lib/python3.8/zipfile.py", line 264, in _EndRecData
  fpin.seek(0, 2)
TypeError: seek() takes 2 positional arguments but 3 were given

Also improve tests for seek(), testing with and without the whence
parameter.
await run_in_threadpool(self.file.seek, offset, whence)

def tell(self) -> int:
return self.file.tell()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this one run_in_threadpool if not in memory?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe technically this should just be reading the position of the file descriptor from memory, but there's nothing that guarantees that this doesn't block, and aiofiles itself marks the method as needing to be run in an executor: https://github.com/Tinche/aiofiles/blob/v0.6.0/aiofiles/threadpool/binary.py#L48

@JayH5
Copy link
Member

JayH5 commented Nov 9, 2020

Adding the whence parameter seems reasonable, but I'm not sure what you're trying to achieve? The call cast(BinaryIO, payload) in your code suggests you're trying to treat an async method as sync. This class doesn't attempt to work like a synchronous file.

Given the questions about tell() above, I'd prefer that wasn't added in this PR, if possible.

@dvarrazzo
Copy link
Author

@JayH5 You are right that, 5 minutes later, I stumbled in the problem of is_zipfile() requires a sync file, so I had to roll my poor man's version of is_zipfile:

    async def is_zipfile(cls, f: UploadFile):
        """A poor man's async version of zipfile.is_zipfile()."""
        head = await f.read(4)
        await f.seek(0)
        return head == b"PK\x03\x04"

do you prefer to just close this MR or would you prefer me to make seek() async? I'm not using the feature so I'm ok either way.

@dvarrazzo
Copy link
Author

I've made tell() async too in case you are still interested in having whence and tell implemented.

@JayH5
Copy link
Member

JayH5 commented Nov 14, 2020

do you prefer to just close this MR or would you prefer me to make seek() async? I'm not using the feature so I'm ok either way.

IMO, there's nothing wrong with this, but the change needs a use case to justify its addition. If there isn't a clear one I think we can close this.

@dacevedo12
Copy link

I can see a clear use case for this. Implementing it allows for a method to get the uploaded file size.

Based on django's approach (https://github.com/django/django/blob/master/django/core/files/base.py#L32) I'm doing the following:

async def get_file_size(file_object: UploadFile) -> int:
    file = file_object.file

    # Needed while upstream starlette implements a size method
    # pylint: disable=protected-access
    if file_object._in_memory:
        current_position = file.tell()
        file.seek(0, os.SEEK_END)
        size = file.tell()
        file.seek(current_position)
    else:
        current_position = await run_in_threadpool(file.tell)
        await run_in_threadpool(file.seek, 0, os.SEEK_END)
        size = await run_in_threadpool(file.tell)
        await run_in_threadpool(file.seek, current_position)

    return size

Copy link

@tales-aparecida tales-aparecida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it would be nice to update the seek description, under UploadFile, at docs/requests.md, as well.

@@ -1,4 +1,5 @@
import io
import os

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to use os' constants instead of io's?

@adriangb adriangb added the feature New feature or request label Feb 2, 2022
@tomchristie
Copy link
Member

Let's close this off. The original motivation was to use it with zipfile.is_zipfile(), but it's an async interface, so not suitable for that either way around.

We've also got an alternate ticket to consider making the size of the uploaded file accessible. #1405

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants