
Fix upload for large BytesIO #1780

Open
wants to merge 4 commits into base: main

Conversation

@sveinnfannar commented May 1, 2024

Describe your changes

Uploading a BytesIO over 4 MiB results in a ValueError: I/O operation on closed file because the stream is closed twice: once when checking the file size and again when uploading. This PR makes sure the stream is not closed when checking the file size. The PR also includes a test that reproduces the error.
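A minimal standalone repro of the failure mode (a sketch outside the Modal codebase): entering any io stream as a context manager closes it on exit, so a size check wrapped in `with` leaves a caller-supplied BytesIO unusable for the subsequent upload.

```python
import io

buf = io.BytesIO(b"x" * (5 * 1024 * 1024))  # payload over the 4 MiB threshold

# Sizing the stream inside a `with` block closes it on exit:
with buf as f:
    f.seek(0, io.SEEK_END)
    size = f.tell()

# The second pass for the actual upload then fails:
try:
    buf.seek(0)
    payload = buf.read()
except ValueError as e:
    print(e)  # I/O operation on closed file.
```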

Slack thread about the issue.

Changelog

@erikbern (Contributor) commented May 3, 2024

Nice!

@mwaskom (Contributor) commented May 3, 2024

Hey, super thanks for the PR and especially for the new test! I am a little concerned that simply removing the context manager will cause other problems, though: this code path is also used when uploading files from disk, and I think this change risks leaving those file handles open. I need to look a little more closely; it's definitely a little tricky to follow the code here.

@sveinnfannar (Author)

You are right @mwaskom, good catch!

How do we proceed?
I don't mind updating the PR after bouncing a few ideas off you.
If you'd rather solve this within your team at Modal that's fine with me too.

@sveinnfannar sveinnfannar reopened this May 7, 2024
@sveinnfannar (Author) commented May 7, 2024

Oops, didn't mean to close/re-open 😅

What makes this tricky is that when the user passes Path or str we are responsible for opening and closing the file, while for BytesIO we get an open file handle which we then close and can't re-open.

The only solution I can think of that doesn't involve copying the whole payload into memory (which would not be great when a user passes a large BinaryIO that is backed by a file on disk) would be to read the stream in one place instead of two. That means computing size and hash only when the file is uploaded instead of when the FileUploadSpec is created.
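The single-read idea could look roughly like this (hypothetical names, not the Modal API): fold the size and hash computation into the upload loop so the stream is consumed exactly once, whether it came from disk or from a caller-owned BytesIO.

```python
import hashlib
import io
from typing import BinaryIO, Callable, Tuple


def upload_once(fp: BinaryIO, send_chunk: Callable[[bytes], None],
                chunk_size: int = 1 << 20) -> Tuple[int, str]:
    """Read `fp` exactly once, pushing chunks to the uploader while
    accumulating the total size and SHA-256 digest on the fly."""
    size = 0
    digest = hashlib.sha256()
    while chunk := fp.read(chunk_size):
        size += len(chunk)
        digest.update(chunk)
        send_chunk(chunk)
    return size, digest.hexdigest()


# Works identically for an in-memory stream and a file opened from disk:
chunks = []
size, sha = upload_once(io.BytesIO(b"hello world"), chunks.append)
```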

@mwaskom can you think of a better solution?
