Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a max_size parameter to save and _copy_file #1326

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

06180339
Copy link
Contributor

This is a proposal inspired by a requirement I had myself and by the following comment on Stack Overflow: “Bottle stores file uploads in request.files as FileUpload instances, along with some metadata about the upload. FileUpload has a very handy save method that perform the "file slurping" for you. It does not check the maximum file upload though, but IMHO it's better to have nginx in front to perform this kind of limitation check.”

If you don’t use nginx or so and want to have an upper limit for the file size, it would be nice if the save method had an additional parameter max_size to set a maximum size beyond which the writing is stopped.

The changes in this branch hopefully enable that and I wanna propose to merge them.

@Jwink3101
Copy link

I like this idea.

It's super minor, but why use a keyword argument on a private method? It certainly doesn't hurt though and, like I said, this is minor.

@06180339
Copy link
Contributor Author

Thank you. That is a good point – the function signature of a private method can be controlled within this library. OTOH, if max_size in _copy_file is made an arg instead of a kwarg, …

  • either it has to come before chunk_size
  • or chunk_size itself is also made an arg instead of a kwarg (which it is now).

The former has the downside that then the order in the signature differs from the order in the signature of save. And save itself should probably not be changed.

The latter might have the downside that then the default value of 2 ** 16 has to go – that may be insignificant, but maybe the creator of this method preferred to have a default value ingrained here also, not only in save itself.

So, I am obviously fine with anything that seems acceptable to the maintainers.

@06180339
Copy link
Contributor Author

Further possible points of critique may be (in no prioritizing order):

  • Performance: The difference to the version so far would be the line if max_size is not None per every chunk. That is not nothing, but is probably rather negligeable – if there are a lot of chunks, the time should depend on IO anyway, I guess. A compromise is to have no comparatively costly function call in this line – otherwise if isinstance(max_size, int) comes to mind.
  • if isinstance(max_size, int) instead of if max_size is not None? See above. Or is it significantly more robust.
  • If max_size is exceeded, an IOError is thrown – that is also the error for the “File exists”-case. The errors are distinguished by their message each; but it might be better to have a stronger distinction. I am not sure, however, which other Error could or maybe should be used for exceeding the maximum size.

@knro
Copy link

knro commented Apr 7, 2022

This is somewhat related but I couldn't find information on whether there is some hard limitation to file upload size? Can this method support upload for 2GB+ files? This appears to fail in my case on Raspberry PI. So is the current method using RAM? if not, why would it fail on raspberry PI?

@defnull
Copy link
Member

defnull commented Apr 7, 2022

Probably because your TMPDIR is an in-memory file system (e.g. tmpfs) and thus limited to your available RAM.

@knro
Copy link

knro commented Apr 7, 2022

It's not, this is on Raspberry PI 16GB SD card and /tmp is just not size restricted as far as I know.

Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root       14989948 6395892   7963776  45% /
devtmpfs         3787940       0   3787940   0% /dev
tmpfs            3952804    8356   3944448   1% /dev/shm
tmpfs            3952804   34904   3917900   1% /run
tmpfs               5120       4      5116   1% /run/lock
tmpfs            3952804       0   3952804   0% /sys/fs/cgroup
/dev/mmcblk0p1    258095   49336    208760  20% /boot
tmpfs             790560       8    790552   1% /run/user/1000

@knro
Copy link

knro commented Apr 7, 2022

oh I see, which of the tmpfs above is used by the Python/Bottle framework? is it the last entry which 790MB available size? this is why it fails? What's the workaround for such a case? Thanks!!

@defnull
Copy link
Member

defnull commented Apr 7, 2022

Bottle uses cgi.FieldStorage which uses tempfile.TemporaryFile which in turn stores temp files in the directory returned by https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir

Bottle does not do any in-memory buffering of file uploads. The whole body is buffered, but only for requests smaller than BaseRequest.MEMFILE_MAX which is only 100kb by default. If your temp directory is not size-constrained and you are copying the uploaded file chunk by chunk and not in one step, it should not fail. Please open a new issue with a stack traces.

@knro
Copy link

knro commented Apr 14, 2022

Thanks. Under Linux, gettempdir is indeed /tmp. /tmp does not appear to be mounted to any tmpfs as per the df -h command.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants