Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using email module to parse multipart insteal of the deprecated cgi module #1437

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

aisk
Copy link

@aisk aisk commented Nov 28, 2023

fix: #1403

Since cgi will be removed, the Python change log recommends to using email.message or the PYPI package multipart, and bottle does not allow to use external dependencies, and vendoring multipart is not a good practice, so I think the email package is a better way.

I don't check too much about the compatibilities, if some maintainer think this way is okay, I'll invest more time to do it. But the test_multipart passed on my local machine (some other tests failed because I'm using Windows and they failed in the master branch).

@defnull
Copy link
Member

defnull commented Nov 28, 2023

Unfortunately, all data parsed by email.parser.FeedParser will end up in memory buffered Messages. Uploading large files (or many small ones) would likely trigger MemoryError on a busy server. The parser needs a way to offload large file uploads into temporary files to be useful in a web context. Not sure if the email.parser package supports that use case.

@aisk
Copy link
Author

aisk commented Nov 28, 2023

The email.parser.FeedParser has an optional argument _factory, which can specify which Message class will be used in the parsed result. So we can subclass the email.message.Message, and override the set_payload and any other methods to offload the large file to the disk.

I didn't take too much to see if this will work, if you didn't check this too, I want to investigate on it.

@defnull
Copy link
Member

defnull commented Nov 28, 2023

Does not really help for large uploads, as those are still collected as a list of strings in memory before set_payload is even called. That behavior is hard-coded in the parser. The parser is also string-based, binary data is passed in as data.decode('ascii', 'surrogateescape') and copied multiple times. It was designed for emails (where you need an error tolerant and lax parser) and not for the internet (where you need a fast and strict parser that bails immediately if it sees something fishy). I would love to use that parser, that would be my first choice if that was an option. But I do not think it is suitable for this use case.

@aisk
Copy link
Author

aisk commented Nov 29, 2023

Thanks for the kindly reply, I've got the point!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"cgi" standard library module used by bottle is deprecated in Python 3.11, to be removed in 3.13
2 participants