Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submitting a form with a blank file input differs to real browser behaviour #250

Closed
the-allanc opened this issue Nov 22, 2018 · 6 comments
Closed
Labels
easy? Probably easy to implement, or WIP almost complete

Comments

@the-allanc
Copy link

First of all, I'm really enjoying using MechanicalSoup after moving away from mechanize, so thanks for all of the work on it!

So I've encountered what I think is a bug - or certainly a deviation from the behaviour of regular browser behaviour.

If you have a form which contains a file input, and you submit it without selecting a file:

  • Doing this via a browser will submit an empty file for the input
  • Doing this via MechanicalSoup will not submit a value for the input at all

To recreate it, here's a very simple CherryPy app that will show the behaviour.

import cherrypy

class OptionalFile(object):

    @cherrypy.expose
    def index(self, name=None, somefile=None):
        if name is not None:
            cherrypy.response.headers['Content-Type'] = 'text/plain'
            result = 'Name: {!r}, File: {!r}'.format(name, somefile)
            cherrypy.log(result)
            return result
        return '''
        <html>
        <body>
        <form method="POST" enctype="multipart/form-data">
        Enter a name: <input type="text" name="name" /><br />
        Enter a file: <input type="file" name="somefile" /><br />
        <input type="submit" />
        </form>
        </body>
        </html>
        '''
        
if __name__ == '__main__':
    cherrypy.quickstart(OptionalFile(), '/', {'/': {'tools.encode.on': True}})

It prints the submitted repsonse to the log, as well as in the content of the response.

Here's how I'm testing what MechanicalSoup does - I set the "name" input element, but leave the "somefile" one blank.

Python 3.6.5 (default, Jul 26 2018, 12:10:08) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from mechanicalsoup import StatefulBrowser
>>> b = StatefulBrowser()
>>> b.open('http://127.0.0.1:8080')
<Response [200]>
>>> form = b.select_form()
>>> form['name'] = 'hello'
>>> print(b.submit_selected().text)
Name: 'hello', File: None

And here's output from the app's logs - I tested with Chromium, then MechanicalSoup, and then Firefox - as you can see, the "file" input is processed as a _cpreqbody.Part object when submitted via a browser (even though no file was selected), while it is None for MechanicalSoup's interaction.

127.0.0.1 - - [22/Nov/2018:14:58:01] "GET / HTTP/1.1" 200 306 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/67.0.3396.99 Chrome/67.0.3396.99 Safari/537.36"
[22/Nov/2018:14:58:04]  Name: 'hello', File: <cherrypy._cpreqbody.Part object at 0x7f9986d6b668>
127.0.0.1 - - [22/Nov/2018:14:58:04] "POST / HTTP/1.1" 200 72 "http://127.0.0.1:8080/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/67.0.3396.99 Chrome/67.0.3396.99 Safari/537.36"
127.0.0.1 - - [22/Nov/2018:14:59:26] "GET / HTTP/1.1" 200 306 "" "python-requests/2.20.1 (MechanicalSoup/0.11.0)"
[22/Nov/2018:15:00:59]  Name: 'hello', File: None
127.0.0.1 - - [22/Nov/2018:15:00:59] "POST / HTTP/1.1" 200 25 "http://127.0.0.1:8080/" "python-requests/2.20.1 (MechanicalSoup/0.11.0)"
127.0.0.1 - - [22/Nov/2018:15:06:14] "GET / HTTP/1.1" 200 306 "" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"
127.0.0.1 - - [22/Nov/2018:15:06:18] "GET /favicon.ico HTTP/1.1" 200 1406 "" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"
[22/Nov/2018:15:06:25]  Name: 'hello', File: <cherrypy._cpreqbody.Part object at 0x7f99844f6860>
127.0.0.1 - - [22/Nov/2018:15:06:25] "POST / HTTP/1.1" 200 72 "http://127.0.0.1:8080/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"
@moy
Copy link
Collaborator

moy commented Nov 22, 2018

The MechanicalSoup behavior makes more sense to me, but since the goal is to mimick browsers, we should definitely change that.

Should be straightforward to fix: https://github.com/MechanicalSoup/MechanicalSoup/blob/master/mechanicalsoup/browser.py#L179-L180

Do you want to try a patch for that?

@the-allanc
Copy link
Author

Do you want to try a patch for that?
Depends what you mean.

Do I want to try a patch that you've written? Sure.
Do I want to write my own patch? Err... I could do, but not entirely sure when I'd get round to it.

@moy
Copy link
Collaborator

moy commented Nov 22, 2018

I meant "try writing a patch". The code itself should be straightforward, I pointed you to the place where we just skip (continue) when the file is not given. The harder part is writing a test. No problem if you don't want to work on that, but if you're interested in writing a bit of code for MechanicalSoup, it's a good one to start.

@the-allanc
Copy link
Author

So I made an attempt to fix it - and writing a test was the more complicated part. In particular, it's harder to process multiform data in a sensible way.

As a suggestion (after a quick look here), I'm thinking that it would be best to make it easier to process multipart data in one of two ways:

  1. Use requests_toolbelt to inspect the submitted data; or
  2. Make it easier to write basic WSGI apps for us to use in the test suite (so we can see how that app handles the submitted output). I'm thinking of using something like Pyriform (full disclosure - I'm the author of that library) to make it easier to connect MechanicalSoup to the WSGI app directly. Then we could use any web framework to accept responses that we send to it, and we would have to explicitly mock less.

Either approach would require adding new dependencies - so I'm interested in what way you think is best to proceed.

@moy
Copy link
Collaborator

moy commented Nov 26, 2018

I don't think you need any new dependency. We already have tests to check the content of the file, using httpbin. See eg. https://github.com/MechanicalSoup/MechanicalSoup/blob/master/tests/test_browser.py#L84-L109

The new test should be very similar, but without the pic_path related code to submit the file, and then checking that the content is "" instead of ":-)".

(Writing this, I realize that we don't actually check that the file is found, which should be fixed in #251)

@moy moy added the easy? Probably easy to implement, or WIP almost complete label Feb 4, 2019
@moy
Copy link
Collaborator

moy commented Mar 26, 2019

Discussion on the flask issue here: pallets/flask#3130

@moy moy closed this as completed in 6293694 Apr 3, 2019
jsm28 added a commit to jsm28/matholymp-py that referenced this issue Jan 12, 2020
When used with a browser, the support for uploading photos as part of
bulk registration resulted in errors when a ZIP file was not uploaded.
This did not show up in testing because MechanicalSoup versions 0.11
and earlier do not handle missing file uploads the same way as a
browser does
<MechanicalSoup/MechanicalSoup#250>.
Similar issues apply when no CSV file is uploaded (should produce an
error, did not with a browser).

This change fixes the code handling uploads to handle empty filenames,
as used by browsers when no file is uploaded, the same as the form
field for the upload being missing, with consequent changes to handle
the case of empty zip_ref following a missing ZIP upload.  Tests are
added for the browser-like case of missing CSV uploads, though those
are only properly effective with newer MechanicalSoup.  This passes
testing with both older and newer MechanicalSoup.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
easy? Probably easy to implement, or WIP almost complete
Projects
None yet
Development

No branches or pull requests

2 participants