Submit an empty file when leaving a file input blank #270

Merged
merged 2 commits into from Apr 3, 2019

Conversation

senabIsShort
Contributor

This is in regard to issue #250

For the tests, I followed @moy 's train of thought :

  • they are basically a copy+paste without the creation of a temp file
  • assert value["doc"] == "" checks that the response contains an empty file

Thought a different test definition was necessary, was I right to assume so ?

In browser.py, I changed the continue around line 179 to something similar to what has been done in test__request_file here

There are 2 Add no file input submit test commits : the second one is simply a clean up of some commented code. Will avoid it next time !

I was unable to run test_browser.py due to some weird Import module error on modules that are installed, so I'm kind of Pull Requesting blindly. Does it matter that I say I'm confident in the changes though ?

Collaborator

@moy moy left a comment


Please include "fixes #250" in the commit message so that merging the PR closes the bug.

Continuous integration is broken, I didn't investigate yet.

# create a temporary file for upload
file_path = tempfile.mkstemp()[1]
with open(file_path, "w") as f:
    f.write("")
Collaborator

Creating a temporary file is a very heavyweight way to submit an empty value. In the end, we don't send a "file" but its content. It should be possible to submit an empty file without actually creating it locally.

Contributor

Indeed, I've tested this, and so long as we are sending the empty string (e.g. files[name] = ""), then it will be part of the submitted request. None doesn't work as a value, but I think we don't have to worry about that since we use value = tag.get("value", ""), which should always be a string.

form = BeautifulSoup(form_html, "lxml").form

browser = mechanicalsoup.Browser()
response = browser._request(form)
Collaborator

This is unit testing an internal method. I think it would be better to have an end to end test using httpbin.

Contributor Author

I probably am misunderstanding this, but doesn't this already send a request to httpbin using a form manufactured and then checking the content of the response ?

Contributor

It does. Personally, I think it's fine to leave as a unit test instead of an end-to-end test, especially since it should be converted into a parameterization of the existing test__request_file test (@moy can overrule me here if he disagrees).

Collaborator

Moving to a parameterized test__request_file is a good argument. But as a standalone test, since we're already paying the performance price for a network round-trip (possibly local), the cost/benefit ratio of testing the user-facing logic together with it would be good.

Contributor

@hemberger hemberger left a comment


Thanks for the pull request! I think this is on the right track, but there are a few changes we probably want to make. See the details below for more information.

Also, being able to run the tests locally makes development much easier. Let's help you get this set up! Did you run these commands (from the contributing page)?

python3 -m venv .virtual-py3 && source .virtual-py3/bin/activate
pip install -r requirements.txt -r tests/requirements.txt

@@ -181,7 +181,11 @@ def _request(self, form, url=None, **kwargs):
                    # (or submit button) enctype attribute is set to
                    # "multipart/form-data". We don't care, simplify.
                    if not value:
                        continue
                    # create a temporary file for upload
Contributor

I'm not sure we need to be making temporary files here. So long as we are passing the name to requests, then we will get the multipart form data we want in the response, even if the value is empty.

I'd recommend something like this instead:

                    if value != "": 
                        value = open(value, "rb")
                    # If value is the empty string, we still pass it for consistency
                    # with browsers (see #250).
                    files[name] = value

@@ -112,6 +112,31 @@ def test__request_file(httpbin):
    assert "multipart/form-data" in response.request.headers["Content-Type"]


def test__request_file_none(httpbin):
Contributor

Yes, adding a test is great! However, I think this can be accomplished with less code duplication by using the pytest.mark.parametrize decorator. For example, see tests/test_stateful_browser.py line 430 (test_follow_link_arg).

One possible way to use parameters would be to set an expected_content (":-)" or "") and set_value (a bool to determine whether or not to write the expected_content to a temporary file and set the form value for "pic").
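
The parametrization idea can be sketched as a table of cases run through a single function. This is only an illustration: `CASES`, `make_pic_value`, and `check_case` are hypothetical names, not the PR's code, and the plain loop stands in for `@pytest.mark.parametrize("expected_content, set_value", CASES)` so that the sketch runs without pytest.

```python
import os
import tempfile

# Hypothetical case table: (expected_content, set_value).
CASES = [
    (":-)", True),   # a file is selected and its content uploaded
    ("", False),     # the file input is left blank
]


def make_pic_value(expected_content, set_value):
    """Return the form value for "pic": a temp file path, or "" if unset."""
    if not set_value:
        return ""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w") as f:
        f.write(expected_content)
    return path


def check_case(expected_content, set_value):
    """Build the value for one case and read back what would be uploaded."""
    value = make_pic_value(expected_content, set_value)
    try:
        if value == "":
            uploaded = ""
        else:
            with open(value) as f:
                uploaded = f.read()
    finally:
        if value:
            os.remove(value)  # clean up the temporary file
    return uploaded == expected_content


results = [check_case(content, flag) for content, flag in CASES]
```

With pytest, each tuple in `CASES` would become one test invocation, so the blank-input case and the real-file case would share the same test body.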

@senabIsShort
Contributor Author

senabIsShort commented Feb 14, 2019

Thanks for the pull request! I think this is on the right track, but there are a few changes we probably want to make. See the details below for more information.

Agreed will work on them more this weekend (in 36h from now. oops, life got in the way, lesson learned), prioritizing your second point below !

Also, being able to run the tests locally makes development much easier. Let's help you get this set up! Did you run these commands (from the contributing page)?

python3 -m venv .virtual-py3 && source .virtual-py3/bin/activate
pip install -r requirements.txt -r tests/requirements.txt

I ran those 2 and I now understand why it's preferable to use venv ! Didn't run into any problems during install (apart from upgrading pip because it was unable to find bdist_wheel).
Running pytest in venv though, that pointed to 3 failures (didn't quite understand them, couldn't look thoroughly)

@moy
Collaborator

moy commented Feb 14, 2019

I ran those 2 and I now understand why it's preferable to use venv ! Didn't run into any problems during install (apart from upgrading pip because it was unable to find bdist_wheel).

IIRC, python setup.py test does the venv things for you alternatively.

Running pytest in venv though, that pointed to 3 failures (didn't quite understand them, couldn't look thoroughly)

You also get them on Travis, which runs when you submit a PR. You should see some red warnings below this message, and the details are here:

https://travis-ci.org/MechanicalSoup/MechanicalSoup/builds/492922831?utm_source=github_status&utm_medium=notification

@senabIsShort
Contributor Author

With this force push, I tried to address :

  • The messy commit history ; might get messier in the future, but it doesn't hurt to clean it up
  • Submitting the empty value without creating a tempfile ; felt like the if isinstance(value, string_types) was still necessary, though I think I am changing the behavior when value is not empty and not an instance of string_types

Didn't work on the test yet.

@moy
Collaborator

moy commented Mar 2, 2019

Much better indeed, but there's a new test failure in Travis-CI. Do you also get it locally? Can you investigate?

@senabIsShort
Contributor Author

senabIsShort commented Mar 4, 2019

Much better indeed, but there's a new test failure in Travis-CI. Do you also get it locally? Can you investigate?

The build error is an assertion failure :
AssertionError: assert 'application/x-www-form-urlencoded' in 'multipart/form-data; boundary=18681345e388f6b641307a603af755c0'

So from what I understand, we have multipart/form-data type where we should be having an application/x-www-form-urlencoded

I just don't see which of my changes could have impacted this 🤷‍♂️

@hemberger
Contributor

I believe the issue is that Requests decides what the content-type should be dynamically based on the type of object you pass it. If it is a file-like object, it uses multipart/form-data; if it is a string (as is the case here where we use an empty string to "submit an empty file"), it apparently uses application/x-www-form-urlencoded.

The Requests.request interface allows you to explicitly pass a content-type as part of the file argument, but this looks like a relatively new feature. We should identify what version it was added in and decide if we want to bump the minimum requirement for this feature.

See the files argument of http://docs.python-requests.org/en/master/api/#requests.request for more info.

@moy
Collaborator

moy commented Mar 5, 2019

I believe the issue is that Requests decides what the content-type should be dynamically based on the type of object you pass it. If it is a file-like object, it uses multipart/form-data; if it is a string (as is the case here where we use an empty string to "submit an empty file"), it apparently uses application/x-www-form-urlencoded.

I think it actually uses multipart/form-data whenever there is a file argument. Before the change, we were not submitting the file because it was not filled-in, but with the change, we pass "" as file.

Let's look at what happens when submitting a file with curl:

$ curl -X POST -F 'image=@/tmp/filename.txt' -F name1=value1 -F name2=value2 http://httpbin.org/post --trace-ascii -
== Info:   Trying 52.71.234.219...
== Info: TCP_NODELAY set
== Info: Connected to httpbin.org (52.71.234.219) port 80 (#0)
=> Send header, 187 bytes (0xbb)
0000: POST /post HTTP/1.1
0015: Host: httpbin.org
0028: User-Agent: curl/7.61.0
0041: Accept: */*
004e: Content-Length: 399
0063: Content-Type: multipart/form-data; boundary=--------------------
00a3: ----528785e66ba5b1dc
00b9: 
=> Send data, 399 bytes (0x18f)
0000: --------------------------528785e66ba5b1dc
002c: Content-Disposition: form-data; name="image"; filename="filename
006c: .txt"
0073: Content-Type: text/plain
008d: 
008f: content.
0099: --------------------------528785e66ba5b1dc
00c5: Content-Disposition: form-data; name="name1"
00f3: 
00f5: value1
00fd: --------------------------528785e66ba5b1dc
0129: Content-Disposition: form-data; name="name2"
0157: 
0159: value2
0161: --------------------------528785e66ba5b1dc--
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 40 bytes (0x28)
0000: Access-Control-Allow-Credentials: true
<= Recv header, 32 bytes (0x20)
0000: Access-Control-Allow-Origin: *
<= Recv header, 32 bytes (0x20)
0000: Content-Type: application/json
<= Recv header, 37 bytes (0x25)
0000: Date: Tue, 05 Mar 2019 06:50:18 GMT
<= Recv header, 15 bytes (0xf)
0000: Server: nginx
<= Recv header, 21 bytes (0x15)
0000: Content-Length: 466
<= Recv header, 24 bytes (0x18)
0000: Connection: keep-alive
<= Recv header, 2 bytes (0x2)
0000: 
<= Recv data, 466 bytes (0x1d2)
0000: {.  "args": {}, .  "data": "", .  "files": {.    "image": "conte
0040: nt\n".  }, .  "form": {.    "name1": "value1", .    "name2": "va
0080: lue2".  }, .  "headers": {.    "Accept": "*/*", .    "Content-Le
00c0: ngth": "399", .    "Content-Type": "multipart/form-data; boundar
0100: y=------------------------528785e66ba5b1dc", .    "Host": "httpb
0140: in.org", .    "User-Agent": "curl/7.61.0".  }, .  "json": null, 
0180: .  "origin": "91.68.56.209, 91.68.56.209", .  "url": "https://ht
01c0: tpbin.org/post".}.
{
  "args": {}, 
  "data": "", 
  "files": {
    "image": "content\n"
  }, 
  "form": {
    "name1": "value1", 
    "name2": "value2"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Content-Length": "399", 
    "Content-Type": "multipart/form-data; boundary=------------------------528785e66ba5b1dc", 
    "Host": "httpbin.org", 
    "User-Agent": "curl/7.61.0"
  }, 
  "json": null, 
  "origin": "91.68.56.209, 91.68.56.209", 
  "url": "https://httpbin.org/post"
}
== Info: Connection #0 to host httpbin.org left intact

Note that both the filename (more precisely, the basename) and the file's content are sent. Unfortunately, the response from httpbin.org contains only the file's content (it says image="content" where image is the field name in HTML, content is the content of the file, but filename.txt appears nowhere). That's bad news, because this means we can't fully test our behavior with httpbin. What would be nice would be to contribute to httpbin a feature to expose the filename in the json response.

Now, it's not clear to me what the browser's behavior is: submit an empty string as filename, content, or both? My guess is "both", but I didn't check.

To send both a file name and content, we can't just pass a string to requests; I guess we need to pass a file-like object (it doesn't need to be a file, just a dummy class that returns "" when queried for the filename and "" when queried for the content should do it).

In short, it's much less trivial than I anticipated...

@hemberger
Contributor

Ah, I think you're right. I followed the rabbit hole of Requests when file is specified, and it ultimately leads to urllib3.filepost.encode_multipart_formdata, which is pretty unambiguous about the content-type it sets.

But if this is true, why aren't we seeing "multipart/form-data" in this test? Everything you've said seems to indicate that we should.

@moy
Collaborator

moy commented Mar 5, 2019

But if this is true, why aren't we seeing "multipart/form-data" in this test? Everything you've said seems to indicate that we should.

We are seeing it, but the assertion is written in a slightly counter-intuitive way: the header is 'multipart/form-data; boundary=fbfdf6eb526d8951e6cbe849ff2ffd35' and the assertion complains that it does not contain application/x-www-form-urlencoded.
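
One way to make such a check easier to read is to compare the bare media type, without the boundary parameter. A minimal sketch using only the standard library; `media_type` is a hypothetical helper, not part of the test suite:

```python
from email.message import Message


def media_type(content_type_header):
    """Return the bare media type of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


header = "multipart/form-data; boundary=18681345e388f6b641307a603af755c0"
# Comparing for equality makes it obvious which side is expected.
is_multipart = media_type(header) == "multipart/form-data"
```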

@moy
Collaborator

moy commented Mar 5, 2019

I've set up a minimalist test page to see what the server gets on file upload: http://matthieu-moy.fr/tmp/2019/tmp.php

Apparently my guess was right: it submits an empty file name. According to Firefox's inspector, the portion related to pic in the request when I select no file is:

-----------------------------1565619466406037082297189337
Content-Disposition: form-data; name="pic"; filename=""
Content-Type: application/octet-stream


-----------------------------1565619466406037082297189337

name="pic" is the field's name, filename="" means an empty file basename, and there's no content below.
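
To read those fields back programmatically, the headers of one part can be parsed with the standard library's email package. This is a sketch for illustration only; `describe_part` is a hypothetical helper, and the sample text reproduces the part quoted above without its boundary lines.

```python
from email import message_from_string

# The part Firefox sent for the blank "pic" input, as quoted above
# (boundary lines omitted; the body after the blank line is empty).
PART = (
    'Content-Disposition: form-data; name="pic"; filename=""\r\n'
    'Content-Type: application/octet-stream\r\n'
    '\r\n'
)


def describe_part(part_text):
    """Return (field name, filename) from one multipart/form-data part."""
    msg = message_from_string(part_text)
    name = msg.get_param("name", header="content-disposition")
    # get_filename() returns None when no filename parameter is present,
    # and "" when it is present but empty.
    filename = msg.get_filename()
    return name, filename


name, filename = describe_part(PART)
```

The distinction between a missing filename (None) and an empty one ("") is the one that matters for the rest of this discussion.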

@senabIsShort
Contributor Author

senabIsShort commented Mar 5, 2019

So the behavior still stands to be corrected. How is the debate.

Should test__request be modified in that case ?
Especially the assertion that raised the error : since we're going to upload a file even if empty, requests will define the data type as multipart/form-data, as it should be, right?

Edit
as it should be in the case of a form containing a file input*

As for the test, I'm leaning towards a parametrized test__request_file, but correct me if I'm wrong : it would still need to be modified in order to account for filename testing.

@moy
Collaborator

moy commented Mar 5, 2019

So the behavior still stands to be corrected. How is the debate.

Should test__request be modified in that case ?

I think it should, yes. One option is to change the failing assertion. The other is to remove the file input field and consider that file inputs have to be tested somewhere else.

Especially the assertion that raised the error : since we're going to upload a file even if empty, requests will define the data type as multipart/form-data, as it should be, right?

Yes.

As for the test, I'm leaning towards a parametrized test__request_file, but correct me if I'm wrong : it would still need to be modified in order to account for filename testing.

Yes, and more importantly, I'm not 100% sure that passing "" as file parameter works, since we need to set both the file name and content to "", and httpbin doesn't allow us to test that (it only gives the content). So, the code is probably OK, but I'd like to be sure it is before we merge.

@senabIsShort
Contributor Author

senabIsShort commented Mar 5, 2019

On the behavior, the Requests 1.0.4 doc (hasn't changed ever since AFAIK) states that you can set filenames and a string of content as a dict element.
Could this point towards a fix ?


I think it should, yes. One option is to change the failing assertion. The other is to remove the file input field and consider that file inputs have to be tested somewhere else.

Do you want me to do a simple commit in this PR for it ?
I would tend to go with Option 2, since wherever there's a file input, it'll be sent as multipart/form-data anyway, and it would allow us to test application/x-www-form-urlencoded on simple forms somewhere.


Yes, and more importantly, I'm not 100% sure that passing "" as file parameter works, since we need to set both the file name and content to "", and httpbin doesn't allow us to test that (it only gives the content). So, the code is probably OK, but I'd like to be sure it is before we merge.

Should I move on to the parametrized test__request_file in the meantime (without taking into account the filename in the test) just to have a basis to work on ?
Won't do it today, but it'll be a bullet point on my task list.

@hemberger
Contributor

I can confirm that the request sends the filename.

I ran the current file upload test (that uploads a tempfile with content ":-)") and printed the response.request.body:

--aced827538a3c1b6a26afb6d84cf86b5
Content-Disposition: form-data; name="pic"; filename="tmpps8_kar3"

:-)
--aced827538a3c1b6a26afb6d84cf86b5--

If I modify the test as per this PR (filename is passed as the empty string), then response.request.body looks like:

--319fd95615d843cbb7d1c158e0bbafd5
Content-Disposition: form-data; name="pic"; filename="pic"


--319fd95615d843cbb7d1c158e0bbafd5--

The fact that it uses the name attribute for the filename is an explicit choice in requests.models if it cannot guess the filename from the fileobj (where, in this case, fileobj is the empty string, so guess_filename returns None!).

However, if we don't want it to make this choice, we can explicitly set the filename by passing files={name: (filename, fileobj)} instead of files={name: fileobj} (see the request API). Something like:

                    filename = value
                    if value != "" and isinstance(value, string_types):
                        value = open(value, "rb")
                    files[name] = (filename, value)

This works great, except for one probable bug: if you use files={name: (filename, fileobj)} and filename is the empty string (fileobj can be the empty string or a real file), it creates the content body as we want:

--319fd95615d843cbb7d1c158e0bbafd5
Content-Disposition: form-data; name="pic"; filename=""

:-)
--319fd95615d843cbb7d1c158e0bbafd5--

but it populates the form response instead of files, which definitely seems wrong.

>>> print(response.json())
{'args': {}, 'data': '', 'files': {}, 'form': {'pic': ':-)'}, 'headers': {...}}

(normally the content shown in form is in files). I'd like to investigate this further, as it may be a bug in requests.

@moy
Collaborator

moy commented Mar 5, 2019

This works great, except for one probable bug: if you use files={name: (filename, fileobj)} and filename is the empty string (fileobj can be the empty string or a real file), it creates the content body as we want:

In real life, the filename cannot be empty (well, at least on POSIX a filename can't be empty), so an empty filename should be a marker for an absence of file, and then the content should be empty too.

So, it's likely a bug (perhaps in httpbin rather than requests?), but it doesn't sound like a harmful one. IOW, it would be nice to get it fixed, but it shouldn't disturb us if it isn't.

@hemberger
Contributor

hemberger commented Mar 5, 2019

Well, we have a couple options for empty files. Which request would you like to see us make?

Returns data in files, but has fake filename:
1. files['pic'] = "" → Content-Disposition: form-data; name="pic"; filename="pic"

Has empty or no filename, but returns data in form:
2. files['pic'] = (None, "") → Content-Disposition: form-data; name="pic";
3. files['pic'] = ("", "") → Content-Disposition: form-data; name="pic"; filename=""

Has a multipart content-type, but the content body is empty:
4. files['pic'] = None → (empty)
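
These four variants can be inspected locally, without a server, by preparing the request and reading the header that Requests generates for the pic part. The following is a sketch, assuming a reasonably recent Requests release; the helper name `pic_disposition` and the placeholder URL are made up for illustration.

```python
import requests


def pic_disposition(file_value):
    """Prepare a POST locally (no network) and return the "pic" header line."""
    prepared = requests.Request(
        "POST",
        "http://example.invalid/post",  # placeholder URL; nothing is sent
        data={"action": "Submit"},
        files={"pic": file_value},
    ).prepare()
    for line in prepared.body.split(b"\r\n"):
        if line.startswith(b"Content-Disposition") and b'name="pic"' in line:
            return line.decode("utf-8")
    return None  # the field was left out of the body entirely


option_1 = pic_disposition("")
option_2 = pic_disposition((None, ""))
option_3 = pic_disposition(("", ""))
option_4 = pic_disposition(None)
```

Only the prepared request is examined here, so this says nothing about how a given server (httpbin, PHP, Flask) classifies the part once it arrives.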


My biggest concern is related to how servers will handle the data. If it's something about the request that's making httpbin put the data in form instead of files, then I worry that, for example, PHP might put the pic field in $_POST instead of $_FILES.

As a quick test, I ran the above 4 cases on a simple HTML form, posting to a PHP script that dumped $_POST and $_FILES.

<form method="POST" action="test_processing.php" enctype="multipart/form-data">
    <input type="file" name="pic">
    <input type="submit" name="action" value="Submit" />
</form>

For reference, here's what the browser outputs:

FILES:
test_processing.php:4:
array (size=1)
  'pic' => 
    array (size=5)
      'name' => string '' (length=0)
      'type' => string '' (length=0)
      'tmp_name' => string '' (length=0)
      'error' => int 4
      'size' => int 0
POST:
test_processing.php:6:
array (size=1)
  'action' => string 'Submit' (length=6)

Note that 'error' => int 4 is the code that maps to UPLOAD_ERR_NO_FILE: No file was uploaded, which is clearly an appropriate error. :)

Now with the 4 variations of MechanicalSoup:
1. files['pic'] = ""

FILES:
test_processing.php:4:
array (size=1)
  'pic' => 
    array (size=5)
      'name' => string 'pic' (length=3)
      'type' => string '' (length=0)
      'tmp_name' => string '/tmp/phpmFqtXB' (length=14)
      'error' => int 0
      'size' => int 0
POST:
test_processing.php:6:
array (size=1)
  'action' => string 'Submit' (length=6)

2. files['pic'] = (None, "")

FILES:
test_processing.php:4:
array (size=0)
  empty
POST:
test_processing.php:6:
array (size=2)
  'action' => string 'Submit' (length=6)
  'pic' => string '' (length=0)

3. files['pic'] = ("", "")

FILES:
test_processing.php:4:
array (size=1)
  'pic' => 
    array (size=5)
      'name' => string '' (length=0)
      'type' => string '' (length=0)
      'tmp_name' => string '' (length=0)
      'error' => int 4
      'size' => int 0
POST:
test_processing.php:6:
array (size=1)
  'action' => string 'Submit' (length=6)

4. files['pic'] = None

FILES:
test_processing.php:4:
array (size=0)
  empty
POST:
test_processing.php:6:
array (size=1)
  'action' => string 'Submit' (length=6)

Now that I've worked through this a bit, I would say that 3. files['pic'] = ("", "") is the correct choice (even though httpbin places the data in form instead of files), primarily because it matches the Content-Disposition that @moy saw in Firefox's inspector and is the only option that returns the UPLOAD_ERR_NO_FILE error code in PHP.

@moy
Collaborator

moy commented Mar 6, 2019

Now that I've worked through this a bit, I would say that 3. files['pic'] = ("", "") is the correct choice

I get to the same conclusion.

even though httpbin places the data in form instead of files

Indeed, from your previous message I thought that the content was sent to form when it was non-empty, but the bug in httpbin happens also with empty content.

I've made a simple form that submits to httpbin:

https://matthieu-moy.fr/tmp/2019/tmp-httpbin.php

I do get the bug when using my browser (Firefox) too.

Unfortunately, there seems to be no obvious bug in httpbin (I was hoping for an if filename: send_to_files; else: send_to_form instead of an if filename is None somewhere):

https://github.com/postmanlabs/httpbin/blob/master/httpbin/core.py#L414

Calling:

https://github.com/postmanlabs/httpbin/blob/master/httpbin/helpers.py#L171

So my guess is that the bug is in the underlying framework, http://flask.pocoo.org/.

@senabIsShort: interested in investigating this a bit, and report and/or fix the bug upstream?

@senabIsShort
Contributor Author

I'm really not sure I'd be able to see through such a huge code base and attempt to fix it. Especially since I still am not really confident in my ability with Python in general.

I could report the bug but I'm not sure where and how to word it properly, which points I should put emphasis on.

@moy
Collaborator

moy commented Mar 9, 2019

I can reproduce the issue with flask alone:

from flask import Flask, request
from markupsafe import escape  # flask.escape is no longer available in recent Flask releases

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def upload_file():
    if request.method == 'POST':
        return '<pre>{}</pre>'.format(escape(str(request.form) + '\n' + str(request.files)))
    return '''
    <!doctype html>
    <title>Upload new Files</title>
    <h1>Upload new Files</h1>
    <form method=post enctype=multipart/form-data>
      <input name="normal_field" value="field_value" />
      <input type=file name=file>
      <input type=file name=secondfile>
      <input type=submit value=Upload>
    </form>
    '''

Submitting with only one file set in my browser gives:

ImmutableMultiDict([('normal_field', 'field_value'), ('secondfile', '')])
ImmutableMultiDict([('file', <FileStorage: 'tmp-rg1.xpi' ('application/x-xpinstall')>)])

The first is the form's field, the second the actual files uploaded. I see no way to distinguish between an empty text field and a non-uploaded file from the request object, so it's very likely a bug (but I'm not sure what's the intended behavior). I didn't find the source of the issue in the code.

@senabIsShort : fixing the bug is probably too involved, but a clean bug report would be nice. Mention me (@moy) in the bug if you report it.

@senabIsShort
Contributor Author

I'll look into it !

Regarding this PR, do you want me to work towards applying the changes discussed in here ?

  • removing the file field in test__request,
  • parametrizing test__request_file,
  • modifying browser behavior to match files['pic'] = ("", "")

@moy
Collaborator

moy commented Mar 9, 2019

I'll look into it !

Regarding this PR, do you want me to work towards applying the changes discussed in here ?

* removing the file field in test__request,

* parametrizing test__request_file,

* modifying browser behavior to match `files['pic'] = ("", "")`

Yes, this is the way to go. Obviously, you'll hit the "empty file sent to the form field" bug when you test this properly, but you can deal with it with something like

# One would expect to find 'fieldname' in files, but as of writing,
# httpbin puts it in form when the filename is empty:
assert files.get('fieldname') == "" or form.get('fieldname') == ""
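
The same tolerance can be wrapped in a small helper that looks in files first and falls back to form. This is a sketch under the assumption that the response JSON has the shape httpbin returned elsewhere in this thread; `uploaded_content` is a hypothetical name.

```python
def uploaded_content(response_json, field):
    """Return the content reported for `field`, looking in "files" first
    and falling back to "form" (where httpbin puts it when the filename
    is empty, per the discussion above)."""
    files = response_json.get("files", {})
    form = response_json.get("form", {})
    if field in files:
        return files[field]
    return form.get(field)


# Shapes modelled on the httpbin responses quoted in this thread.
with_file = {"files": {"pic": ":-)"}, "form": {}}
blank_input = {"files": {}, "form": {"pic": ""}}
```

A test could then assert `uploaded_content(response.json(), "pic") == expected_content` for both the blank and the non-blank case.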

@senabIsShort
Contributor Author

On this one, I applied all the changes discussed since the last push.
Is it okay with you guys ?

Collaborator

@moy moy left a comment


The main issue is the order of commits. Each commit must pass tests and as much as possible have 100% coverage. It's better to keep the tests and code change in the same commit: when reviewing history, the diff on the tests explains how the behavior is changed.
You may want to add an entry in ChangeLog.txt too, but if you don't, we'll do it before the next release.

@@ -56,7 +56,6 @@ def test__request(httpbin):
<p><input type=checkbox name="topping" value="onion" checked>Onion</p>
<p><input type=checkbox name="topping" value="mushroom">Mushroom</p>
</fieldset>
<input name="pic" type="FiLe">
Collaborator

Every commit must pass tests (bisectable history), so this change should come before the commit that actually changes the behavior.

About the commit message: no line longer than 80 characters (preferably <= 72 characters) please.

@@ -103,8 +107,15 @@ def test__request_file(httpbin):
    found = False
    for key, value in response.json().items():
        if key == "files":
            assert value["pic"] == ":-)"
            found = True
            if set_value is True:
Collaborator

Suggested change:
- if set_value is True:
+ if set_value:

        # One would expect to find "pic" in files, but as of writing,
        # httpbin puts it in form when the filename is empty:
        elif key == "form":
            if set_value is False:
Collaborator

Suggested change:
- if set_value is False:
+ if not set_value:

pic_path = tempfile.mkstemp()[1]
with open(pic_path, "w") as f:
    f.write(":-)")
if set_value is True:
Collaborator

Suggested change:
- if set_value is True:
+ if set_value:

This is to test the `application/x-www-form-urlencoded` data-type
when the form contains no file input field.

Otherwise, the data-type is set to `multipart/form-data`,
which will be tested in `test__request_file`.
@senabIsShort
Contributor Author

I changed up the parametrization to make it simpler and easier to read as per @moy 's legitimate request.

I also added the much needed test on filename="".
For this I used :

  • the Response object returned by request() contains the exact PreparedRequest that it responds to
  • PreparedRequest.body is a member containing the body of the request in binary format, printable, but not usable
  • using .decode("utf-8"), I decoded the binary content into a simple string object
  • a simple assert "filename=\"\"" in X does the check

I also added an entry in the ChangeLog, followed the previous entry in 1.0 (btw, not sure if that's where I shoulda put it, but I guessed so from the Currently under development).
Didn't think a separate commit was a good idea since it is 100% related to the changes made.
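
The check described in the list above can be sketched as follows. This is an illustration rather than the test from the PR: `sends_empty_filename` and the sample body are hypothetical, and the helper searches bytes directly, so no decoding step is required.

```python
def sends_empty_filename(body):
    """Report whether a multipart body carries filename="" for some part.

    `body` may be bytes (what PreparedRequest.body holds for multipart
    uploads) or str; bytes are searched directly, with no decoding step.
    """
    needle = 'filename=""'
    if isinstance(body, bytes):
        return needle.encode("ascii") in body
    return needle in body


# Sample body modelled on the request dumps quoted earlier in the thread.
sample = (
    b"--319fd95615d843cbb7d1c158e0bbafd5\r\n"
    b'Content-Disposition: form-data; name="pic"; filename=""\r\n'
    b"\r\n"
    b"\r\n"
    b"--319fd95615d843cbb7d1c158e0bbafd5--\r\n"
)
```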

Contributor

@hemberger hemberger left a comment


This is shaping up very nicely! Thanks so much for all your hard work -- especially since the issue turned out to be much more complicated than we originally expected. I have a couple more minor comments below, but this looks nearly ready to merge.

f.write(":-)")
if set_value:
# create a temporary file for testing file upload
pic_path = tempfile.mkstemp()[1]
Contributor

This will leave a temporary file around. Perhaps we should delete it at the end of this test (or use NamedTemporaryFile instead if it fits into the logic in a nice way, which it might not due to the if set_value statement).

Contributor

Pytest also has some temporary directory fanciness, as another option for you! https://docs.pytest.org/en/latest/tmpdir.html

Contributor Author

Yes ! I completely missed it !
I'll try to add a simple file deletion before ending the test

continue
if isinstance(value, string_types):
filename = value
if value != "" and isinstance(value, string_types):
Contributor

I know I wrote it out this way myself, but when freshly reviewing this statement, the overloaded use of value really confused me! Perhaps something like this instead:

filename = value
if filename != "" and isinstance(filename, string_types):
    value = open(filename, "rb")

Or to be much more explicit at the cost of a couple more lines of code:

filename = value
if filename != "" and isinstance(filename, string_types):
    content = open(filename, "rb")
else:
    content = ""
files[name] = (filename, content)

Or something similar, if you have a better way to write it -- just so that it's more clear that we're working with a "filename" :)
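
For reference, the more explicit variant can be exercised on its own as a small function. This is a sketch, not the code in browser.py: `file_entry` is a hypothetical name, and plain `str` is used in place of `string_types` on the assumption that only Python 3 matters here.

```python
import os
import tempfile


def file_entry(value):
    """Return the (filename, content) pair for one file input value.

    An empty value yields ("", ""), mirroring a blank file input; any
    other string is treated as a path and opened for binary reading.
    """
    filename = value
    if filename != "" and isinstance(filename, str):
        content = open(filename, "rb")  # caller is responsible for closing
    else:
        content = ""
    return filename, content


blank = file_entry("")

# Round-trip a real temporary file to show the non-empty branch.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b":-)")
name, handle = file_entry(path)
with handle:
    data = handle.read()
os.remove(path)
```

The returned pair has the shape that would be assigned to `files[name]`.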

Contributor Author

Understood and agreed !
I'd side with your second suggestion : more explicit is always nice to avoid unnecessary headaches in the future !

@senabIsShort
Contributor Author

Quick push for changes requested by @hemberger !
Will only be able to check back in the morning !

@moy
Collaborator

moy commented Apr 3, 2019

There were several minor issues with the code. mkstemp was already opening the file and we were re-opening it (fixed in 2258685, a bit longer than I had expected because of the bytes/str difference), the found = not found was a bit too optimistic, and you didn't need to .decode('utf-8') to check the presence of filename="".

I've pushed 3 commits on top of your branch, if you're OK with them just pull, squash them into yours (rebase -i), and force-push.
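
The mkstemp point can be illustrated with the general pattern below. It is a sketch of the idea, not a copy of commit 2258685: the descriptor returned by `tempfile.mkstemp()` is wrapped instead of reopening the path, the payload is written as bytes, and the file is removed afterwards.

```python
import os
import tempfile

# mkstemp() returns an already-open OS-level file descriptor plus the path.
fd, path = tempfile.mkstemp()
try:
    # Wrap the descriptor instead of calling open(path) a second time;
    # closing the wrapper also closes the descriptor.
    with os.fdopen(fd, "wb") as f:
        f.write(b":-)")  # binary mode, so the payload must be bytes
    with open(path, "rb") as f:
        written = f.read()
finally:
    os.remove(path)  # do not leave the temporary file behind
still_exists = os.path.exists(path)
```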

Fixes #250
Pass both filename and content to match browser behavior.

Add test for lack of file input :
Test that an empty filename is sent in such cases.
Parametrized test__request_file() in order to avoid code duplication
@senabIsShort
Contributor Author

Pushed ! Thanks for the help !

@moy moy merged commit 6293694 into MechanicalSoup:master Apr 3, 2019
@moy
Collaborator

moy commented Apr 3, 2019

Thanks everyone. It was harder than we thought, but it's merged, now.

@moy moy mentioned this pull request Apr 3, 2019