New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Submit an empty file when leaving a file input blank #270
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please include "fixes #250" in the commit message so that merging the PR closes the bug.
Continuous integration is broken, I didn't investigate yet.
mechanicalsoup/browser.py
Outdated
# create a temporary file for upload | ||
file_path = tempfile.mkstemp()[1] | ||
with open(file_path, "w") as f: | ||
f.write("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating a temporary file is a very heavyweight way to submit an empty value. In the end, we don't send a "file" but it's content. It should be possible to submit an empty file without actually creating it locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I've tested this, and so long as we are sending the empty string (e.g. files[name] = ""
), then it will be part of the submitted request. None
doesn't work as a value, but I think we don't have to worry about that since we use value = tag.get("value", "")
, which should always be a string.
tests/test_browser.py
Outdated
form = BeautifulSoup(form_html, "lxml").form | ||
|
||
browser = mechanicalsoup.Browser() | ||
response = browser._request(form) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unit testing an internal method. I think it would be better to have an end to end test using httpbin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably am misunderstanding this, but doesn't this already send a request to httpbin using a form manufactured and then checking the content of the response ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does. Personally, I think it's fine to leave as a unit test instead of an end-to-end test, especially since it should be converted into a parameterization of the existing test__request_file
test (@moy can overrule me here if he disagrees).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving to a parameterized test__request_file
is a good argument. But as a standalone test, since we're already paying the performance price for a network round-trip (possibly local), the cost/benefit ratio of testing the user-facing logic together with it would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pull request! I think this is on the right track, but there are a few changes we probably want to make. See the details below for more information.
Also, being able to run the tests locally makes development much easier. Let's help you get this set up! Did you run these commands (from the contributing page)?
python3 -m venv .virtual-py3 && source .virtual-py3/bin/activate
pip install -r requirements.txt -r tests/requirements.txt
mechanicalsoup/browser.py
Outdated
@@ -181,7 +181,11 @@ def _request(self, form, url=None, **kwargs): | |||
# (or submit button) enctype attribute is set to | |||
# "multipart/form-data". We don't care, simplify. | |||
if not value: | |||
continue | |||
# create a temporary file for upload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we need to be making temporary files here. So long as we are passing the name
to requests, then we will get the multipart form data we want in the response, even if the value is empty.
I'd recommend something like this instead:
if value != "":
value = open(value, "rb")
# If value is the empty string, we still pass it for consistency
# with browsers (see #250).
files[name] = value
tests/test_browser.py
Outdated
@@ -112,6 +112,31 @@ def test__request_file(httpbin): | |||
assert "multipart/form-data" in response.request.headers["Content-Type"] | |||
|
|||
|
|||
def test__request_file_none(httpbin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, adding a test is great! However, I think this can be accomplished with less code duplication by using the pytest.mark.parametrize
decorator. For example, see tests/test_stateful_browser.py
line 430 (test_follow_link_arg
).
One possible way to use parameters would be to set an expected_content
(":-)" or "") and set_value
(a bool to determine whether or not to write the expected_content
to a temporary file and set the form value for "pic").
Agreed will work on them more this weekend (
I ran those 2 and I now understand why it's preferable to use venv ! Didn't run into any problems during install (apart from upgrading pip because it was unable to find bdist_wheel). |
IIRC,
You get them also on Travis, ran when you submit a PR. You should see some red warnings below this message, and the details are here: |
With this force push, I tried to adress :
Didn't work on the test yet. |
Much better indeed, but there's a new test failure in Travis-CI. Do go you also get it locally? Can you investigate? |
The build error is an assertion failure : So from what I understand, we have I just don't see which of my changes could have impacted this 🤷♂️ |
I believe the issue is that Requests decides what the content-type should be dynamically based on the type of object you pass it. If it is a file-like object, it uses The See the |
I think it actually uses Let's look at what happens when submitting a file with
Note that both the filename (more precisely, the basename) and the file's content is sent. Unfortunately, the response from httpbin.org contains only the file's content (it says Now, it's not clear to me what the browser's behavior is: submit an empty string as filename, content, or both? My guess is "both", but I didn't check. To send both a file name and content, we can't just pass a string to requests, I guess we need to pass a file-like object (doesn't need to be a file, just a dummy class that returns "" when queried for the filename and "" when queried for the content should do it). In short, it's much less trivial than I anticipated... |
Ah, I think you're right. I followed the rabbit hole of Requests when But if this is true, why aren't we seeing "multipart/form-data" in this test? Everything you've said seems to indicate that we should. |
We are seeing it, but the assertion is written in a slightly counter-intuitive way: the header is |
I've setup a minimalist test page to see what the server gets on file upload: http://matthieu-moy.fr/tmp/2019/tmp.php Apparently my guess was right: it submits an empty file name. According to Firefox's inspector, the portion related to pic in the request when I select no file is:
|
So the behavior still stands to be corrected. How is the debate. Should
As for the test, I'm leaning towards a parametrized |
I think it should, yes. One option is to change the failing assertion. The other is to remove the
Yes.
Yes, and more importantly, I'm not 100% sure that passing |
On the behavior, the Requests 1.0.4 doc (hasn't changed ever since AFAIK) states that you can set filenames and a string of content as a dict element.
Do you want me to do a simple commit in this PR for it ?
Should I move on to the parametrized |
I can confirm that the request sends the filename. I ran the current file upload test (that uploads a tempfile with content ":-)") and printed the
If I modify the test as per this PR (filename is passed as the empty string), then
The fact that it uses the However, if we don't want it to make this choice, we can explicitly set the filename by passing filename = value
if value != "" and isinstance(value, string_types):
value = open(value, "rb")
files[name] = (filename, value) This works great, except for one probable bug: if you use
but it populates the
(normally the content shown in |
In real-life, the filename cannot be empty (well, at least on POSIX a filename can't be empty), so an empty filename should be a marker for an absence of file, and then the content should be empty too. So, it's likely a bug (perhaps in httpbin rather than requests?), but it doesn't sound like a harmful one. IOW, it would be nice to get it fixed, but it shouldn't disturb us if it isn't. |
Well, we have a couple options for empty files. Which request would you like to see us make? Returns data in Has empty or no filename, but returns data in Has a multipart content-type, but the content body is empty: My biggest concern is related to how servers will handle the data. If it's something about the request that's making httpbin put the data in As a quick test, I ran the above 4 cases on a simple HTML form, posting to a PHP script that dumped <form method="POST" action="test_processing.php" enctype="multipart/form-data">
<input type="file" name="pic">
<input type="submit" name="action" value="Submit" />
</form> For reference, here's what the browser outputs:
Note that Now with the 4 variations of MechanicalSoup:
2.
3.
4.
Now that I've worked through this a bit, I would say that 3. |
I get to the same conclusion.
Indeed, from your previous message I thought that the content was sent to I've made a simple form that submits to httpbin: https://matthieu-moy.fr/tmp/2019/tmp-httpbin.php I do get the bug when using my browser (Firefox) too. Unfortunately, there seem to be no obvious bug in httpbin (I was hoping for an https://github.com/postmanlabs/httpbin/blob/master/httpbin/core.py#L414 Calling: https://github.com/postmanlabs/httpbin/blob/master/httpbin/helpers.py#L171 So my guess is that the bug is in the underlying framework, http://flask.pocoo.org/. @senabIsShort: interested in investigating this a bit, and report and/or fix the bug upstream? |
I'm really not sure I'd be able to see through such a huge code base and attempt to fix it. Espescially since I still am not really confident in my ability with Python in general. I could report the bug but I'm not sure where and how to word it properly, which points I should put emphasis on. |
I can reproduce the issue with flask alone:
Submitting with only one file set in my browser gives:
The first is the form's field, the second the actual files uploaded. I see no way to distinguish between an empty text field and a non-uploaded file from the @senabIsShort : fixing the bug is probably too involved, but a clean bug report would be nice. Mention me (@moy) in the bug if you report it. |
I'll look into it ! Regarding this PR, do you want me to work towards applying the changes discussed in here ?
|
Yes, this is the way to go. Obviously, you'll hit the "empty file sent to the
|
On this one, I applied all the changes discussed since the last push. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main issue is the order of commits. Each commit must pass tests and as much as possible have 100% coverage. It's better to keep the tests and code change in the same commit: when reviewing history, the diff on the tests explain how the behavior is changed.
You may want to add an entry in ChangeLog.txt too, but if you don't, we'll do it before the next release.
@@ -56,7 +56,6 @@ def test__request(httpbin): | |||
<p><input type=checkbox name="topping" value="onion" checked>Onion</p> | |||
<p><input type=checkbox name="topping" value="mushroom">Mushroom</p> | |||
</fieldset> | |||
<input name="pic" type="FiLe"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every commit must pass tests (bisectable history), so this change should come before the commit that actually changes the behavior.
About the commit message: no line longer than 80 characters (preferably <= 72 characters) please.
tests/test_browser.py
Outdated
@@ -103,8 +107,15 @@ def test__request_file(httpbin): | |||
found = False | |||
for key, value in response.json().items(): | |||
if key == "files": | |||
assert value["pic"] == ":-)" | |||
found = True | |||
if set_value is True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if set_value is True: | |
if set_value: |
tests/test_browser.py
Outdated
# One would expect to find "pic" in files, but as of writing, | ||
# httpbin puts it in form when the filename is empty: | ||
elif key == "form": | ||
if set_value is False: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if set_value is False: | |
if not set_value: |
tests/test_browser.py
Outdated
pic_path = tempfile.mkstemp()[1] | ||
with open(pic_path, "w") as f: | ||
f.write(":-)") | ||
if set_value is True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if set_value is True: | |
if set_value: |
This is to test the `application/x-www-form-urlencoded` data-type when the form contains no file input field. Otherwise, the data-type is set to `multipart/form-data`, which will be tested in `test__request_file`.
I changed up the parametrization to make it simpler and easier to read as per @moy 's legitimate request. I also added the much needed test on
I also added an entry in the ChangeLog, followed the previous entry in 1.0 (btw, not sure if that's where I shoulda put it, but I guessed so from the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is shaping up very nicely! Thanks so much for all your hard work -- especially since the issue turned out to be much more complicated than we originally expected. I have a couple more minor comments below, but this looks nearly ready to merge.
tests/test_browser.py
Outdated
f.write(":-)") | ||
if set_value: | ||
# create a temporary file for testing file upload | ||
pic_path = tempfile.mkstemp()[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will leave a temporary file around. Perhaps we should delete it at the end of this test (or use NamedTemporaryFile instead if it fits into the logic in a nice way, which it might not due to the if set_value
statement).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pytest also has some temporary directory fanciness, as another option for you! https://docs.pytest.org/en/latest/tmpdir.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes ! I completely missed it !
I'll try to add a simple file deletion before ending the test
mechanicalsoup/browser.py
Outdated
continue | ||
if isinstance(value, string_types): | ||
filename = value | ||
if value != "" and isinstance(value, string_types): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I wrote it out this way myself, but when freshly reviewing this statement, the overloaded use of value
really confused me! Perhaps something like this instead:
filename = value
if filename != "" and isinstance(filename, string_types):
value = open(filename, "rb")
Or to be much more explicit at the cost of a couple more lines of code:
filename = value
if filename != "" and isinstance(filename, string_types):
content = open(filename, "rb")
else:
content = ""
files[name] = (filename, content)
Or something similar, if you have a better way to write it -- just so that it's more clear that we're working with a "filename" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood and agreed !
I'd side with your second suggestion : more explicit is always nice to avoid unnecessary headaches in the future !
Quick push for changes requested by @hemberger ! |
There were several minor issues with the code. mkstemp was already opening the file and we were re-opening it (fixed in 2258685, a bit longer than I had expected because of the bytes/str difference), the I've pushed 3 commits on top of your branch, if you're OK with them just pull, squash them into yours (rebase -i), and force-push. |
Fixes #250 Pass both filename and content to match browser behavior. Add test for lack of file input : Test that an empty filename is sent in such cases. Parametrized test__request_file() in order to avoid code duplication
Pushed ! Thanks for the help ! |
Thanks everyone. It was harder than we thought, but it's merged, now. |
This is in regards to issue #250
For the tests, I followed @moy 's train of thought :
assert value["doc"] == ""
checks that the response contains an empty fileThought a different test definition was necessary, was I right to assume so ?
In
browser.py
, I changed thecontinue
around line 179 to something similar to what has been done intest__request_file
hereThere are 2
Add no file input submit test
commits : the second one is simply a clean up of some commented code. Will avoid it next time !I was unable to run
test_browser.py
due to some weird Import module error on modules that are installed, so I'm kind of Pull Requesting blindly. Does it matter that I say I'm confident in the changes though ?