debug _find_link_internal #256

tiboroche · 2019-01-16T16:16:52Z

If you called the function with url_regex='/something' and without the
link argument, it would overwrite url_regex with None, and thus the result
would be unpredictable.

moy

The Travis build fails (test failure), so there seems to be an issue somewhere. This needs fixing before merging.

Also, it would be nice to have a test demonstrating the failure to illustrate the issue that is fixed by the commit.

moy · 2019-01-16T17:31:25Z

mechanicalsoup/stateful_browser.py

+                                 'url_regex because url_regex is already '
+                                 'present in keyword arguments')
+            else:
+                kwargs['url_regex'] = link


I'd rather keep the if ... and ... as is and turn else: into elif link: ... for readability.
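For reference, the control flow suggested here could look roughly like the sketch below. It is a standalone illustration, not the code in mechanicalsoup/stateful_browser.py: the helper name `resolve_url_regex` is hypothetical, and the `ValueError` type and the start of its message are assumptions (only the tail of the message is visible in the diff above).

```python
def resolve_url_regex(link, kwargs):
    """Return a copy of kwargs in which `link` is used as url_regex if given.

    Hypothetical helper for illustration only; not part of MechanicalSoup.
    """
    kwargs = dict(kwargs)  # work on a copy, leave the caller's dict alone
    if link and 'url_regex' in kwargs:
        # Both sources were given: refuse to guess which one was meant.
        # (Exception type and message start are assumptions.)
        raise ValueError('link parameter cannot be treated as '
                         'url_regex because url_regex is already '
                         'present in keyword arguments')
    elif link:
        # Only `link` was given: treat it as the regex.
        kwargs['url_regex'] = link
    # link is None (or empty): any explicit url_regex is left untouched.
    return kwargs
```

With this shape, `resolve_url_regex(None, {'url_regex': '/something'})` keeps the explicit regex instead of replacing it with None.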

hemberger · 2019-01-16T17:59:03Z

Hi, and thanks for the pull request!

So if I understand this correctly, the case you want to fix is this:

br.find_link(None, url_regex="/something")

However, according to the docstring of find_link():

If link doesn't have a href-attribute or is None, treat link as a url_regex

It looks like the intention is to not specify a url_regex as one of the kwargs of find_link(), but to instead overload the link argument for that purpose (as a convenience?).

I think your patch makes sense on its own, but I'm not 100% sure what the intended behavior of find_link() is. I see two possible ways to go forward:

Change the docstring to be consistent with your new behavior
Keep the current behavior, but add better argument validation to avoid unexpected results

hemberger · 2019-01-16T18:27:34Z

By the way, the test failure is a separate issue. I just updated all my pip packages locally and I'm now seeing the same issue without this patch. (Looks like it's a list ordering issue...)

tiboroche · 2019-01-17T09:54:26Z

> Hi, and thanks for the pull request!
>
> So if I understand this correctly, the case you want to fix is this:
> br.find_link(None, url_regex="/something")

Not exactly. In my code I had a line like this:

 browser.download_link(url_regex='/admin/https/csr', file='/vagrant/local.csr')

which would call

 self._find_link_internal(None, url_regex='/admin/https/csr')

In this case, the code would set the url_regex keyword to None (the value of link), thus downloading a "random" link from the page instead of the one I wanted.

I think that the calls in each of the following two pairs should have the same result, which seems consistent with the documentation in my opinion.

 self.follow_link(url_regex='/admin/https/csr')
 self._find_link_internal(None, url_regex='/admin/https/csr')

 self.follow_link('/admin/https/csr')
 self._find_link_internal('/admin/https/csr')

And this is what is fixed with my patch.
I can make the change with the elif and add a test if you agree with my fix.

The `test_link_arg_regex` test was accidentally passing. By adding another link to the test html, we can see that it was selecting the first link instead of the link that matched the supplied regex. This new version of the test fails, but it will be fixed by MechanicalSoup#256. Also added a case for no arguments and reduced code duplication by changing it to a parameterized test.

hemberger · 2019-01-17T21:16:59Z

I agree with the change (but to clarify, it would grab the first link, not a random one).

This should have been caught by test_link_arg_regex, but the test happened to be passing accidentally (see #261).

Please go ahead and make any requested changes, and then I think this is ready to be merged. It would be worth checking that the code passes the improved tests in #261, but you don't need to worry about that yourself.
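To make the "first link" behaviour and the accidentally passing test concrete, here is a toy model that does not use MechanicalSoup at all. The helpers `find_first` and `buggy_lookup` and the sample hrefs are made up for illustration; `buggy_lookup` only mimics the unconditional overwrite of url_regex described in this thread.

```python
import re


def find_first(hrefs, url_regex=None):
    """Return the first href matching url_regex (any href if it is None)."""
    for href in hrefs:
        if url_regex is None or re.search(url_regex, href):
            return href
    return None


def buggy_lookup(hrefs, link=None, **kwargs):
    # Mimics the old behaviour: `link` always overwrites url_regex,
    # even when link is None and the caller passed url_regex explicitly.
    kwargs['url_regex'] = link
    return find_first(hrefs, **kwargs)


one_link = ['/admin/https/csr']
two_links = ['/index', '/admin/https/csr']

# With a single link on the page, the wrong code still returns the
# expected href, which is how a test can pass by accident.
print(buggy_lookup(one_link, url_regex='/admin/https/csr'))

# With another link placed first, the regex is ignored and the first
# href is returned instead of the matching one.
print(buggy_lookup(two_links, url_regex='/admin/https/csr'))
```

Adding a second, non-matching link to the test page is therefore enough to expose the problem.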

hemberger · 2019-01-18T22:14:41Z

@tiboroche #261 has been resolved, so please rebase your branch against master when updating and remove the pytest.mark.xfail mark from test_follow_link_arg in tests/test_stateful_browser.py (since your patch should now fix that test!).

Thanks!

tiboroche · 2019-02-28T13:52:50Z

I made the requested changes.

moy · 2019-02-28T15:26:22Z

Thanks!

Can you rebase on top of master and squash all commits into one? Ideally, the commit message would mention the commit that introduced the xfail and explain why it is now fixed.

If you don't have time, I can do it while merging, no problem.

hemberger

Thanks again for the PR!

CSS selectors in bs4 now return elements in page order, whereas they did not previously. This requires us to re-order some of our expected test output, and to perform an order-independent comparison if tested with a bs4 version before 4.7.0. Tested and passing with bs4 4.6.0 and 4.7.1. Closes MechanicalSoup#257.
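One way to express such a version-dependent comparison is sketched below. The helper `assert_same_elements` and its `bs4_version` string parameter are hypothetical, and the actual tests may do this differently. The sketch assumes a purely numeric dotted version string such as "4.6.0" and elements that can be sorted.

```python
def assert_same_elements(actual, expected, bs4_version):
    """Compare two lists, ignoring order for bs4 versions before 4.7.0.

    Hypothetical helper for illustration only.
    """
    parts = [int(p) for p in bs4_version.split('.')]
    version = tuple((parts + [0, 0])[:3])  # pad "4.7" to (4, 7, 0)
    if version >= (4, 7, 0):
        # Elements come back in page order, so compare exactly.
        assert list(actual) == list(expected)
    else:
        # Older versions may return a different order: compare sorted copies.
        assert sorted(actual) == sorted(expected)
```

Passing the version explicitly keeps the helper easy to exercise for both code paths without installing two bs4 releases.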

The `message` parameter to `pytest.raises` was deprecated in pytest version 4.1.

> PytestDeprecationWarning: The 'message' parameter is deprecated.
> (did you mean to use `match='some regex'` to check the exception message?)

Yes! We did mean to use `match`, and now we do! Thanks pytest.


The tag types were being checked in two ways:

1. tag.get("type", "").lower() == X
2. tag.get("type") is not None and tag.get("type").lower() == X

Since these should be identical, change all to the first option, which is simpler and faster. This is a follow-up to MechanicalSoup#246.
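The equivalence of the two checks can be seen with a small sketch. Plain dicts stand in for the tag objects here (an assumption for illustration; the only thing relied on is a dict-style `get`), and the helper names are hypothetical.

```python
def is_type_simple(tag, expected):
    # Option 1: default to "" so that .lower() is always safe to call.
    return tag.get("type", "").lower() == expected


def is_type_verbose(tag, expected):
    # Option 2: explicit None check before calling .lower().
    return tag.get("type") is not None and tag.get("type").lower() == expected


# Mixed case, lower case, empty value, and missing attribute.
samples = ({"type": "Radio"}, {"type": "checkbox"}, {"type": ""}, {})
for tag in samples:
    for expected in ("radio", "checkbox"):
        assert is_type_simple(tag, expected) == is_type_verbose(tag, expected)
```

The first form performs a single lookup and has no separate None branch, which is why it reads as the simpler of the two.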


tiboroche · 2019-03-01T15:34:46Z

I messed up squashing the commits, so I opened a new PR, #272, with a single "clean" commit. I am therefore closing this one.

moy requested changes Jan 16, 2019

hemberger mentioned this pull request Jan 17, 2019

Improve test_link_arg_* tests #261

Merged

moy added the "easy?" label (Probably easy to implement, or WIP almost complete) Feb 14, 2019

moy approved these changes Feb 28, 2019

hemberger approved these changes Feb 28, 2019

hemberger and others added 11 commits March 1, 2019 16:02

browser: accept non-lowercase type="radio" and "checkbox"

354934a

Fixes MechanicalSoup#245.

form: case-insensitive search for radio and checkboxes

c0a3203

Accept type=submit, button, reset and file case-insensitively

2f3dc51

FAQ: Update on alternatives

144aee4

FAQ: Update on alternatives mechanize is back on track, RoboBrowser seems clearly abandoned.

Do not submit disabled form elements

f70d4e7

Closes MechanicalSoup#248. MechanicalSoup was incorrectly ignoring `disabled` attributes in form elements.

Mechanize {is -> was} incompatible with Python 3

6d6670a

Also, point to the bug in the new GitHub repository, the one we pointed to is obsolete.

debug _find_link_internal and remove xfail on related test

d748184

If you called the function with url_regex='/something' and not the link argument, it would set the url_regex to None, and thus the result would be unpredictable. This is tested by the test added by MechanicalSoup#261.

tiboroche force-pushed the debug-find-link branch from 0a3d593 to d748184 Compare March 1, 2019 15:29

tiboroche mentioned this pull request Mar 1, 2019

debug _find_link_internal and remove xfail on related test #272

Merged

tiboroche closed this Mar 1, 2019
