Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debug _find_link_internal #256

Closed
wants to merge 11 commits into from

Conversation

tiboroche
Copy link

If you called the function with url_regex='/something' and not the
link argument, it would set the url_regex to None, and thus the result
would be impredictable.

Copy link
Collaborator

@moy moy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The travis build fails (test failure), so there seems to be an issue somewhere, this obviously needs fixing before merging.

Also, it would be nice to have a test demonstrating the failure to illustrate the issue that is fixed by the commit.

'url_regex because url_regex is already '
'present in keyword arguments')
else:
kwargs['url_regex'] = link
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather keep the if ... and ... as is and turn else: into else if link: ... for readability.

@hemberger
Copy link
Contributor

Hi, and thanks for the pull request!

So if I understand this correctly, the case you want to fix is this:

br.find_link(None, url_regex="/something")

However, according to the docstring of find_link():

If link doesn't have a href-attribute or is None, treat link as a url_regex

It looks like the intention is to not specify a url_regex as one of the kwargs of find_link(), but to instead overload the link argument for that purpose (as a convenience?).

I think your patch makes sense on its own, but I'm not 100% sure what the intended behavior of find_link() is. I see two possible ways to go forward:

  1. Change the docstring to be consistent with your new behavior
  2. Keep the current behavior, but add better argument validation to avoid unexpected results

@hemberger
Copy link
Contributor

By the way, the test failure is a separate issue. I just updated all my pip packages locally and I'm now seeing the same issue without this patch. (Looks like it's a list ordering issue...)

@tiboroche
Copy link
Author

Hi, and thanks for the pull request!

So if I understand this correctly, the case you want to fix is this:

br.find_link(None, url_regex="/something")

Not exactly. In my code I had a line like this :

 browser.download_link(url_regex='/admin/https/csr', file='/vagrant/local.csr')

which would call

 self._find_link_internal(None, url_regex='/admin/https/csr')

in this case, the code would set up the url_regex keyword to None (the value of link), thus downloading a "random" link from the page instead of the one I wanted.

I think that the two following calls should have the same result, which seems coherent to the documentation in my opinion.

 self.follow_link(url_regex='/admin/https/csr')
 self._find_link_internal(None, url_regex='/admin/https/csr')

 self.follow_link('/admin/https/csr')
 self._find_link_internal('/admin/https/csr')

And this is what is fixed with my patch.
I can make the change with the elif and add a test if you agree with my fix.

hemberger added a commit to hemberger/MechanicalSoup that referenced this pull request Jan 17, 2019
The `test_link_arg_regex` test was accidentally passing. By adding
another link to the test html, we can see that it was selecting the
first link instead of the link that matched the supplied regex.
This new version of the test fails, but it will be fixed by MechanicalSoup#256.

Also added a case for no arguments and reduced code duplication
by changing it to a parameterized test.
@hemberger
Copy link
Contributor

I agree with the change (but to clarify, it would grab the first link, not a random one).

This should have been caught by test_link_arg_regex, but the test happened to be passing accidentally (see #261).

Please go ahead and make any requested changes, and then I think this is ready to be merged. It would be worth checking that the code passes the improved tests in #261, but I wouldn't worry about that yourself.

hemberger added a commit to hemberger/MechanicalSoup that referenced this pull request Jan 17, 2019
The `test_link_arg_regex` test was accidentally passing. By adding
another link to the test html, we can see that it was selecting the
first link instead of the link that matched the supplied regex.
This new version of the test fails, but it will be fixed by MechanicalSoup#256.

Also added a case for no arguments and reduced code duplication
by changing it to a parameterized test.
@hemberger
Copy link
Contributor

@tiboroche #261 has been resolved, so please rebase your branch against master when updating and remove the pytest.mark.xfail mark from test_follow_link_arg in tests/test_stateful_browser.py (since your patch should now fix that test!).

Thanks!

@moy moy added the easy? Probably easy to implement, or WIP almost complete label Feb 14, 2019
@tiboroche
Copy link
Author

I made the requested changes.

@moy
Copy link
Collaborator

moy commented Feb 28, 2019

Thanks!

Can you rebase on top of master and squash all commits into one? Ideally, the commit message can contain a mention of the commit introducing the xfail and explain why it's fixed.

If you don't have time, I can do it while merging, no problem.

Copy link
Contributor

@hemberger hemberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for the PR!

hemberger and others added 11 commits March 1, 2019 16:02
CSS selectors in bs4 now return elements in page order, whereas
they did not previously.

This requires us to re-order some of our expected test output,
and to perform an order-independent comparison if tested with
a bs4 version before 4.7.0.

Tested and passing with bs4 4.6.0 and 4.7.1.

Closes MechanicalSoup#257.
The `message` parameter to `pytest.raises` was deprecated in pytest
version 4.1.

> PytestDeprecationWarning: The 'message' parameter is deprecated.
>  (did you mean to use `match='some regex'` to check the exception message?)

Yes! We did mean to use `match`, and now we do! Thanks pytest.
FAQ: Update on alternatives

mechanize is back on track, RoboBrowser seems clearly abandoned.
The tag types were being checked in two ways:

1. tag.get("type", "").lower() == X

2. tag.get("type") is not None and tag.get("type").lower() == X

Since these should be identical, change all to the first option,
which is simpler and faster.

This is a follow-up to MechanicalSoup#246.
The `test_link_arg_regex` test was accidentally passing. By adding
another link to the test html, we can see that it was selecting the
first link instead of the link that matched the supplied regex.
This new version of the test fails, but it will be fixed by MechanicalSoup#256.

Also added a case for no arguments and reduced code duplication
by changing it to a parameterized test.
Closes MechanicalSoup#248.

MechanicalSoup was incorrectly ignoring `disabled` attributes
in form elements.
Also, point to the bug in the new GitHub repository, the one we
pointed to is obsolete.
If you called the function with url_regex='/something' and not the
link argument, it would set the url_regex to None, and thus the result
would be impredictable.

This is tested by the test added by MechanicalSoup#261
@tiboroche
Copy link
Author

I messed up the commits squashing, so I opened a new PR #272 with a single "clean" commit. Therefore I close this one.

@tiboroche tiboroche closed this Mar 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
easy? Probably easy to implement, or WIP almost complete
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants