Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #3771: Add fallback css and xpath selectors #3772

Closed
wants to merge 3 commits into from

Conversation

ejulio
Copy link
Contributor

@ejulio ejulio commented May 10, 2019

Fixes #3771

So, scrapy already handled loader.add_css('my_field', ['selector1', 'selector2']).
Though, I added this behavior to the docs.

I added a new kwarg called stop_on_first_match to both add_css and add_xpath.
This way we can control if the field is a composite of various selectors or if it is a set of fallbacks.

I'm not sure about the name stop_on_first_match.
processor.Compose has a stop_on_none argument (https://docs.scrapy.org/en/latest/topics/loaders.html#scrapy.loader.processors.Compose).
Maybe we can use another that is similar, or even stop_on_none it is checking explicitly for None right now.

@@ -385,12 +385,26 @@ ItemLoader objects
:param xpath: the XPath to extract data from
:type xpath: str

:param stop_on_first_match: when ``xpath`` is a list of selectors, should it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether or not this parameter should be added, since I believe the same effect can be achieved already with TakeFirst.

Or are you adding it for performance reasons?

Copy link
Contributor Author

@ejulio ejulio May 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Truth be told, I forgot that we could get the same behavior with TakeFirst.
Not sure if performance is a problem at all.

My concern was related to semantics.

loader.add_css('field', ['sel1', 'sel2']) gives the idea of a composition, field is a composition of both 'sel1' and 'sel2'.

loader.add_css('field', ['sel1', 'sel2'], stop_on_first_match=True) would mean they are fallbacks because of different page layouts. If 'sel1' fails, then it try 'sel2'.

@ejulio ejulio closed this Jun 29, 2020
@ejulio ejulio deleted the feat-3771 branch June 29, 2020 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add fallback selectors to ItemLoader
2 participants