ItemLoader: improve handling of initial item #4036
I know this is an old commit, but I'm in the process of understanding this whole issue. The chain of events started with #3819, and this is still a breaking change for many projects: people were certainly getting different data in many cases after all these changes. See the code below as an example. I have several spiders relying on the old behavior and am now looking for a workaround.
Would using a “TakeLast” output processor (e.g. `lambda v: v[-1]`) work for your scenario?
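For illustration, a minimal sketch of what such a “TakeLast” processor could look like. The class name `TakeLast` is hypothetical (it is only suggested in the comment above, not an existing processor), and the sketch assumes an output processor is simply a callable that receives the list of collected values. Unlike the bare `lambda v: v[-1]`, this version returns `None` for an empty list instead of raising `IndexError`.

```python
class TakeLast:
    """Hypothetical output processor: return the last non-empty collected value."""

    def __call__(self, values):
        # Skip None and empty strings, then pick the most recently added value.
        non_empty = [v for v in values if v is not None and v != ""]
        return non_empty[-1] if non_empty else None


take_last = TakeLast()
latest = take_last(["value from item", "value from spider"])
```

It could then be assigned as the output processor of the affected field, assuming the loader accepts any callable there.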
I have a TakeFirst processor defined for the specific field where I spotted this. Different spiders do different things: some of them are fine after this change, but some are not. Changing output processors would change several things. I wonder if we could add an option for the loader to keep the old behavior? It seems dirty, but it would make things easier. I could also refactor the spiders, but there are many of them, and even finding which ones are affected is not that easy.
What kind of change do you have in mind? Something like a parameter to `ItemLoader` that sets the old behavior? Or would it work for you if we simply extended the `ItemLoader` API with something like `set_value` to replace any value added so far?
I was thinking about a parameter or attribute of the loader that I could set when defining my subclass of ItemLoader. But I see a potential problem from the library's point of view: it could be hard to maintain in the future, since there would be two different logics, and I'm not sure how many people would even think about this attribute or parameter. If mine would be the only project using it, it doesn't make much sense; I can just subclass ItemLoader on my side and change some things. I could also stick to the old ItemLoader code, but there is no easy way to do that: itemloaders was moved to a separate repo, so I cannot simply use an old version of ItemLoader with Scrapy 2.4.1. The first release of itemloaders came after this change, and Scrapy 2.4.1 imports itemloaders.
There is already replace_value, which does what I would need here. I guess the problem is that in the past, in many cases, loader.add_value was actually replacing the value when people intended to add a value. Code now relies on this, and that creates a mess once the situation is cleared up.
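To make the difference concrete, here is a toy model of the two calls combined with a TakeFirst-style output processor. `MiniLoader` and `take_first` are hypothetical stand-ins written for this sketch, not the real itemloaders classes; the sketch only assumes that add_value appends to the values already collected for a field (including a value coming from the initial item) while replace_value discards them.

```python
from collections import defaultdict


def take_first(values):
    """Return the first non-empty value, mimicking a TakeFirst-style processor."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


class MiniLoader:
    """Toy stand-in for an item loader; not the itemloaders implementation."""

    def __init__(self, initial=None):
        self._values = defaultdict(list)
        # Values already present on the initial item count as collected values.
        for field, value in (initial or {}).items():
            self._values[field].append(value)

    def add_value(self, field, value):
        # Appends: earlier values for the field are kept.
        self._values[field].append(value)

    def replace_value(self, field, value):
        # Replaces: earlier values for the field are dropped.
        self._values[field] = [value]

    def load_item(self):
        return {field: take_first(vals) for field, vals in self._values.items()}


added = MiniLoader({"name": "from item"})
added.add_value("name", "from spider")

replaced = MiniLoader({"name": "from item"})
replaced.replace_value("name", "from spider")
```

In this model, `added.load_item()` keeps the value from the initial item because the take-first step sees it first, whereas `replaced.load_item()` yields the spider's value. A spider written under the assumption that add_value overwrites would therefore need replace_value to keep producing the same data.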