Avoid missing base item fields in item loaders #3047
Codecov Report
@@            Coverage Diff             @@
##           master    #3047      +/-   ##
==========================================
+ Coverage   84.51%   84.67%   +0.15%
==========================================
  Files         164      164
  Lines        9270     9389     +119
  Branches     1380     1404      +24
==========================================
+ Hits         7835     7950     +115
- Misses       1177     1181       +4
  Partials      258      258
Argh, item loaders!

As I understand the code, this change affects not only get_output_value, but output processors as well: if an output processor returns an empty list, after the change load_item will return a default value instead of this empty list. No idea how large the issue is.

An example use case, rather theoretical: a MapCompose output processor which drops some values from the result; when all results are dropped, after this change a default value is returned in .load_item() instead of an empty list.

That said, the way lists play with ItemLoaders is weird anyways. For example:

from scrapy.loader import ItemLoader

ld = ItemLoader({'colors': ['white', 'black']})
ld.replace_value('colors', ['red', 'yellow'])
ld.load_item()
# {'colors': ['red', 'yellow']}

ld = ItemLoader({'colors': ['white', 'black']})
ld.replace_value('colors', [])
ld.load_item()
# {'colors': ['white', 'black']}

ld = ItemLoader({'colors': ['white', 'black']})
ld.replace_value('colors', 'blue')
ld.load_item()
# {'colors': ['blue']}

I can't find in ItemLoader docs that

So I'm not against merging this PR, as it fixes a real-world issue @stummjr had, and there is undocumented item loader behavior anyways. But at the same time, this PR seems to add more undocumented behavior to ItemLoaders.
What if

>>> item = {'colors': ['white', 'black'], 'foo': 'bar'}
>>> ld = ItemLoader(item)
>>> ld.get_output_value('colors')
[]
>>> ld.load_item()
{'colors': ['white', 'black'], 'foo': 'bar'}
>>> ld.replace_value('colors', [])
>>> ld.get_output_value('colors')
[]
>>> ld.load_item()
{'colors': [], 'foo': 'bar'}
@@ -113,7 +113,7 @@ def load_item(self):
         item = self.item
         for field_name in tuple(self._values):
             value = self.get_output_value(field_name)
-            if value is not None:
+            if value is not None and value != []:
I can imagine cases where someone expects a failed load to still populate an empty list, and this change might break things for them. Perhaps instead the code should check whether adding an empty list would clobber an existing field? Because, per your issue in #3046 I think that's the more bug-like behaviour?
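The alternative suggested in this comment could look roughly like the sketch below. It works on plain dicts rather than on ItemLoader itself, and merge_loaded_values is a hypothetical helper name, not part of Scrapy: an empty list is skipped only when it would overwrite a field the item already has, so a failed load can still populate a new field with an empty list.

```python
def merge_loaded_values(item, loaded_values):
    """Hypothetical helper, not Scrapy code: copy loaded values into a copy
    of `item`, but never let an empty list replace an existing field."""
    result = dict(item)
    for field_name, value in loaded_values.items():
        if value is None:
            continue
        if value == [] and field_name in result:
            # An empty list would clobber a pre-populated field; keep the old value.
            continue
        result[field_name] = value
    return result


prepopulated = {"colors": ["white", "black"]}
loaded = {"colors": [], "sizes": []}
merged = merge_loaded_values(prepopulated, loaded)
print(merged)  # {'colors': ['white', 'black'], 'sizes': []}
```

One trade-off of this approach: an explicit replace with an empty list can no longer clear a pre-populated field, because it is indistinguishable from an empty list that appeared as a side effect.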
Closing, given that #3046 has been fixed.
This is an attempt to fix the behavior described in #3046.

Instead of only checking whether the value inside the loader is not None when deciding whether a field from the initial item should be overwritten, load_item() should also make sure that the value returned by get_output_value() is not an empty list.

That is because self._local_values, which stores the new values added via the add_* or replace_* methods, is a defaultdict(list). So when we call get_output_value() for a field that is only available in the initial item, an empty list is set for that field in self._local_values (because of this).

This way, we make sure we don't miss fields from the initial item if get_output_value() gets called for one of the pre-populated fields before load_item(), as described in #3046.
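The defaultdict side effect described here can be reproduced without Scrapy. The following is a minimal sketch, assuming a toy load_item function that only mimics the check in the diff (the skip_empty_lists flag is an illustrative parameter, not a Scrapy option): reading a missing key inserts an empty list, and the old check then lets that empty list overwrite the pre-populated field.

```python
from collections import defaultdict


def load_item(item, local_values, skip_empty_lists):
    """Toy version of the loop in load_item(); not Scrapy's actual code."""
    result = dict(item)
    for field_name in tuple(local_values):
        value = local_values[field_name]
        if value is None:
            continue
        if skip_empty_lists and value == []:
            continue  # the extra check proposed in this PR
        result[field_name] = value
    return result


item = {"colors": ["white", "black"]}
local_values = defaultdict(list)

# Merely reading a missing key inserts an empty list for it.
assert "colors" not in local_values
local_values["colors"]
assert "colors" in local_values

old = load_item(item, local_values, skip_empty_lists=False)
new = load_item(item, local_values, skip_empty_lists=True)
print(old)  # {'colors': []}
print(new)  # {'colors': ['white', 'black']}
```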