
Allow None values in Itemloaders/Items #40

Open
nikchha opened this issue Nov 11, 2020 · 5 comments
Labels
enhancement New feature or request

Comments

@nikchha

nikchha commented Nov 11, 2020

Summary

I would like to pass None values to ItemLoader and have them stored in the Item. Right now, None values are discarded, so the field is missing from the resulting item and code that expects it does not work properly.

Motivation

Some values are not available on every parsed page. When the selector returns None, the field is dropped from the item, and the database pipeline (Postgres) then fails with KeyError: 'fieldname'.

I worked around this by filling in a placeholder null string that is later converted back to None, but this feels like a hacky solution.
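Why the pipeline fails can be sketched in a few lines: once the loader drops a None value, the key is simply absent, so a direct lookup raises KeyError. This is a minimal sketch, assuming plain dicts stand in for Scrapy items; the field names and the `row_for_insert()` helper are hypothetical. It reads fields with `.get()`, which returns None for a missing key, as a pipeline-side alternative to the placeholder-string workaround. It does not change the loader itself.

```python
# Plain dicts stand in for Scrapy items (an assumption for illustration).
# FIELDS and row_for_insert() are hypothetical names, not part of Scrapy.
FIELDS = ("title", "price", "sku")


def row_for_insert(item: dict) -> tuple:
    """Build an INSERT parameter tuple, mapping missing fields to None.

    item["price"] raises KeyError when the loader dropped that field;
    item.get("price") returns None instead.
    """
    return tuple(item.get(name) for name in FIELDS)


# An item whose "price" selector matched nothing, so the key is absent.
scraped = {"title": "Widget", "sku": "W-1"}

try:
    scraped["price"]
except KeyError as exc:
    print("direct lookup fails:", exc)

print(row_for_insert(scraped))
```

Scrapy `Item` objects are mappings, so `.get()` should behave the same way on them, and a Postgres driver such as psycopg2 sends a Python None as SQL NULL.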

Gallaecio transferred this issue from scrapy/scrapy Nov 11, 2020
@nyov
Contributor

nyov commented Dec 14, 2020

Hey, as I recall, this has been discussed in the past; see scrapy/scrapy#556.
Ultimately, the decision was for the item loader not to keep None values.
But you can restore that behavior by using a custom loader like this:

https://github.com/nyov/scrapyext/blob/2dd5e0fc03f8e4b8793b808744d4dd6452e5d5b3/scrapyext/loader.py#L19-L27

Beware, this is old code I have yet to update. All you really need is to remove the following line from the current codebase:

if value is not None:
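The effect of that guard can be illustrated with a toy model. This is a hypothetical sketch, not the itemloaders source: `ToyLoader` collects values per field and drops None the way the quoted line does, and `KeepNoneLoader` is a hypothetical subclass that overrides `add_value` without the guard. The real loader has more machinery (input/output processors, item adapters), so treat this only as an illustration of the behavior under discussion.

```python
from typing import Any, Dict, List


class ToyLoader:
    """Hypothetical stand-in for an item loader (not the itemloaders code).

    Collects values per field and drops None, mirroring the
    `if value is not None:` guard quoted above.
    """

    def __init__(self) -> None:
        self._values: Dict[str, List[Any]] = {}

    def add_value(self, field: str, value: Any) -> None:
        if value is not None:  # the guard under discussion
            self._values.setdefault(field, []).append(value)

    def load_item(self) -> Dict[str, Any]:
        # Toy output step: keep the first collected value per field.
        return {field: values[0] for field, values in self._values.items()}


class KeepNoneLoader(ToyLoader):
    """Hypothetical subclass with the guard removed, so None is stored."""

    def add_value(self, field: str, value: Any) -> None:
        self._values.setdefault(field, []).append(value)


for cls in (ToyLoader, KeepNoneLoader):
    loader = cls()
    loader.add_value("title", "Widget")
    loader.add_value("price", None)
    print(cls.__name__, loader.load_item())
```

With the guard, "price" never reaches the item; without it, the item carries "price": None and a pipeline can rely on the key being present.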

Or we could try to overturn the old decision, now that some water has passed under the bridge (evil laugh).

@ejulio
Collaborator

ejulio commented Dec 16, 2020

Indeed, I'm in favor of having a flag or a specialized ItemLoader for this behavior.
I think it's weird to call loader.add_value('field', None) and not have the field in the output.
Even though None represents the absence of a value, it is still a value itself.

@nyov
Contributor

nyov commented Dec 18, 2020

I don't even know why that wasn't considered back then.
But I think that's exactly what we should add. Either a documented NoneValueItemLoader subclass or a flag such as ItemLoader(item, nonevalues=True) should work just fine, right?
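The flag variant proposed above could look roughly like this. `FlagLoader` and its `nonevalues` parameter are hypothetical and only mirror the suggested `ItemLoader(item, nonevalues=True)`; this is a proposal from the thread, not an existing itemloaders option, and the class is a toy rather than the library's loader.

```python
from typing import Any, Dict, List, Optional


class FlagLoader:
    """Hypothetical loader sketching the proposed nonevalues=True flag."""

    def __init__(self, item: Optional[Dict[str, Any]] = None,
                 nonevalues: bool = False) -> None:
        self.item = {} if item is None else item
        self.nonevalues = nonevalues  # False keeps the current behavior
        self._values: Dict[str, List[Any]] = {}

    def add_value(self, field: str, value: Any) -> None:
        if value is None and not self.nonevalues:
            return  # default: silently drop None, as discussed above
        self._values.setdefault(field, []).append(value)

    def load_item(self) -> Dict[str, Any]:
        # Toy output step: copy the first collected value into the item.
        for field, values in self._values.items():
            self.item[field] = values[0]
        return self.item


default = FlagLoader()
default.add_value("price", None)

opted_in = FlagLoader(nonevalues=True)
opted_in.add_value("price", None)

print(default.load_item())
print(opted_in.load_item())
```

Defaulting the flag to False leaves existing spiders unaffected, which is one argument for an opt-in flag over changing the default outright; a subclass achieves the same without touching the constructor signature.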

@arkadybag

Any updates on this?

Gallaecio added the enhancement (New feature or request) label Jan 26, 2021
@AmericanY

I'm struggling with the same issue! Any updates?
