-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: SimpleMongoReader should allow optional fields in metadata #13575
Conversation
Not all metadata fields should be mandatorily present in all Mongo DB documents. If some fields are missing, take None as their value instead of crashing.
@@ -99,7 +99,7 @@ def lazy_load_data( | |||
yield Document(text=text) | |||
else: | |||
try: | |||
metadata = {name: item[name] for name in metadata_names} | |||
metadata = {name: item.get(name) for name in metadata_names} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make the default a string still? Like "none" or "na" ?
(Also, let's bump the version of the mongodb integration, in its toml file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make the default a string still? Like "none" or "an" ?
The default is None
(not string but Python's equivalent of null
).
I would prefer to keep it None, so the caller can deal with the None
values accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also, let's bump the version of the mongodb integration, in its toml file)
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some vector stores will complain and raise an error if metadata is None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some vector stores will complain and raise an error if metadata is None
I am just pushing the errors downstream.
For example, if you have an optional field in Mongo DB, you should still be able to read it for all the values where it exists.
And then decide on what to do with it when processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, but I'm suggesting that we should be setting a metadata field to None
on purpose, as more than a few vector dbs will complain about that
An empty string like item.get(name, "")
would be better even
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An empty string like item.get(name, "") would be better even
I would disagree here.
The field can be holding a boolean, date, or some other type of value.
Treating the default value as empty string for an optional boolean seems like a bad idea to me.
None
is much superior.
I know, but I'm suggesting that we should be setting a metadata field to None on purpose, as more than a few vector dbs will complain about that
We are not setting None
on purpose, it is None
in the database.
The caller will get the documents and will fix the None
fields before ingesting documents into a vector database.
Alternatively, how about removing the field entirely from metadata if the optional field is missing in a Mongo DB document? This will complicate my PR as well as the handling code though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't care enough about this 😆 Merging away
Description
Not all metadata fields should be mandatorily present in all Mongo DB documents. If some fields are missing, take None as their value instead of crashing.
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Suggested Checklist:
make format; make lint
to appease the lint gods