Extended get files by query task #384

Merged

jotelha
Contributor

@jotelha jotelha commented Dec 11, 2019

A suggestion:

Turns the previously contributed "GetFilesByQueryTask" into something useful:

  • Allows nested queries to be stored by using "->" instead of "." as the key separator.
  • Adds several options for handling cases where the number and names of the queried files are not known in advance (see the task's docstring).
  • Adds tests for these features, which also serve as documentation.

Other additions:

  • Uses with-statements in the other FilePad task tests to ensure that all file streams are closed, which avoids warnings.
  • Adds a tiny helper function "arrow_to_dot" in "utilities/dict_mods.py" that converts "->" to ".".
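
To illustrate the "->" to "." conversion mentioned above, here is a minimal sketch. The name `arrow_to_dot_sketch` is hypothetical, and the actual `arrow_to_dot` helper in `utilities/dict_mods.py` may be implemented differently. The sketch assumes all dict keys are strings.

```python
def arrow_to_dot_sketch(obj):
    """Recursively replace '->' with '.' in all dict keys (illustrative only)."""
    if isinstance(obj, dict):
        # Rewrite each key and descend into the value.
        return {key.replace("->", "."): arrow_to_dot_sketch(value)
                for key, value in obj.items()}
    if isinstance(obj, list):
        # Lists may contain nested dicts, so descend into each item.
        return [arrow_to_dot_sketch(item) for item in obj]
    # Scalars are returned unchanged.
    return obj


# Query as it would be stored (arrow-separated keys) ...
stored_query = {
    "metadata->project": "my_project",
    "metadata->type": "surfactant_file",
}
# ... and as it would be handed to the database (dot-separated keys).
mongo_query = arrow_to_dot_sketch(stored_query)
print(mongo_query)
```

The conversion would happen only at run time, so the stored form never contains a key with a dot in it.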

Best regards,

Johannes

@computron
Member

Thanks! Can you clarify the reason for switching from dot to arrow? You wrote "allows for nested queries to be stored" but I don't think I understood why a nested query can't be stored if there is a "." in it.

@computron added the "stale" label (Stale/abandoned PRs and issues) on Jul 11, 2020
@jotelha
Contributor Author

jotelha commented Jul 13, 2020

Queries are expected as nested dicts rather than plain strings, so aliasing the . (dot) separator with -> makes it possible to store queries like this:

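import pymongo
from fireworks.user_objects.firetasks.filepad_tasks import GetFilesByQueryTask

# project_id is assumed to be defined elsewhere
ft = GetFilesByQueryTask(
    query={
        'metadata->project':    project_id,
        'metadata->type':       'surfactant_file',
    },
    sort_key='metadata.datetime',
    sort_direction=pymongo.DESCENDING,
    limit=1,
    new_file_names=['default.pdb'])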

If I remember correctly, MongoDB does not allow dots in keys. It's the same issue as the one described in the dict_mods.py file:

However, note that "_set" does not support modification of nested dicts
using the mongo {"a.b":1} notation. This is because mongo does not allow
keys with "." to be inserted. Instead, nested dict modification is
supported using a special "->" keyword, e.g. {"a->b": 1}

@computron
Member

Ah yes, I remember - MongoDB doesn't allow storing dictionaries where the keys have a dot in them. So storing the parameter:

{"query": {"key.subkey": "value"}}

can't be done, which makes it difficult to serialize the Firetask. The arrows should indeed make it possible to store the query and thereby serialize the Firetask. Merging this now along with the other improvements, thanks!

@computron merged commit e949f65 into materialsproject:master on Jul 15, 2020
@computron removed the "stale" label (Stale/abandoned PRs and issues) on Jul 15, 2020
@jotelha
Contributor Author

jotelha commented Jul 18, 2020

Exactly. A related note: I believe it is similarly not possible to store any query involving $-prefixed operators, e.g.

{'metadata.datetime': {'$gt': '2020'} }

thus it might be a good idea to store queries as plain strings instead. Are there any mongo-language-specific serialization recommendations for query documents?
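
As a rough illustration of the "plain string" idea, the query could be serialized to JSON before storage and parsed again at run time. This is only a sketch under that assumption, not how GetFilesByQueryTask handles its query parameter; the name `stored_spec` is hypothetical.

```python
import json

# A query with both a dotted key and a $-prefixed operator.
query = {"metadata.datetime": {"$gt": "2020"}}

# Serialize: the stored value is a plain string, so no key in the stored
# document contains "." or starts with "$".
stored_spec = {"query": json.dumps(query, sort_keys=True)}

# Deserialize at run time, just before handing the query to the database.
restored_query = json.loads(stored_spec["query"])

print(stored_spec["query"])
print(restored_query == query)
```

Plain JSON covers only basic types. Queries containing values such as datetimes or ObjectIds would need a BSON-aware serializer, for example the `bson.json_util` module that ships with PyMongo.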

@computron
Member

I don't know of any mongo language specific serialization recommendations; it's possible that a simple string is best.

As an aside, it looks like as of MongoDB 3.6+, dots are allowed in key names. But dollar sign prefixes are still prohibited:

https://docs.mongodb.com/manual/reference/limits/#Restrictions-on-Field-Names
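
The two restrictions discussed in this thread can be checked before a task is serialized. The sketch below is a simplified reading of those rules, not a MongoDB or FireWorks API: the function name `find_restricted_keys` and the `allow_dots` switch are hypothetical, all keys are assumed to be strings, and actual server behaviour depends on the MongoDB version.

```python
def find_restricted_keys(doc, allow_dots=False, path=""):
    """Return paths of keys that start with '$' or, unless allowed, contain '.'."""
    offending = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            key_path = f"{path}/{key}" if path else key
            if key.startswith("$") or (not allow_dots and "." in key):
                offending.append(key_path)
            # Nested documents are subject to the same rules.
            offending.extend(find_restricted_keys(value, allow_dots, key_path))
    elif isinstance(doc, list):
        for index, item in enumerate(doc):
            offending.extend(
                find_restricted_keys(item, allow_dots, f"{path}[{index}]"))
    return offending


# Dotted key: flagged under the stricter rule, accepted with arrows.
print(find_restricted_keys({"query": {"key.subkey": "value"}}))
print(find_restricted_keys({"query": {"key->subkey": "value"}}))

# $-prefixed operator: flagged even when dots are allowed.
print(find_restricted_keys(
    {"query": {"metadata.datetime": {"$gt": "2020"}}}, allow_dots=True))
```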
