Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store.groupby enhancements #670

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

rkingsbury
Copy link
Collaborator

@rkingsbury rkingsbury commented May 27, 2022

This PR began as an effort to fix #622 although has evolved based on what I learned. I've discovered a couple of inconsistencies related to groupby behavior and kwargs that I think we should rectify (see below). @munrojm what do you think is the best way to proceed here?

behavior of properties

Apparently the default behavior in MongoDB is that when properties is None, query will return all fields in a document, but groupby will return nothing except the 'id' field. See this from the docs

The _id field is, by default, included in the output documents. To include any other fields from the input documents in the output documents, you must explicitly specify the inclusion in $project.

Personally I think within maggma it might make more sense to make the behavior consistent between query and groupby since not all maggma users will be intimately familiar with MongoDB conventions, but I'm not sure if we want to depart from what MongoDB does or not.

sort, skip, and limit kwargs

A related point - Store.groupby defines the sort, skip, and limit kwargs, but it is not clear whether those make sense in the context of a groupby operation. They don't appear to be arguments in pymongo / MongoDB aggregation and they don't appear to be functional anywhere within maggma. So I would advocate that we either make those kwargs functional (and consistent with their behavior in query), or we remove them from the interface.

Contributor Checklist

  • I have broken down my PR scope into the following TODO tasks
    • Clarify documentation of the properties kwarg in query and groupby
    • inherit groupby from MongoStore in MemoryStore
    • Move MemoryStore.groupby to MontyStore.groupby b/c montydb does not implement an aggregate method and therefore can't inherit from MongoStore.groupby
    • add tests of sort, skip, and limit kwargs in MongoStore.query
    • Flag places in code where groupby kwargs sort, skip, and limit are non-functional
  • I have run the tests locally and they passed.
  • I have added tests, or extended existing tests, to cover any new features or bugs fixed in this PR

@lgtm-com
Copy link

lgtm-com bot commented May 27, 2022

This pull request introduces 2 alerts when merging c30d05c into e44b5af - view on LGTM.com

new alerts:

  • 2 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented May 27, 2022

This pull request introduces 1 alert when merging fd14a6f into e44b5af - view on LGTM.com

new alerts:

  • 1 for Unused import

@rkingsbury
Copy link
Collaborator Author

Hmmm, it looks like the test failure occurs because montydb.collection does not implement its own aggregate method, which is called by MongoStore.groupby whereas previously this was using the MemoryStore.groupby code. Strangely, this test passes for me locally with montydb==2.3.12.

@rkingsbury
Copy link
Collaborator Author

rkingsbury commented May 27, 2022

Hmmm, it looks like the test failure occurs because montydb.collection does not implement its own aggregate method, which is called by MongoStore.groupby whereas previously this was using the MemoryStore.groupby code. Strangely, this test passes for me locally with montydb==2.3.12.

OK, no idea why the test passes locally, but montdyb definitely does not implement aggregate: https://github.com/davidlatwe/montydb/search?q=aggregate and davidlatwe/montydb#66

@codecov
Copy link

codecov bot commented May 27, 2022

Codecov Report

Attention: Patch coverage is 87.50000% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 81.30%. Comparing base (1070828) to head (501960d).

Files Patch % Lines
src/maggma/stores/advanced_stores.py 50.00% 1 Missing ⚠️
src/maggma/stores/mongolike.py 92.85% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #670      +/-   ##
==========================================
+ Coverage   81.27%   81.30%   +0.02%     
==========================================
  Files          46       46              
  Lines        3968     3968              
==========================================
+ Hits         3225     3226       +1     
+ Misses        743      742       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@munrojm
Copy link
Member

munrojm commented May 31, 2022

Hey @rkingsbury, thanks for this PR!

Personally I think within maggma it might make more sense to make the behavior consistent between query and groupby since not all maggma users will be intimately familiar with MongoDB conventions, but I'm not sure if we want to depart from what MongoDB does or not.

I think I agree with this. Having the behavior consistent between the two appears to be less confusing to me as well.

A related point - Store.groupby defines the sort, skip, and limit kwargs, but it is not clear whether those make sense in the context of a groupby operation. They don't appear to be arguments in pymongo / MongoDB aggregation and they don't appear to be functional anywhere within maggma. So I would advocate that we either make those kwargs functional (and consistent with their behavior in query), or we remove them from the interface.

I vote to make these functional here. I see your argument, but I feel as though including them will make using groupby possible in certain cases that might otherwise be difficult based on the total number of docs needing to be returned. Those query operations should be easy to add as the last set of stages in the aggregation pipeline ($sort -> $skip -> $limit). https://www.mongodb.com/docs/manual/reference/operator/aggregation-pipeline/.

@munrojm
Copy link
Member

munrojm commented May 31, 2022

Hmmm, it looks like the test failure occurs because montydb.collection does not implement its own aggregate method, which is called by MongoStore.groupby whereas previously this was using the MemoryStore.groupby code. Strangely, this test passes for me locally with montydb==2.3.12.

OK, no idea why the test passes locally, but montdyb definitely does not implement aggregate: https://github.com/davidlatwe/montydb/search?q=aggregate and davidlatwe/montydb#66

This is very strange, thanks for catching it.

@rkingsbury rkingsbury self-assigned this May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MongoLike groupby ignore sort, skip, limit kwargs; confusing properties behavior
2 participants