Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashie::Extensions::DeepGrep #528

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open

Hashie::Extensions::DeepGrep #528

wants to merge 12 commits into from

Conversation

vmarutha
Copy link

@vmarutha vmarutha commented Jul 23, 2020

Grep for hash.

 require 'hashie'

 options = {
   users: [
     { location: { address: '123 Street', name: 'vmarutha' } },
     { location: { address: '234 Street', name: 'dblock' } }
   ],
   other_users: ['BobbyMcWho', 'michaelherold']
 }
 options.extend(Hashie::Extensions::DeepGrep)

Currently, if a key/value matches the pattern, the result will include the parent hash. This is because we are preserving the match context, and I think it would be useful to retain.

 options.deep_grep(/Street/) # => [{:address=>"123 Street", :name=>"vmarutha"}, {:address=>"234 Street", :name=>"dblock"}]

If a value matches the pattern and is in a list, the associated key is not returned. Thinking on how to make this a more uniform results list irregardless of matching data type. This may be a bit tricky though especially with enumerables as the key or value in the hash. Let me know if you have suggestions for this case!

 options.deep_grep(/Bob/) # => [["BobbyMcWho", "michaelherold"]], note no key :other_users returned

Open to merge as is, and comeback to above edge case later on since the main functionality still works.

Edit: Thanks for all suggestions, finally opened this one up :p

@vmarutha vmarutha closed this Jul 23, 2020
@vmarutha vmarutha changed the title wip grep and test Accidental draft PR Jul 23, 2020
@vmarutha vmarutha changed the title Accidental draft PR Hashie::Extensions::Grep Jul 25, 2020
@vmarutha vmarutha reopened this Jul 25, 2020
@dblock
Copy link
Member

dblock commented Jul 25, 2020

I like it. Would vote to merge if you can finish it with docs & al.

Copy link
Collaborator

@BobbyMcWho BobbyMcWho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions

CHANGELOG.md Outdated Show resolved Hide resolved
lib/hashie/extensions/grep.rb Outdated Show resolved Hide resolved
lib/hashie/extensions/grep.rb Outdated Show resolved Hide resolved
Thanks Bobby :)

Co-authored-by: Bobby McDonald <BobbyMcWho@users.noreply.github.com>
@michaelherold
Copy link
Member

Because there is already an Enumerable#grep, I wonder if this would be better as DeepGrep? Since that's what it's adding.

@vmarutha vmarutha changed the title Hashie::Extensions::Grep Hashie::Extensions::DeepGrep Aug 9, 2020
@vmarutha vmarutha marked this pull request as ready for review August 9, 2020 17:06
@dblock
Copy link
Member

dblock commented Aug 10, 2020

LGTM, leaving final call(s) to @michaelherold, needs a README update, please

@michaelherold
Copy link
Member

Hi @vmarutha, thank you for your contribution!

I've started to dive into this and I have a few things that I've found.

  1. Could you please add an implementation of DeepGrep.deep_grep that you can use without mixing in the module? Our extensions follow that pattern so they are useful even if you do not wish to mix things in. You can see DeepLocate for an example of how this could work.
  2. The current implementation seems too greedy. Consider the following Hash:
books = {
  "Uncategorized" => [
    { title: "Ruby for beginners", pages: 120 },
    { title: "CSS for intermediates", pages: 80 },
  ],
  "Collection of Elixir books" => [
    { title: "Programming Elixir 1.9", pages: 689 }
  ],
  "Collection of Ruby books" => [
    { title: "That other programming language by Matz", pages: 578 }
  ]
}

Given the way you wrote up the intent of the method, I would expect a #deep_grep /ruby/i to return the following:

[
  { title: "Ruby for beginners", pages: 120 },
  {
    "Collection of Ruby books" => [
      { title: "That other programming language by Matz", pages: 578 }
    ]
  }
]

I expect the first Hash in the result because "Ruby" is in the title of the book. I expect the second Hash because the key "Collection of Ruby Books" has the string "Ruby" in it, which would select that whole subtree of the parent Hash.

Instead, this example returns [books], which hardly seems helpful.

  1. There are a few nits that I have that I will be happy to address later.

Does my understanding of the intent of the method mesh with your intention in (2)? Or am I off-base in my understanding?

If it would be helpful, I can attach a commit with some specs.

@vmarutha
Copy link
Author

vmarutha commented Aug 15, 2020

Hi @michaelherold,

Thanks for the feedback :)

So I tried that exact example, and I got two items as the results, one of which was books and the other was {:title=>"Ruby for beginners", :pages=>120} as expected.

I agree that returning books is not very useful, and this seems to be because I am returning the container object when there is a match on either key or value. I think this can be attributed to using DeepLocate.

I noticed this about the implementation, and I guess it is a question on whether we would like to have the container object returned. It may be inelegant if the match is near the parent node of the hash (ex books)

But I think it would be effective for deeper nested hashes.

ex { title: "Ruby for beginners", pages: 120 } Notice how pages is returned here, which is apart of the container object.

What do you think about this?

Following your suggestion about not using mixins, I'm thinking of having DeepGrep very similar to DeepLocate, and to pretty much change this main line https://github.com/hashie/hashie/blob/master/lib/hashie/extensions/deep_locate.rb#L83
I was using DeepFind as inspiration initially, but I'm realizing I would need to have finer control of the returned object than DeepLocate has to offer.

Also maybe it is best to expose another method that is a more stricter grep, one without returning the complete container object to try to address your main concern. I'd have to play around with this first.

As always, open to suggestions and feedback here.

@michaelherold
Copy link
Member

Hi @vmarutha,

Sorry, it looks like you might have been waiting for feedback, which was my mistake. I'll answer your questions from your last message.

So I tried that exact example, and I got two items as the results, one of which was books and the other was {:title=>"Ruby for beginners", :pages=>120} as expected.

Hmm, I must have botched something in my setup then!

I think this can be attributed to using DeepLocate.

Yes, I think that's likely. I wonder if we can have an exit handler that prevents returning the original argument?

What do you think about this?

I think you're right. Ideally, we would prevent returning the outermost object but return any contained hash that matches. What do you think about that?

I'm thinking of having DeepGrep very similar to DeepLocate

Yes, that's exactly what I had in mind. 😄

Also maybe it is best to expose another method that is a more stricter grep, one without returning the complete container object to try to address your main concern. I'd have to play around with this first.

That could be a good alternative to my suggestion above! I'd love to see how it feels.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants