Fixes diff when fuzzy finder `anything` is used in a Hash object (proof of concept) #596

KarlHeitmann · 2024-04-25T04:42:54Z

Hi

I'd like to give you some context about this PR.

TL;DR

See #551. The noise that the anything matcher adds to diffs is annoying, and I wanted to fix it. I initially thought the solution was trivial, but I quickly found out I was wrong when I began reading the Differ class: there is a lot happening under the hood. I know the Differ has some limitations, and I understand there is a lot to change. This PR is a proposal. I don't know if my solution is too hacky, and I'd like some feedback on how to solve this problem.

Context

I am currently working on a RoR project that uses Jbuilder to render the JSON data my API endpoints deliver.

I am testing my .jbuilder files with type: view RSpec tests, and I have a lot of .json fixture files that I load in my tests and use as expected values.

The problem this PR addresses.

My plan is to use these fixture files to kill two birds with one stone. On one side, I am comparing those fixture files with the JSON rendered by the .jbuilder files, and on the other side I am using those fixture files as mocks to share with the frontend developers, so they will have the data necessary to write their React application.

The plan is: if somebody in the backend changes a jbuilder, the tests will complain and I will remember to change the corresponding JSON fixture. This way, the JSON fixtures used by the frontend developers always stay in sync with the files rendered by the backend. Frontend developers often need to work with data they don't know how to create on their local machines (because they know React or Vue, but don't know the Rails framework).

To make this plan work, I needed to scrub some fields of my JSON fixture files using the anything fuzzy matcher. Examples of fields I often need to scrub are the id, created_at and updated_at values.

I quickly ran into the annoying issue described in #551: whenever a test failed and rendered the diff to STDOUT, the key-value pairs in my expected var that use an anything matcher added noise to the diff message, obscuring which key actually differs from the expectation.

My approach.

My first idea was: just before the place in the source code that renders the diff string between the actual and expected vars, add a stage that searches the expected variable for every key-value pair whose value is an anything object, and replaces that anything value with the value found in the actual var under test.

Hopefully the commits of this PR will show you what I mean. The first commit is an implementation that works with a Hash with no nested hashes; the second commit uses recursion to extend the solution to a Hash with nested hashes.
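To make the idea concrete, here is a minimal sketch of the substitution step. It is not the code from the commits: `AnythingStandIn`, `ANYTHING` and `fill_wildcards` are hypothetical names, and the stand-in class only imitates RSpec's `anything` matcher so the snippet runs without rspec-mocks. Unlike an in-place replacement, this version builds a new hash and leaves `expected` alone.

```ruby
# Hypothetical stand-in for RSpec's `anything` matcher, so the sketch runs
# without loading rspec-mocks.
class AnythingStandIn
  def ===(_other)
    true
  end

  def inspect
    "anything"
  end
end

ANYTHING = AnythingStandIn.new

# Returns a copy of `expected` in which every ANYTHING value is replaced by
# the value stored under the same key in `actual`. Nested hashes are handled
# recursively; neither input is mutated.
def fill_wildcards(actual, expected)
  return expected unless actual.is_a?(Hash) && expected.is_a?(Hash)

  expected.each_with_object({}) do |(key, value), result|
    result[key] =
      if value.equal?(ANYTHING) && actual.key?(key)
        actual[key]
      elsif value.is_a?(Hash)
        fill_wildcards(actual[key], value)
      else
        value
      end
  end
end

actual   = { :id => 42, :name => "Bob", :meta => { :created_at => "2024-04-25", :role => "admin" } }
expected = { :id => ANYTHING, :name => "Alice", :meta => { :created_at => ANYTHING, :role => "admin" } }

filled = fill_wildcards(actual, expected)
```

Diffing `actual` against `filled` instead of `expected` would leave `:name` as the only differing key, which is the noise reduction this PR is after.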

My reasons.

When I started writing the code to implement my idea, I had a lot of questions: Can I just mutate the expected var? Shouldn't the diff show a notice reminding me that I used the anything fuzzy matcher in my test? Can I just copy the actual value into the expected var? Am I lying when I do that?

My answer to these questions was: if I decide to use the anything fuzzy matcher on a key-value pair in a hash in my test, it implies I do not care about seeing the anything in my diff. It is a wildcard, so the anything fuzzy matcher should morph into the real value found in my actual variable.

I know this solution is hacky... but it works! It gets rid of the noise generated by the anything fuzzy matcher. I tried the super_diff gem, but it has the same problem.

I am sure the code I wrote can be improved and made more concise (as surely could the text of this PR, lol), and I thought about extending the fix to cover Array objects. But I wanted to know your opinion about this issue before continuing. Can I use a recursive function this way? What performance impact will the code in this extra stage have? As Aristotle said: every scientific investigation should begin by gathering the opinion of the wise people (i.e., the people who earlier on thought about and worked on the problem I am about to work on).

I have always liked the "diff" concept, and I am willing to keep collaborating on enhancing the diff tool. I've seen that @mcmire wrote this guide about RSpec, but I have not dug into the article yet. I am taking baby steps. All feedback is appreciated.

Best.

KarlHeitmann · 2024-05-07T02:27:24Z

I've submitted 3 new commits to get rid of the CyclomaticComplexity, PerceivedComplexity & MethodLength cops. Each commit fixes one offense. The commits can be squashed into one commit, but since there are maaany ways to solve these offenses, these are only proposals.

KarlHeitmann · 2024-05-07T02:40:36Z

The latest commit fixes a problem on Ruby 2.4, 2.3 and 2.2. It replaces the prepend method of the Array class with unshift, because prepend is not defined on Array before Ruby 2.5.
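For illustration, a small sketch of why `unshift` is the portable choice (the key names are made up for the example): `Array#prepend` arrived in Ruby 2.5 as an alias of `Array#unshift`, so the two behave identically wherever both exist.

```ruby
keys = [:created_at]

# Array#unshift inserts at the front and is available on old rubies too.
keys.unshift(:meta)

# Array#prepend is the same operation under a newer name, so code that has
# to run on old rubies can guard on it or, more simply, always use unshift.
if keys.respond_to?(:prepend)
  keys.prepend(:root)
else
  keys.unshift(:root)
end
```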

I can convert that last commit into a fixup to amend: e4ece3df8e411ea4dab6eb08e02269ccf9c3fb14

KarlHeitmann · 2024-05-08T02:08:59Z

Build is still failing before commit f9a9e9d.

I don't know how to reproduce these build environments. I don't know what ruby-head stands for, nor do I understand the error there. And I don't know how to reproduce the 1.8.7 or REE (what does this mean?) Ruby versions on my computer to dig in and understand why it is failing.

However, I copied and pasted the Ruby strings into an irb session, printed them to STDOUT using puts, and found out that on 1.8.7 and REE the keys were not in alphabetical order. I know rspec-support uses pretty print to build the diff of the objects, and my hypothesis is that on Ruby 1.8.7 pretty print was not working correctly with some hashes, so its output may be somewhat random.

The only solution I could think of was to cheat and rearrange my tests so the keys mimic that output. Hopefully the tests will now pass on 1.8.7 and REE.

Any other ideas are welcome to make those ruby builds pass.

pirj · 2024-05-08T05:58:25Z

lib/rspec/support/differ.rb

+      def hash_with_anything?(arg)
+        return false unless Hash === arg
+
+        @keys_with_anything = recursive_get_keys(arg)


Is it safe? Can we happen to reuse an instance of a differ?

That's a good question. In my opinion, nothing is ever safe :)

Let me see if I understand you. If we reused an instance of a differ, the code would look like this snippet:

Suggested change

-        @keys_with_anything = recursive_get_keys(arg)
+        @keys_with_anything = keys(arg)
+      ...
+      end
+      def keys(arg) # NOTE: method originally named "recursive_get_keys"
+        klass = RSpec::Mocks::ArgumentMatchers::AnyArgMatcher
+        ...
+          else
+            if Hash === pair[1]
+              keys = new.keys(pair[1]) # <- NOTE: here I will instance another Differ
+              ...
+        ...
+      end

What I like about your idea is that by using a new instance of a Differ, we may benefit from the mechanisms the Differ class already uses to rescue errors. We can leverage the power of a class instead of recursively calling a single method.

The downside is that the hash_with_anything? and recursive_get_keys methods are currently private, so I would have to make them public. In addition, the Differ class may not be the appropriate place to collect the keys of a Hash. Maybe we should think about creating another file, e.g. Rspec::Support::Utils, with a method that retrieves keys recursively from a Hash, so we keep to the single responsibility principle.

I googled for a way to get all keys and nested keys of a Hash in Ruby and found this Stack Overflow thread. Most answers use recursive functions to solve the problem. The Hash class in the Ruby stdlib has no method that returns all keys, including nested ones. Maybe that is because solving this problem is not safe, so the implementation is left to the programmer? I don't know.

In my opinion, this may be unsafe. But since the only condition for calling recursive_get_keys recursively is that the value in the hash is itself a Hash, maybe that narrows down the odds that something will go wrong.

I think reusing an instance of a differ just moves the recursion to another place. Is it better to move it? I don't know. I will do whatever someone with more experience than me advises :)

If you think the change may be too risky, I can make the changes so we won't use recursion. After all, this project is used by 622K persons, I understand you need to be cautious.
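As a point of comparison, a non-recursive variant could look like the sketch below. It is only an illustration under assumptions: `WILDCARD` is a hypothetical placeholder for the `anything` matcher and `wildcard_key_paths` is a made-up name, not a method of the Differ. It walks the hash with an explicit stack, keeps no state on the instance, and skips hashes it has already visited so a self-referencing hash cannot loop forever.

```ruby
# Hypothetical placeholder for the `anything` matcher.
WILDCARD = Object.new

# Collects the key path of every WILDCARD value, e.g. [:meta, :created_at],
# using an explicit stack instead of recursion.
def wildcard_key_paths(hash)
  paths = []
  seen  = {}            # object_id => true, guards against cyclic hashes
  stack = [[[], hash]]  # pairs of [key path so far, hash to scan]

  until stack.empty?
    prefix, current = stack.pop
    next if seen[current.object_id]
    seen[current.object_id] = true

    current.each do |key, value|
      path = prefix + [key]
      if value.equal?(WILDCARD)
        paths << path
      elsif value.is_a?(Hash)
        stack.push([path, value])
      end
    end
  end

  paths
end

expected = {
  :id   => WILDCARD,
  :name => "Alice",
  :meta => { :created_at => WILDCARD, :tags => { :primary => WILDCARD } }
}

paths = wildcard_key_paths(expected)
```

Because the result is returned rather than stored in an instance variable, calling it twice on the same object cannot leak keys from one comparison into the next.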

pirj · 2024-05-08T06:00:29Z

lib/rspec/support/differ.rb

+      end
+
+      def recursive_get_keys(hash)
+        klass = RSpec::Mocks::ArgumentMatchers::AnyArgMatcher


What if this is not defined, like someone opted out of mocking completely, or uses rr/mocha etc?

I think it’s ok to use `return [] unless defined?(RSpec::Mocks)` here

Nice! I didn't think someone might opt out of mocking completely. I've fixed it in my latest commit!
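A sketch of how such a guard could look in isolation, assuming hypothetical helper names (`any_arg_matcher_class`, `anything_keys`) that are not part of rspec-support. `defined?` on a constant path returns nil instead of raising when any part of the path is missing, so the lookup is safe when rspec-mocks is not loaded.

```ruby
# Returns the matcher class when rspec-mocks is loaded, nil otherwise.
def any_arg_matcher_class
  return nil unless defined?(RSpec::Mocks::ArgumentMatchers::AnyArgMatcher)

  RSpec::Mocks::ArgumentMatchers::AnyArgMatcher
end

# Top-level keys whose value is an `anything` matcher; an empty list when
# rspec-mocks is not available (e.g. the suite uses rr or mocha).
def anything_keys(hash)
  klass = any_arg_matcher_class
  return [] unless klass

  keys = []
  hash.each { |key, value| keys << key if klass === value }
  keys
end

result_without_matchers = anything_keys({ :id => 1, :name => "Alice" })
```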

pirj · 2024-05-08T06:02:31Z

Please accept my apologies for not reviewing this swiftly.

May I ask you to add the output of some spec how it looked before and after this change?

Don’t worry about ruby-head, it is very typical for it to fail. We mostly use it a bit ahead of time to fix issues with upcoming Ruby releases.

Thanks for the effort you’re putting into this.

pirj · 2024-05-08T06:12:54Z

To me it’s high time to soft-deprecate Ruby 1.8.7.
If you feel that fixing it is not something you’d love to dive into, please make sure no existing specs fail, and just make your new examples pending with a note “broken on old rubies for unknown reasons”.

KarlHeitmann · 2024-05-10T03:42:10Z

Hey @pirj! Thanks for reaching out! No worries, trying to fix simple things on GitHub is just my hobby. Making a tiny contribution to an important project (like this one) is on my bucket list. But I understand people may be busier than me 🤣

I made this gist with the output of these 3 specs BEFORE the changes here, and this gist with the output AFTER the changes. I hope this clarifies what I intend to do with my change.

My intention is that whenever you diff a Hash against another Hash and the expected hash contains an anything fuzzy matcher, that anything value morphs into the value from the actual variable, in order to reduce the noise in the diff when another key-value pair has a mismatch, as described in #551.

Of the three examples in my gists, one shows a comparison of two hashes with NO nested hashes, and the other two show what happens when there is a nested hash. Regarding your first comment: if you think it is too unsafe to do this recursively, I can tweak my PR so it works only with plain hashes. The first commit of this PR implements this behavior ONLY for hash comparisons without nested hashes.

KarlHeitmann · 2024-05-10T03:50:40Z

P.S.1: About support for ruby 1.8.7, I saw you've used if String.method_defined?(:encoding) ... else ... end block to define methods for ruby 1.8.7 and the other ruby versions dynamically. Maybe I can give it a try to fix the ruby build problem.
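The pattern mentioned here can be sketched as follows. This is an assumption-laden illustration, not code from rspec-support: `KeyOrdering` and `ordered_pairs` are hypothetical names. `String#encoding` exists from Ruby 1.9 on, which is also the first version where Hash preserves insertion order, so the check doubles as a "1.8.x or not" switch that picks one method definition at load time.

```ruby
module KeyOrdering
  if String.method_defined?(:encoding)
    # Ruby 1.9+: Hash preserves insertion order, so the pairs can be used
    # as they are.
    def self.ordered_pairs(hash)
      hash.to_a
    end
  else
    # Ruby 1.8.x and REE: Hash order is unspecified, so sort by key to get
    # a stable order for diff output.
    def self.ordered_pairs(hash)
      hash.to_a.sort_by { |key, _value| key.to_s }
    end
  end
end

pairs = KeyOrdering.ordered_pairs({ :name => "Alice", :id => 42 })
```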

P.S.2: I noticed something bad while using the debugger... after I mutate the hash on the expected anything value here, when control returns to the it scope, the expected variable has mutated! As you can see in the screenshot below:

Line 611 has mutated expected[:an_key] from anything to dummy. I didn't notice that before, and I think it needs to be addressed.
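The effect can be reproduced outside the debugger with a small sketch. The names `PLACEHOLDER`, `fill_in_place!` and `fill_copy` are hypothetical, and the shallow `dup` shown here only protects a flat hash; nested hashes would need each level copied as well.

```ruby
# Hypothetical placeholder for the `anything` matcher.
PLACEHOLDER = Object.new

# Writes into `expected` directly: the caller's hash changes too.
def fill_in_place!(actual, expected)
  expected.keys.each do |key|
    expected[key] = actual[key] if expected[key].equal?(PLACEHOLDER)
  end
  expected
end

# Works on a shallow copy: the caller's hash keeps its placeholder.
def fill_copy(actual, expected)
  fill_in_place!(actual, expected.dup)
end

actual   = { :an_key => "dummy", :other => 1 }
expected = { :an_key => PLACEHOLDER, :other => 2 }

copied = fill_copy(actual, expected)
still_placeholder = expected[:an_key].equal?(PLACEHOLDER)  # caller's hash untouched

fill_in_place!(actual, expected)
leaked_value = expected[:an_key]                           # caller's hash changed
```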

KarlHeitmann added 2 commits April 23, 2024 22:51

fixes diff when fuzzy finder anything is used

8c81146

fix works in hashes with nested hashes

e4ece3d

KarlHeitmann mentioned this pull request Apr 25, 2024

What should happen when you use an anything fuzzy matcher inside a Hash on your specs? mcmire/super_diff#241

Open

KarlHeitmann added 3 commits May 6, 2024 22:17

fix: gets rid of Metrics::CyclomaticComplexity in Differ#diff

4de7d8c

fix: gets rid of Metrics::PerceivedComplexity in Diff#diff

d50d742

fix: gets rid of Metrics/MethodLength in Diff#diff

41e3c21

fix: error build ruby 2.4,3,2: complain in prepend

15627b7

fix: rearrange test to pass legacy ruby build 1.8.7 & REE

f9a9e9d
