Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diff::LCS::Hunk#diff(:unified) output changed between 1.3 and 1.4.x, part 2 #65

Closed
phiggins opened this issue Jun 29, 2020 · 5 comments · Fixed by #66
Closed

Diff::LCS::Hunk#diff(:unified) output changed between 1.3 and 1.4.x, part 2 #65

phiggins opened this issue Jun 29, 2020 · 5 comments · Fixed by #66
Assignees

Comments

@phiggins
Copy link

Hi,

Similar to the case I mentioned in #60, there were failures in another Chef project, chef-cli, upon upgrading from diff-lcs 1.3 to 1.4.x. See here for the actual failures: https://buildkite.com/chef-oss/chef-chef-cli-master-verify/builds/227#aa286955-6862-4b7c-9d0b-c347c895a150

I made another test script that demonstrates the differences:

require 'diff/lcs'
require 'diff/lcs/hunk'

puts "::Diff::LCS::VERSION == #{::Diff::LCS::VERSION}"
puts

def diff_lines(old_lines, new_lines)
  file_length_difference = 0
  previous_hunk = nil

  Diff::LCS.diff(old_lines, new_lines).each do |piece|
    hunk = Diff::LCS::Hunk.new(old_lines, new_lines, piece, 3, file_length_difference)
    file_length_difference = hunk.file_length_difference
    maybe_contiguous_hunks = (previous_hunk.nil? || hunk.merge(previous_hunk))

    puts "#{previous_hunk.diff(:unified)}\n" unless maybe_contiguous_hunks

    previous_hunk = hunk
  end
  puts "#{previous_hunk.diff(:unified)}\n" unless previous_hunk.nil?
end

diff_lines(
  ["cf4b8a020bdc1ba6914093a8a07a5514cce8a3a2979a967b1f32ea704a61785b"],
  ["304566f86a620aae85797a3c491a51fb8c6ecf996407e77b8063aa3ee59672c5"],
)

puts "=" * 20
puts "=" * 20
puts "=" * 20

diff_lines(
  ["recipe[a::default]", "recipe[b::default]", "recipe[c::default]", "recipe[d::default]", "recipe[e::default]", "recipe[f::default]", "recipe[g::default]", "recipe[h::default]", "recipe[i::default]", "recipe[j::default]", "recipe[k::default]", "recipe[l::default]", "recipe[m::default]", "recipe[n::default]"],
  ["recipe[a::default]", "recipe[c::default]", "recipe[d::default]", "recipe[e::default]", "recipe[f::default]", "recipe[g::default]", "recipe[h::default]", "recipe[i::default]", "recipe[j::default]", "recipe[k::default]", "recipe[l::default]", "recipe[m::default]", "recipe[n::default]", "recipe[o::new]", "recipe[p::new]", "recipe[q::new]", "recipe[r::new]"],
)

The output on diff-lcs 1.3:

::Diff::LCS::VERSION == 1.3

@@ -1,2 +1,2 @@
-cf4b8a020bdc1ba6914093a8a07a5514cce8a3a2979a967b1f32ea704a61785b
+304566f86a620aae85797a3c491a51fb8c6ecf996407e77b8063aa3ee59672c5
====================
====================
====================
@@ -1,5 +1,4 @@
 recipe[a::default]
-recipe[b::default]
 recipe[c::default]
 recipe[d::default]
 recipe[e::default]
@@ -12,4 +11,8 @@
 recipe[l::default]
 recipe[m::default]
 recipe[n::default]
+recipe[o::new]
+recipe[p::new]
+recipe[q::new]
+recipe[r::new]

The output on diff-cls 1.4.x:

::Diff::LCS::VERSION == 1.4.3

@@ -1 +1 @@
-cf4b8a020bdc1ba6914093a8a07a5514cce8a3a2979a967b1f32ea704a61785b
+304566f86a620aae85797a3c491a51fb8c6ecf996407e77b8063aa3ee59672c5
====================
====================
====================
@@ -1,5 +1,6 @@
 recipe[a::default]
-recipe[b::default]
 recipe[c::default]
 recipe[d::default]
 recipe[e::default]
@@ -10,6 +11,10 @@
 recipe[j::default]
 recipe[k::default]
 recipe[l::default]
+recipe[o::new]
+recipe[p::new]
+recipe[q::new]
+recipe[r::new]
 recipe[m::default]
 recipe[n::default]

BTW, thanks so much for your quick response on the last issue. ❤️

@halostatue
Copy link
Owner

This is going to take a little longer to fix, but you are correct. The problem is that the ::new recipes are added to the end of the file, but diff-lcs is inserting them before m.

The changes in numbers (especially for single-line changes as in the first case) are intentional fixes.

halostatue added a commit that referenced this issue Jun 30, 2020
halostatue added a commit that referenced this issue Jul 1, 2020
Resolve #65

- Also add even more tests for checking `ldiff` results against `diff` results.
- Fix issues with diff/ldiff output highlighted by the above tests.
- Add a parameter to indicate that the hunk being processed is the _last_ hunk;
  this results in correct counting of the hunk size.
- The misplaced chunks were happening because of an improper `.abs` on
  `#diff_size`, when the `.abs` needed to be on the finding of the maximum diff
  size.
@halostatue halostatue self-assigned this Jul 1, 2020
@halostatue
Copy link
Owner

Can you test the code #66 to see if it resolves your issue?

BTW, the test script you provided will still output incorrect values for the last chunk, compared to diff, unless you make the following change with what’s in that PR.

-  puts "#{previous_hunk.diff(:unified)}\n" unless previous_hunk.nil?
+  puts "#{previous_hunk.diff(:unified, true)}\n" unless previous_hunk.nil?

@phiggins
Copy link
Author

phiggins commented Jul 1, 2020

The tests for chef-cli still have diff related failures, but it looks like they all have to do with the line numbers reported:

Run options:
  include {:focus=>true}
  exclude {:volatile=>true}

All examples were filtered out; ignoring {:focus=>true}

ChefCLI::Policyfile::Differ
  has a UI object
  has the old lock data
  has the old lock `name'
  has the new lock data
  has the new lock `name'
  when old and new lock data are the same
    has no updates
    has no updated sections
    reports that there are no updates
  when the run list is updated
    has updates
    has an updated revision_id and run_list
    reports the updated rev_id and run_list (FAILED - 1)
    when the run_list has non-contiguous changes
      prints the correct changes with context for the run list (FAILED - 2)
  with a removed cookbook
    has updates
    has an updated revision_id and cookbook_locks
    has removed the 'apt' cookbook
    reports the updated revision_id and removed cookbooks (FAILED - 3)
  with an added cookbook
    has updates
    has an updated revision_id and cookbook_locks
    has added the bluepill cookbook
    reports the updated revision_id and added cookbook (FAILED - 4)
  with a modified cookbook
    has updates
    has an updated revision_id and cookbook_locks
    has modified the 'policyfile_demo' cookbook
    reports the updated revision_id and modified policyfile_demo cookbook (FAILED - 5)
  with updated default attributes
    has updates
    has an updated revision_id and default_attributes
    reports the updated revision_id and modified attributes (FAILED - 6)
  with updated override_attributes
    has updates
    has an updated revision_id and override_attributes
    reports the updated revision_id and override_attributes (FAILED - 7)

Failures:

  1) ChefCLI::Policyfile::Differ when the run list is updated reports the updated rev_id and run_list
     Failure/Error: expect(output).to eq(expected_message)

       expected: "Policy lock 'jenkins' differs between 'git: HEAD' and 'local disk':\n\nREVISION ID CHANGED\n========...n recipe[policyfile_demo::default]\n+recipe[one::one]\n+recipe[two::two]\n+recipe[three::three]\n\n"
            got: "Policy lock 'jenkins' differs between 'git: HEAD' and 'local disk':\n\nREVISION ID CHANGED\n========...n recipe[policyfile_demo::default]\n+recipe[one::one]\n+recipe[two::two]\n+recipe[three::three]\n\n"

       (compared using ==)

       Diff:
       @@ -3,7 +3,7 @@
        REVISION ID CHANGED
        ===================
        
       -@@ -1,2 +1,2 @@
       +@@ -1 +1 @@
        -cf4b8a020bdc1ba6914093a8a07a5514cce8a3a2979a967b1f32ea704a61785b
        +304566f86a620aae85797a3c491a51fb8c6ecf996407e77b8063aa3ee59672c5
        
     # ./spec/unit/policyfile/differ_spec.rb:339:in `block (3 levels) in <top (required)>'

  2) ChefCLI::Policyfile::Differ when the run list is updated when the run_list has non-contiguous changes prints the correct changes with context for the run list
     Failure/Error: expect(output).to eq(expected_message)

       expected: "Policy lock 'jenkins' differs between 'git: HEAD' and 'local disk':\n\nREVISION ID CHANGED\n========...fault]\n recipe[n::default]\n+recipe[o::new]\n+recipe[p::new]\n+recipe[q::new]\n+recipe[r::new]\n\n"
            got: "Policy lock 'jenkins' differs between 'git: HEAD' and 'local disk':\n\nREVISION ID CHANGED\n========...fault]\n recipe[n::default]\n+recipe[o::new]\n+recipe[p::new]\n+recipe[q::new]\n+recipe[r::new]\n\n"

       (compared using ==)

       Diff:
       @@ -3,7 +3,7 @@
        REVISION ID CHANGED
        ===================
        
       -@@ -1,2 +1,2 @@
       +@@ -1 +1 @@
        -cf4b8a020bdc1ba6914093a8a07a5514cce8a3a2979a967b1f32ea704a61785b
        +304566f86a620aae85797a3c491a51fb8c6ecf996407e77b8063aa3ee59672c5
        
     # ./spec/unit/policyfile/differ_spec.rb:396:in `block (4 levels) in <top (required)>'

  3) ChefCLI::Policyfile::Differ with a removed cookbook reports the updated revision_id and removed cookbooks
     Failure/Error: expect(output).to eq(expected_message)

       expected: "Policy lock 'jenkins' differs between 'git: HEAD' and 'local disk':\n\nREVISION ID CHANGED\n========....chef.io/api/v1/cookbooks/apt/versions/2.7.0/download\",\n-    \"version\": \"2.7.0\"\n-  }\n-}\n\n"
            got: "Policy lock 'jenkins' differs between 'git: HEAD' and 'local disk':\n\nREVISION ID CHANGED\n========....chef.io/api/v1/cookbooks/apt/versions/2.7.0/download\",\n-    \"version\": \"2.7.0\"\n-  }\n-}\n\n"

       (compared using ==)

       Diff:
       @@ -3,7 +3,7 @@
        REVISION ID CHANGED
        ===================
        
       -@@ -1,2 +1,2 @@
       +@@ -1 +1 @@
        -cf4b8a020bdc1ba6914093a8a07a5514cce8a3a2979a967b1f32ea704a61785b
        +304566f86a620aae85797a3c491a51fb8c6ecf996407e77b8063aa3ee59672c5
        
     # ./spec/unit/policyfile/differ_spec.rb:456:in `block (3 levels) in <top (required)>'

  4) ChefCLI::Policyfile::Differ with an added cookbook reports the updated revision_id and added cookbook
     Failure/Error: expect(output).to eq(expected_message)

       expected: "Policy lock 'jenkins' differs between 'git: HEAD' and 'local disk':\n\nREVISION ID CHANGED\n========....io/api/v1/cookbooks/bluepill/versions/2.3.2/download\",\n+    \"version\": \"2.3.2\"\n+  }\n+}\n\n"
            got: "Policy lock 'jenkins' differs between 'git: HEAD' and 'local disk':\n\nREVISION ID CHANGED\n========....io/api/v1/cookbooks/bluepill/versions/2.3.2/download\",\n+    \"version\": \"2.3.2\"\n+  }\n+}\n\n"

       (compared using ==)

       Diff:
       @@ -3,7 +3,7 @@
        REVISION ID CHANGED
        ===================
        
       -@@ -1,2 +1,2 @@
       +@@ -1 +1 @@
        -cf4b8a020bdc1ba6914093a8a07a5514cce8a3a2979a967b1f32ea704a61785b
        +304566f86a620aae85797a3c491a51fb8c6ecf996407e77b8063aa3ee59672c5
        
       @@ -13,7 +13,7 @@
        bluepill
        --------
        
       -@@ -1 +1,12 @@
       +@@ -1,11 +1,22 @@
        +{
        +  "version": "2.3.2",
        +  "identifier": "9c6990944d9a347dec8bd375e707ba0aecdc17cd",
     # ./spec/unit/policyfile/differ_spec.rb:529:in `block (3 levels) in <top (required)>'

  5) ChefCLI::Policyfile::Differ with a modified cookbook reports the updated revision_id and modified policyfile_demo cookbook
     Failure/Error: expect(output).to eq(expected_message)

       expected: "Policy lock 'jenkins' differs between 'git: HEAD' and 'local disk':\n\nREVISION ID CHANGED\n========...\",\n   \"source\": \"cookbooks/policyfile_demo\",\n   \"cache_key\": null,\n   \"scm_info\": {\n\n"
            got: "Policy lock 'jenkins' differs between 'git: HEAD' and 'local disk':\n\nREVISION ID CHANGED\n========...\",\n   \"source\": \"cookbooks/policyfile_demo\",\n   \"cache_key\": null,\n   \"scm_info\": {\n\n"

       (compared using ==)

       Diff:
       @@ -3,7 +3,7 @@
        REVISION ID CHANGED
        ===================
        
       -@@ -1,2 +1,2 @@
       +@@ -1 +1 @@
        -cf4b8a020bdc1ba6914093a8a07a5514cce8a3a2979a967b1f32ea704a61785b
        +304566f86a620aae85797a3c491a51fb8c6ecf996407e77b8063aa3ee59672c5
        
     # ./spec/unit/policyfile/differ_spec.rb:589:in `block (3 levels) in <top (required)>'

  6) ChefCLI::Policyfile::Differ with updated default attributes reports the updated revision_id and modified attributes
     Failure/Error: expect(output).to eq(expected_output)

       expected: "Policy lock 'jenkins' differs between 'git: HEAD' and 'local disk':\n\nREVISION ID CHANGED\n========...butes, f*** yeah\"\n+  \"greeting\": \"Attributes, f*** yeah\",\n+  \"new_attr\": \"hello\"\n }\n\n"
            got: "Policy lock 'jenkins' differs between 'git: HEAD' and 'local disk':\n\nREVISION ID CHANGED\n========...butes, f*** yeah\"\n+  \"greeting\": \"Attributes, f*** yeah\",\n+  \"new_attr\": \"hello\"\n }\n\n"

       (compared using ==)

       Diff:
       @@ -3,7 +3,7 @@
        REVISION ID CHANGED
        ===================
        
       -@@ -1,2 +1,2 @@
       +@@ -1 +1 @@
        -cf4b8a020bdc1ba6914093a8a07a5514cce8a3a2979a967b1f32ea704a61785b
        +304566f86a620aae85797a3c491a51fb8c6ecf996407e77b8063aa3ee59672c5
        
     # ./spec/unit/policyfile/differ_spec.rb:635:in `block (3 levels) in <top (required)>'

  7) ChefCLI::Policyfile::Differ with updated override_attributes reports the updated revision_id and override_attributes
     Failure/Error: expect(output).to eq(expected_output)

       expected: "Policy lock 'jenkins' differs between 'git: HEAD' and 'local disk':\n\nREVISION ID CHANGED\n========...ng\": \"use -a\"\n+  \"attr_only_updating\": \"use -a\",\n+  \"new_attr\": \"ALL THE DIFF\"\n }\n\n"
            got: "Policy lock 'jenkins' differs between 'git: HEAD' and 'local disk':\n\nREVISION ID CHANGED\n========...ng\": \"use -a\"\n+  \"attr_only_updating\": \"use -a\",\n+  \"new_attr\": \"ALL THE DIFF\"\n }\n\n"

       (compared using ==)

       Diff:
       @@ -3,7 +3,7 @@
        REVISION ID CHANGED
        ===================
        
       -@@ -1,2 +1,2 @@
       +@@ -1 +1 @@
        -cf4b8a020bdc1ba6914093a8a07a5514cce8a3a2979a967b1f32ea704a61785b
        +304566f86a620aae85797a3c491a51fb8c6ecf996407e77b8063aa3ee59672c5
        
     # ./spec/unit/policyfile/differ_spec.rb:682:in `block (3 levels) in <top (required)>'

Finished in 0.02819 seconds (files took 0.39759 seconds to load)
30 examples, 7 failures

Failed examples:

rspec ./spec/unit/policyfile/differ_spec.rb:314 # ChefCLI::Policyfile::Differ when the run list is updated reports the updated rev_id and run_list
rspec ./spec/unit/policyfile/differ_spec.rb:365 # ChefCLI::Policyfile::Differ when the run list is updated when the run_list has non-contiguous changes prints the correct changes with context for the run list
rspec ./spec/unit/policyfile/differ_spec.rb:424 # ChefCLI::Policyfile::Differ with a removed cookbook reports the updated revision_id and removed cookbooks
rspec ./spec/unit/policyfile/differ_spec.rb:497 # ChefCLI::Policyfile::Differ with an added cookbook reports the updated revision_id and added cookbook
rspec ./spec/unit/policyfile/differ_spec.rb:559 # ChefCLI::Policyfile::Differ with a modified cookbook reports the updated revision_id and modified policyfile_demo cookbook
rspec ./spec/unit/policyfile/differ_spec.rb:612 # ChefCLI::Policyfile::Differ with updated default attributes reports the updated revision_id and modified attributes
rspec ./spec/unit/policyfile/differ_spec.rb:658 # ChefCLI::Policyfile::Differ with updated override_attributes reports the updated revision_id and override_attributes

@halostatue
Copy link
Owner

TL;DR: the line number output changes are intentional and the output of v1.3 and earlier is incorrect. v1.4, with this upcoming patch version, is finally fully compatible with the output of diff for multiple modes.

@JonRowe did some testing for RSpec to work with both v1.3 and v1.4; the line number changes here are intentional because they’ve been wrong forever. In some ways, 1.4.4 is going to make the line number changes worse because I have a better handle on the special cases that need to be handled to produce output compatible with diff in its various modes.

This change, in particular, is an example of what I mean:

-@@ -1,2 +1,2 @@
+@@ -1 +1 @@
 -cf4b8a020bdc1ba6914093a8a07a5514cce8a3a2979a967b1f32ea704a61785b
 +304566f86a620aae85797a3c491a51fb8c6ecf996407e77b8063aa3ee59672c5

If you did the same comparison with diff -u, you would get @@ -1 + 1 @@, which is correct. It’s taken four sub-revisions, but I now have a fairly extensive bin/ldiff test suite.

Also, you may see cases of \ No newline at end of file showing up, depending on the state of the files being compared (this mostly affects bin/ldiff, but the implementation is in Hunk#diff).

file_lines[-2]&.end_with?("\n") file_lines[-1]&.end_with?("\n") printed?
true true false
true false true
false - false
nil true false
nil false true
nil nil true

I don’t yet have test files with and without newlines for this, but I have run visual comparison tests and diff -u <(diff -u a b) <(ruby -Ilib bin/ldiff -u a b) tests (no difference output).

@JonRowe
Copy link
Contributor

JonRowe commented Jul 1, 2020

FWIW we (RSpec) have gone with a sprinkling of helpers and an additional env var to test both 1.3 and 1.4 outputs, the diff (pardon the the pun) is quite small and if we didn't care about supporting both versions we'd happily just take the 1.4 outputs.

See rspec/rspec-support#421 and its attached issues on the other repositories.

halostatue added a commit that referenced this issue Jul 1, 2020
# This is the 1st commit message:

Fix improperly placed chunks

Resolve #65

- Also add even more tests for checking `ldiff` results against `diff` results.
- Fix issues with diff/ldiff output highlighted by the above tests.
- Add a parameter to indicate that the hunk being processed is the _last_ hunk;
  this results in correct counting of the hunk size.
- The misplaced chunks were happening because of an improper `.abs` on
  `#diff_size`, when the `.abs` needed to be on the finding of the maximum diff
  size.

# This is the commit message #2:

Ooops. Debugger

# This is the commit message #3:

Restore missing test

- Fix some more format issues raised by the missing test.
- Start fixing Rubocop formatting.

# This is the commit message #4:

Last RuboCop fixes

# This is the commit message #5:

Finalize diff-lcs 1.4

# This is the commit message #6:

Fix #44

The problem here was the precedence of `or` vs `||`. Switching to `||` resulted
in the expected behaviour.

# This is the commit message #7:

Resolve #43

# This is the commit message #8:

Typo

# This is the commit message #9:

Resolve #35 with a comment
halostatue added a commit that referenced this issue Jul 1, 2020
- Resolve #65: Two different issues were reported:

  - Line numbers changed between 1.3 and 1.4. This change is intentional, but
    in comparing the output against `diff` for the new examples provided
    it became clear that for unified and context diffs, the last hunk was
    reporting incorrect hunk lengths (that is, if it was `12,4 11,8`, it should
    have been `12,3 11,7`. This has been resolved by providing
    `Diff::LCS::Hunk#diff` an additional parameter, `last` which defaults to
    `false`.

  - Net new lines were added in the middle of a unified diff hunk, which was
    incorrect. That is, we were getting:

    ```diff
    @@ -10,6 +11,10 @@
    recipe[j::default]
    recipe[k::default]
    recipe[l::default]
    +recipe[o::new]
    +recipe[p::new]
    +recipe[q::new]
    +recipe[r::new]
    recipe[m::default]
    recipe[n::default]
    ```

    instead of:

    ```diff
    @@ -12,3 +11,7 @@
    recipe[l::default]
    recipe[m::default]
    recipe[n::default]
    +recipe[o::new]
    +recipe[p::new]
    +recipe[q::new]
    +recipe[r::new]
    ```

    This has been resolved. The error was that `Diff::LCS::Block#diff_size` was
    applying `.abs` to its output, which is incorrect. We need to know the
    direction of the size for placing changes, but when determining the maximum
    block size for use in `Diff::LCS::Hunk#diff` calculations, we need to know
    the `.abs` size.

  - New tests were added to prevent these changes from regressing in the
    future, both as issue tests and as additional `ldiff` tests. These tests
    highlighted more issues with diff-lcs output as compared to `diff`,
    specifically the handling and reporting of missing newlines at the end of
    files. All of the issues highlighted were resolved and the structure of
    `ldiff` tests was changed so that it is easier to add new comparison files
    at any time.

- Resolve #35: Indicate that when comparing against custom objects, `#eql?`
  must be implemented such that objects that _resolve_ to the same meaning are
  _treated_ as the same meaning. This is important because the basic LCS
  algorithm uses a hash for position matching.

- Resolve #43: Provide a more meaningful error from `Diff::LCS::Hunk.new` if
  the `piece` provided does not create useful `Diff::LCS::Block`. It's
  extremely unlikely, this error will be more useful than `NoMethodError` being
  thrown.

- Resolve #44: `ldiff` binary detection failed to work correctly. It had been
  `!old_text or !new_text`, but the precedence of `or` broke that. It is now
  `!old_text || !new_text`.

Also:

- Ran Rubocop again. Updated configuration definitions, fixed some code
  formatting.
mpraglowski added a commit to RailsEventStore/rails_event_store that referenced this issue Jul 8, 2020
The latest changes in diff-lcs gem (updated to 1.4.4) have lead to fails
in our tests due to wrong formatted output. The change has been
introduced here: halostatue/diff-lcs#65
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants