Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix diffs of files that have quoted paths #504

Merged
merged 1 commit into from Jan 1, 2022
Merged

Fix diffs of files that have quoted paths #504

merged 1 commit into from Jan 1, 2022

Conversation

jcouball
Copy link
Member

@jcouball jcouball commented Dec 30, 2020

Your checklist for this pull request

🚨Please review the guidelines for contributing to this repository.

  • Ensure all commits include DCO sign-off.
  • Ensure that your contributions pass unit testing.
  • Ensure that your contributions contain documentation if applicable.

Description

As was reported in #418, if git diff reports a file path that has special characters like unicode, backslash, double quotes, etc. then the diff fails.

Changes

lib/git/diff.rb

The problem was found to be in the diff parser not knowing how to deal with quoted paths. This PR changes Git's behavior to always quote paths that contain special characters and escape those special characters with backslashes in the same way C escapes control characters (e.g. \t for TAB, \n for LF, \ for backslash) or bytes with values larger than 0x80 (e.g. octal \302\265 for "micro" in UTF-8).

This Diff class uses the new Git::EscapedPath class to unescape a path that conforms to these rules.

lib/git/lib.rb

For every command, set the core.quotePath configuration setting to true by adding -c core.quotePath=true to any git command that is executed. See the details of this option in the core.quotePath documentation.

The encoding methods were moved to Git::EncodingUtils

lib/git/encoding_utils.rb

This module gathers all the encoding methods that were in Git::Lib so they can be shared between different classes.

lib/git/escaped_path.rb

The Gt::EscapedPath class represents an escaped Git path string and provides a method to unescape the string.

tests/units/test_logger.rb

Since the -c core.quotePath was added to every command line, the logger tests had to be changed to ignore changes in global options.

Other

Some characters (like a double quote I learned) are not allowed in Windows filenames. A complete enumeration of the rules for Windows filenames can be found in the article Naming Files, Paths, and Namespaces

@jcouball jcouball force-pushed the quoted_paths branch 2 times, most recently from 5e6f3ea to 2aed839 Compare December 30, 2020 21:51
Copy link

@wasanx25 wasanx25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great PR. I have same issue when Japanese file name rename.

lib/git/diff.rb Outdated
@@ -129,8 +129,8 @@ def process_full_diff
final = {}
current_file = nil
@full_diff.split("\n").each do |line|
if m = /^diff --git a\/(.*?) b\/(.*?)/.match(line)
current_file = m[1]
if m = %r{\Adiff --git ("?)a/(.+?)\1 \1b/(.+?)\1\z}.match(line)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It not work if double quotes begin with b.

Suggested change
if m = %r{\Adiff --git ("?)a/(.+?)\1 \1b/(.+?)\1\z}.match(line)
if m = %r{\Adiff --git ("?)a/(.+?)\1 \1("?)b/(.+?)\1\z}.match(line)
$ ruby -v
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin19]
$ irb
irb(main):001:0> pattern_a = 'diff --git "a/asdf\"asdf" "b/asdf\"asdf"'
irb(main):002:0> pattern_b = 'diff --git a/asdfasdf "b/asdf\"asdf"'
irb(main):003:0> old_regex = %r{\Adiff --git ("?)a/(.+?)\1 \1b/(.+?)\1\z}
irb(main):004:0> new_regex = %r{\Adiff --git ("?)a/(.+?)\1 \1("?)b/(.+?)\1\z}
irb(main):005:0> old_regex.match(pattern_a)
=> #<MatchData "diff --git \"a/asdf\\\"asdf\" \"b/asdf\\\"asdf\"" 1:"\"" 2:"asdf\\\"asdf" 3:"asdf\\\"asdf">
irb(main):006:0> old_regex.match(pattern_b)
=> nil
irb(main):007:0> new_regex.match(pattern_a)
=> #<MatchData "diff --git \"a/asdf\\\"asdf\" \"b/asdf\\\"asdf\"" 1:"\"" 2:"asdf\\\"asdf" 3:"" 4:"asdf\\\"asdf">
irb(main):008:0> new_regex.match(pattern_b)
=> #<MatchData "diff --git a/asdfasdf \"b/asdf\\\"asdf\"" 1:"" 2:"asdfasdf" 3:"\"" 4:"asdf\\\"asdf\"">

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wasanx25 I have changed the implementation to accommodate your problem where either a/ or b/ might be quoted but not both. The regex I went with is a little different than what you suggested:

if m = %r{\Adiff --git ("?)a/(.+?)\1 ("?)b/(.+?)\3\z}.match(line)

@wasanx25
Copy link

@jcouball Do you plan to merge this pull request? I am having problem yet.

@jcouball
Copy link
Member Author

jcouball commented Dec 23, 2021

@wasanx25 are you able to test this with your use case? I know it has been a long time.

@wasanx25
Copy link

@jcouball Sure, try do test but I need to remember use case so take a while.

@wasanx25
Copy link

wasanx25 commented Dec 28, 2021

@jcouball It passed test of my use case. Rename Japanese filename 日本語ファイル.txt to 2日本語ファイル.txt.

exec.rb

require 'git'

g = Git.open('./')

commit1 = 'f66a6371dce97e5212bb5d7a61dbc81a8023eb53'
commit2 = 'e4c26f70168aced95beeb33158767bf5d86261f4'

puts g.diff(commit1, commit2).map(&:path)
$ git diff f66a6371dce97e5212bb5d7a61dbc81a8023eb53 e4c26f70168aced95beeb33158767bf5d86261f4 | cat
diff --git "a/\346\227\245\346\234\254\350\252\236\343\203\225\343\202\241\343\202\244\343\203\253.txt" "b/2\346\227\245\346\234\254\350\252\236\343\203\225\343\202\241\343\202\244\343\203\253.txt"
similarity index 100%
rename from "\346\227\245\346\234\254\350\252\236\343\203\225\343\202\241\343\202\244\343\203\253.txt"
rename to "2\346\227\245\346\234\254\350\252\236\343\203\225\343\202\241\343\202\244\343\203\253.txt"

$ ruby exec.rb
日本語ファイル.txt

use git-1.10.0

$ ruby exec.rb
/Users/wasanx25/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/git-1.10.0/lib/git/diff.rb:148:in `block in process_full_diff': undefined method `[]' for nil:NilClass (NoMethodError)
        from /Users/wasanx25/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/git-1.10.0/lib/git/diff.rb:131:in `each'
        from /Users/wasanx25/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/git-1.10.0/lib/git/diff.rb:131:in `process_full_diff'
        from /Users/wasanx25/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/git-1.10.0/lib/git/diff.rb:110:in `process_full'
        from /Users/wasanx25/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/git-1.10.0/lib/git/diff.rb:68:in `each'
        from exec.rb:8:in `map'
        from exec.rb:8:in `<main>'

Note: I found when use Danger https://github.com/danger/danger on GitHub Pull Request

Thanks

@jcouball jcouball force-pushed the quoted_paths branch 2 times, most recently from ab5aa82 to 6b4f853 Compare December 31, 2021 19:39
Signed-off-by: James Couball <jcouball@yahoo.com>
@jcouball jcouball merged commit db060fc into master Jan 1, 2022
@jcouball jcouball deleted the quoted_paths branch January 1, 2022 00:25
@jcouball jcouball mentioned this pull request Jan 3, 2022
3 tasks
p-mongo pushed a commit to p-mongodb/ruby-git that referenced this pull request May 27, 2022
…' into mine

* p/diff-submodule: (36 commits)
  Support --submodule option to git diff.
  Support the --all option for git fetch (ruby-git#583)
  Workaround to get JRuby build working (ruby-git#582)
  Update README.md (ruby-git#580)
  Make the directory param to Git.clone optional (ruby-git#578)
  Make Git::URL.clone_to handle cloning to bare and mirror repos (ruby-git#577)
  Add Git::URL #parse and #clone_to methods (ruby-git#575)
  Use the head version of yard (ruby-git#573)
  Release v1.11.0
  Supress unneeded test output (ruby-git#570)
  Add support for fetch options "--force/-f" and "--prune-tags/-P". (ruby-git#563)
  Fix bug when grepping lines that contain numbers surrounded by colons (ruby-git#566)
  remove from maintainer (ruby-git#567)
  Address command line injection in Git::Lib#fetch
  Release v1.10.2 (ruby-git#561)
  Add create-release, setup, and console dev scripts (ruby-git#560)
  Store tempfile objects to prevent deletion during tests (ruby-git#555)
  Release v1.10.1 (ruby-git#553)
  Properly escape double quotes in shell commands on Windows (ruby-git#552)
  Properly unescape diff paths (ruby-git#504)
  ...

* p/set-url-push: (36 commits)
  Add :push option to remote_set_url.
  Support the --all option for git fetch (ruby-git#583)
  Workaround to get JRuby build working (ruby-git#582)
  Update README.md (ruby-git#580)
  Make the directory param to Git.clone optional (ruby-git#578)
  Make Git::URL.clone_to handle cloning to bare and mirror repos (ruby-git#577)
  Add Git::URL #parse and #clone_to methods (ruby-git#575)
  Use the head version of yard (ruby-git#573)
  Release v1.11.0
  Supress unneeded test output (ruby-git#570)
  Add support for fetch options "--force/-f" and "--prune-tags/-P". (ruby-git#563)
  Fix bug when grepping lines that contain numbers surrounded by colons (ruby-git#566)
  remove from maintainer (ruby-git#567)
  Address command line injection in Git::Lib#fetch
  Release v1.10.2 (ruby-git#561)
  Add create-release, setup, and console dev scripts (ruby-git#560)
  Store tempfile objects to prevent deletion during tests (ruby-git#555)
  Release v1.10.1 (ruby-git#553)
  Properly escape double quotes in shell commands on Windows (ruby-git#552)
  Properly unescape diff paths (ruby-git#504)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants