parsePatch should preserve "leading garbage" #454

Open
ExplodingCabbage opened this issue Jan 2, 2024 · 0 comments
Here's an example of a patch emitted by git diff:

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 20b807a..4a96aff 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -2,6 +2,8 @@
 
 ## Pull Requests
 
+bla bla bla
+
 We also accept [pull requests][pull-request]!
 
 Generally we like to see pull requests that
diff --git a/README.md b/README.md
index 06eebfa..40919a6 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,7 @@
 # jsdiff
 
+foo
+
 [![Build Status](https://secure.travis-ci.org/kpdecker/jsdiff.svg)](http://travis-ci.org/kpdecker/jsdiff)
 [![Sauce Test Status](https://saucelabs.com/buildstatus/jsdiff)](https://saucelabs.com/u/jsdiff)
 
@@ -225,3 +227,5 @@ jsdiff deviates from the published algorithm in a couple of ways that don't affe
 
 * jsdiff keeps track of the diff for each diagonal using a linked list of change objects for each diagonal, rather than the historical array of furthest-reaching D-paths on each diagonal contemplated on page 8 of Myers's paper.
 * jsdiff skips considering diagonals where the furthest-reaching D-path would go off the edge of the edit graph. This dramatically reduces the time cost (from quadratic to linear) in cases where the new text just appends or truncates content at the end of the old text.
+
+bar

Parse it with parsePatch and you get this:

[
  {
    oldFileName: 'a/CONTRIBUTING.md',
    oldHeader: '',
    newFileName: 'b/CONTRIBUTING.md',
    newHeader: '',
    hunks: [ [Object] ]
  },
  {
    oldFileName: 'a/README.md',
    oldHeader: '',
    newFileName: 'b/README.md',
    newHeader: '',
    hunks: [ [Object], [Object] ]
  }
]

The lines before each pair of filenames in the diff have vanished, i.e. this text appears nowhere in the object returned by parsePatch:

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 20b807a..4a96aff 100644
diff --git a/README.md b/README.md
index 06eebfa..40919a6 100644

If all we want to do with the parsed patch is apply it, this is probably fine. Content in this part of a unified patch file doesn't seem to follow any consistent, specced format or to affect how the patch is actually applied, and is consequently referred to by the patch man page as "leading garbage"(!). But if we want to tweak and reserialize a patch while leaving the garbage unchanged (perhaps for the sake of some other tool that makes use of it), then discarding the garbage upon parsing makes that impossible.

It would therefore be desirable, if possible, to preserve the leading and trailing garbage (perhaps even in leadingGarbage and trailingGarbage properties, just to be totally clear that as far as we're concerned it's just arbitrary text that happened to be in the patch and has no semantics).
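
To make the proposal concrete, here is a rough, self-contained sketch of what preserving the garbage could look like. It does not use jsdiff and is not how parsePatch is implemented; the function names (splitPatchKeepingGarbage, serializePatch) and the object shape are made up for illustration, with leadingGarbage and trailingGarbage held as arrays of lines so that the patch can be reassembled exactly.

```javascript
// Hypothetical helpers, NOT part of jsdiff. They only separate the "garbage"
// from the parts of a unified diff that matter for applying it.
const HUNK_HEADER = /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/;

function splitPatchKeepingGarbage(patchText) {
  const lines = patchText.split('\n');
  const files = [];
  let pending = [];      // lines not yet attributed to any file
  let current = null;    // file section currently being filled
  let oldLeft = 0;       // old-side lines still expected in the open hunk
  let newLeft = 0;       // new-side lines still expected in the open hunk

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];

    if (oldLeft > 0 || newLeft > 0) {
      // Inside a hunk: count lines so a removed line such as "--- x"
      // is never mistaken for a file header.
      current.body.push(line);
      const marker = line === '' ? ' ' : line[0];
      if (marker === '-') oldLeft--;
      else if (marker === '+') newLeft--;
      else if (marker !== '\\') { oldLeft--; newLeft--; }
      continue;
    }

    const next = lines[i + 1];
    if (line.startsWith('--- ') && next !== undefined && next.startsWith('+++ ')) {
      // Everything accumulated since the previous hunk is this file's garbage.
      current = { leadingGarbage: pending, oldHeaderLine: line, newHeaderLine: next, body: [] };
      files.push(current);
      pending = [];
      i++; // the "+++" line has been consumed too
      continue;
    }

    const match = current && pending.length === 0 ? HUNK_HEADER.exec(line) : null;
    if (match) {
      oldLeft = match[2] === undefined ? 1 : Number(match[2]);
      newLeft = match[4] === undefined ? 1 : Number(match[4]);
      current.body.push(line);
    } else if (current && pending.length === 0 && line.startsWith('\\')) {
      current.body.push(line); // "\ No newline at end of file" after a hunk
    } else {
      pending.push(line);
    }
  }

  return { files, trailingGarbage: pending };
}

function serializePatch(parsed) {
  const out = [];
  for (const file of parsed.files) {
    out.push(...file.leadingGarbage, file.oldHeaderLine, file.newHeaderLine, ...file.body);
  }
  out.push(...parsed.trailingGarbage);
  return out.join('\n');
}

const samplePatch = [
  'diff --git a/README.md b/README.md',
  'index 06eebfa..40919a6 100644',
  '--- a/README.md',
  '+++ b/README.md',
  '@@ -1,2 +1,3 @@',
  ' # jsdiff',
  '+foo',
  ' ',
  ''
].join('\n');

const parsed = splitPatchKeepingGarbage(samplePatch);
console.log(parsed.files[0].leadingGarbage);
console.log(serializePatch(parsed) === samplePatch);
```

Because every input line ends up in exactly one array, in its original order, a patch can be tweaked (say, by editing a hunk in body) and written back out with the `diff --git` and `index` lines intact. A real implementation inside parsePatch would presumably attach the garbage to the existing per-file objects rather than introduce a new shape.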
