Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Git::Lib#commit_data for GPG-signed commits #610

Merged
merged 9 commits into from Feb 2, 2023

Conversation

urbanautomaton
Copy link
Contributor

@urbanautomaton urbanautomaton commented Jan 16, 2023

Your checklist for this pull request

馃毃Please review the guidelines for contributing to this repository.

  • Ensure all commits include DCO sign-off.
  • Ensure that your contributions pass unit testing.
  • Ensure that your contributions contain documentation if applicable.

Description

Git::Lib#commit_data currently misparses commits containing a multiline gpgsig header, extracting only the first line as the gpgsig entry, and considering the rest of the key part of the commit message. For example, in this repo:

$ bin/console
irb(main):002:0> puts Git.open('.').object('45b467c').message
 wsBcBAABCAAQBQJiZ0gbCRBK7hj4Ov3rIwAAwrAIAKFFKX/2JvpQUg8IMJ++uZMh
 IwCLkxuNELMTIT7EbDGCSZYba6Pjr521CY47l10jv02WNr8iL8JqOssi+UMQRgzG
 3/jAuZcIs1vA/dfl+UAFxWkrTFu/dGDsWhlzkJAzDKCxWUoBVn9aU1UOO4Byja24
 B3ULgkY59a+8hbMQQwjW0E44OCq1hRDiCOZnDlhnhKAQZa/heLOFlRlEE9uiiJNj
 A/6UTvA3JTG9fF+4Ne2m1ECOOnPbIT1k6xcOY87qtCivK4jofFpGCFD1sCeaJQWZ
 HE5QiDlDLv9uqWarwIgmOX9H9/7Iu2L4hzCxFl03eUHZe0vh+GUbDsjVYGhoVuM=
 =G6pC
 -----END PGP SIGNATURE-----


Make the directory param to Git.clone optional (#578)

Signed-off-by: James Couball <jcouball@yahoo.com>
=> nil

Per the git docs the gpgsig header is used to give the GPG signature of a signed commit, with continuation lines denoted by a leading space:

tree eebfed94e75e7760540d1485c740902590a00332
parent 04b871796dc0420f8e7561a895b52484b701d51a
author A U Thor <author@example.com> 1465981137 +0000
committer C O Mitter <committer@example.com> 1465981137 +0000
gpgsig -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 $
 iQEcBAABAgAGBQJXYRjRAAoJEGEJLoW3InGJ3IwIAIY4SA6GxY3BjL60YyvsJPh/
 HRCJwH+w7wt3Yc/9/bW2F+gF72kdHOOs2jfv+OZhq0q4OAN6fvVSczISY/82LpS7
 DVdMQj2/YcHDT4xrDNBnXnviDO9G7am/9OE77kEbXrp7QPxvhjkicHNwy2rEflAA
 zn075rtEERDHr8nRYiDh8eVrefSO7D+bdQ7gv+7GsYMsd2auJWi1dHOSfTr9HIF4
 HJhWXT9d2f8W+diRYXGh4X0wYiGg6na/soXc+vdtDYBzIxanRqjg8jCAeo1eOTk1
 EdTwhcTZlI0x5pvJ3H0+4hA2jtldVtmPM4OTB0cTrEWBad7XV6YgiyuII73Ve3I=
 =jKHM
 -----END PGP SIGNATURE-----

signed commit

signed commit message body

This PR updates Git::Lib#commit_data to support multiline header values, including gpgsig.

Feedback

All feedback is most welcome, but in particular:

  • I've introduced a whole new example repo and test file to cover this. While there's only one test case at the moment, I thought more might be added (e.g. for tag signatures or mergetag signatures). Does that seem reasonable or is this overkill?
  • I've included a small amount of tidying in the various #process_{x}_data methods to remove unused parameters or default values that confused me when reading the code. If you'd prefer I kept that sort of thing to a separate PR, I'd be happy to do so.

Simon Coffey added 8 commits January 17, 2023 11:10
Git::Lib#commit_data currently produces a malformed data hash for a
commit containing a gitsig header field, with the majority of the PGP
signature being considered part of the message.

I'd like to fix this, so this introduces a new example repo with a
single signed commit in it.

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
Git::Lib#commit_data currently misparses commits containing a multiline
gpgsig header, extracting only the first line as the gpgsig entry, and
considering the rest of the key part of the commit message.

Per the git docs[1] the gpgsig header is used to give the GPG signature
of a signed commit, with continuation lines denoted by a leading space:

    tree eebfed94e75e7760540d1485c740902590a00332
    parent 04b871796dc0420f8e7561a895b52484b701d51a
    author A U Thor <author@example.com> 1465981137 +0000
    committer C O Mitter <committer@example.com> 1465981137 +0000
    gpgsig -----BEGIN PGP SIGNATURE-----
     Version: GnuPG v1
     $
     iQEcBAABAgAGBQJXYRjRAAoJEGEJLoW3InGJ3IwIAIY4SA6GxY3BjL60YyvsJPh/
     HRCJwH+w7wt3Yc/9/bW2F+gF72kdHOOs2jfv+OZhq0q4OAN6fvVSczISY/82LpS7
     DVdMQj2/YcHDT4xrDNBnXnviDO9G7am/9OE77kEbXrp7QPxvhjkicHNwy2rEflAA
     zn075rtEERDHr8nRYiDh8eVrefSO7D+bdQ7gv+7GsYMsd2auJWi1dHOSfTr9HIF4
     HJhWXT9d2f8W+diRYXGh4X0wYiGg6na/soXc+vdtDYBzIxanRqjg8jCAeo1eOTk1
     EdTwhcTZlI0x5pvJ3H0+4hA2jtldVtmPM4OTB0cTrEWBad7XV6YgiyuII73Ve3I=
     =jKHM
     -----END PGP SIGNATURE-----

    signed commit

    signed commit message body

This commit adds a test and a special parsing case for the gpgsig header
to accommodate this.

[1] https://git-scm.com/docs/gitformat-signature#_commit_signatures

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
In the previous commit I introduced a new case for git commit header
parsing to cover the gpgsig header, which supports multi-line values.

I think the #process_commit_data method is getting a bit unwieldy, so
this commit extracts the generic header parsing, separating it from the
special-case handling of parent (which is not multi-line, but can have
multiple entries and thus multiple values).

By switching to a regex key/value extraction approach we're also able to
avoid splitting the entire string before re-joining the value.

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
The general format of `git cat-file tag` output is identical to that of
`git cat-file commit`, so we can use the generic parsing helper.

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
These methods always explicitly set the 'message' key of the output hash
based on the `git cat-file` output, so we don't need a default in the
starting hash.

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
This was introduced in [1] to handle variance in indentation when
parsing commit messages; at the time the #process_commit_data method had
two callers, one which used it to process log lines, and the other which
used it to process the output of `git cat-file commit <SHA>`. The former
sees a 4-space indent, the latter, zero.

Since [2], however, the log processing has been handled by
the #process_commit_log_data method, so #process_commit_data exclusively
handles un-indented inputs, and we can remove the indent parameter.

[1] ruby-git@05117d4
[2] ruby-git@97e1fff

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
As with the previous commit, this default value was relevant when this
method was also used to process git log output. It no longer is, and its
only caller passes the sha value, so we can remove the default.

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
As with the previous commits, this indent parameter appears to be a
relic of old usage; the #process_tag_data method has a single caller
that explicitly sets the indent to zero, so the indent parameter and
associated handling can just be removed.

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
Copy link
Member

@jcouball jcouball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look great!

@jcouball
Copy link
Member

jcouball commented Feb 2, 2023

Great job and thank you for your contribution.

I think the one test case is sufficient but I don't understand the use case as well as you do. If you think the codebase would benefit from additional test cases, they would be gladly accepted as a separate PR.

I am not a big fan of committing these test repos vs. creating them on the fly because it can be hard to understand the condition that is being tested. If you wanted to redo the test case and add creating the test repository on the fly, that would also be gladly accepted as a separate PR. #602 and #513 should different ways of doing that.

As far as the tidying, seems great to me. Unused / untested code is technical debt. Thanks for removing it.

@jcouball jcouball merged commit b6e031d into ruby-git:master Feb 2, 2023
@jcouball jcouball mentioned this pull request Feb 2, 2023
@urbanautomaton
Copy link
Contributor Author

Thanks for the merge!

I am not a big fan of committing these test repos vs. creating them on the fly because it can be hard to understand the condition that is being tested.

Ah cool - I hadn't spotted that there were dynamically-created repos for some tests, so I followed the static pattern. Agreed that on-the-fly creation seems much clearer if it can be made to work with the PGP signing, I'll have a go.

@urbanautomaton urbanautomaton deleted the sc-fix-gpgsig-parsing branch February 3, 2023 10:10
urbanautomaton pushed a commit to urbanautomaton/ruby-git that referenced this pull request Feb 3, 2023
This replaces the test repo introduced in ruby-git#610 with one created on the
fly in the test.

I've switched from GPG to SSH signing to minimise the likelihood that
we'll need extra dependencies on CI or other contributors' machines.

I've also removed some of the commit metadata assertions that didn't
feel particularly necessary, and this allowed me to reduce the setup
steps for clarity.
urbanautomaton pushed a commit to urbanautomaton/ruby-git that referenced this pull request Feb 3, 2023
This replaces the test repo introduced in ruby-git#610 with one created on the
fly in the test.

I've switched from GPG to SSH signing to minimise the likelihood that
we'll need extra dependencies on CI or other contributors' machines.

I've also removed some of the commit metadata assertions that didn't
feel particularly necessary, and this allowed me to reduce the setup
steps for clarity.

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
jcouball pushed a commit that referenced this pull request Feb 5, 2023
This replaces the test repo introduced in #610 with one created on the
fly in the test.

I've switched from GPG to SSH signing to minimise the likelihood that
we'll need extra dependencies on CI or other contributors' machines.

I've also removed some of the commit metadata assertions that didn't
feel particularly necessary, and this allowed me to reduce the setup
steps for clarity.

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants