Skip to content

Commit

Permalink
Fix Git::Lib#commit_data for GPG-signed commits (#610)
Browse files Browse the repository at this point in the history
* Introduce example repo with a signed commit

Git::Lib#commit_data currently produces a malformed data hash for a
commit containing a gitsig header field, with the majority of the PGP
signature being considered part of the message.

I'd like to fix this, so this introduces a new example repo with a
single signed commit in it.

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>

* Fix parsing of multiline gpgsig commit header field

Git::Lib#commit_data currently misparses commits containing a multiline
gpgsig header, extracting only the first line as the gpgsig entry, and
considering the rest of the key part of the commit message.

Per the git docs[1] the gpgsig header is used to give the GPG signature
of a signed commit, with continuation lines denoted by a leading space:

    tree eebfed94e75e7760540d1485c740902590a00332
    parent 04b871796dc0420f8e7561a895b52484b701d51a
    author A U Thor <author@example.com> 1465981137 +0000
    committer C O Mitter <committer@example.com> 1465981137 +0000
    gpgsig -----BEGIN PGP SIGNATURE-----
     Version: GnuPG v1
     $
     iQEcBAABAgAGBQJXYRjRAAoJEGEJLoW3InGJ3IwIAIY4SA6GxY3BjL60YyvsJPh/
     HRCJwH+w7wt3Yc/9/bW2F+gF72kdHOOs2jfv+OZhq0q4OAN6fvVSczISY/82LpS7
     DVdMQj2/YcHDT4xrDNBnXnviDO9G7am/9OE77kEbXrp7QPxvhjkicHNwy2rEflAA
     zn075rtEERDHr8nRYiDh8eVrefSO7D+bdQ7gv+7GsYMsd2auJWi1dHOSfTr9HIF4
     HJhWXT9d2f8W+diRYXGh4X0wYiGg6na/soXc+vdtDYBzIxanRqjg8jCAeo1eOTk1
     EdTwhcTZlI0x5pvJ3H0+4hA2jtldVtmPM4OTB0cTrEWBad7XV6YgiyuII73Ve3I=
     =jKHM
     -----END PGP SIGNATURE-----

    signed commit

    signed commit message body

This commit adds a test and a special parsing case for the gpgsig header
to accommodate this.

[1] https://git-scm.com/docs/gitformat-signature#_commit_signatures

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>

* Extract commit header parsing

In the previous commit I introduced a new case for git commit header
parsing to cover the gpgsig header, which supports multi-line values.

I think the #process_commit_data method is getting a bit unwieldy, so
this commit extracts the generic header parsing, separating it from the
special-case handling of parent (which is not multi-line, but can have
multiple entries and thus multiple values).

By switching to a regex key/value extraction approach we're also able to
avoid splitting the entire string before re-joining the value.

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>

* Use cat-file header parsing for Git::Lib#tag_data

The general format of `git cat-file tag` output is identical to that of
`git cat-file commit`, so we can use the generic parsing helper.

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>

* Remove unnecessary default tag/commit message values

These methods always explicitly set the 'message' key of the output hash
based on the `git cat-file` output, so we don't need a default in the
starting hash.

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>

* Remove #process_commmit_data indent parameter

This was introduced in [1] to handle variance in indentation when
parsing commit messages; at the time the #process_commit_data method had
two callers, one which used it to process log lines, and the other which
used it to process the output of `git cat-file commit <SHA>`. The former
sees a 4-space indent, the latter, zero.

Since [2], however, the log processing has been handled by
the #process_commit_log_data method, so #process_commit_data exclusively
handles un-indented inputs, and we can remove the indent parameter.

[1] 05117d4
[2] 97e1fff

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>

* Remove #process_commit_data default sha value

As with the previous commit, this default value was relevant when this
method was also used to process git log output. It no longer is, and its
only caller passes the sha value, so we can remove the default.

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>

* Remove #process_tag_data indent parameter

As with the previous commits, this indent parameter appears to be a
relic of old usage; the #process_tag_data method has a single caller
that explicitly sets the indent to zero, so the indent parameter and
associated handling can just be removed.

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>

---------

Signed-off-by: Simon Coffey <simon.coffey@futurelearn.com>
Co-authored-by: James Couball <jcouball@yahoo.com>
  • Loading branch information
urbanautomaton and jcouball committed Feb 2, 2023
1 parent b12b820 commit b6e031d
Show file tree
Hide file tree
Showing 12 changed files with 99 additions and 26 deletions.
55 changes: 29 additions & 26 deletions lib/git/lib.rb
Expand Up @@ -221,54 +221,57 @@ def object_size(sha)
def commit_data(sha)
sha = sha.to_s
cdata = command_lines('cat-file', 'commit', sha)
process_commit_data(cdata, sha, 0)
process_commit_data(cdata, sha)
end

def process_commit_data(data, sha = nil, indent = 4)
def process_commit_data(data, sha)
hsh = {
'sha' => sha,
'message' => '',
'parent' => []
'sha' => sha,
'parent' => []
}

loop do
key, *value = data.shift.split

break if key.nil?

each_cat_file_header(data) do |key, value|
if key == 'parent'
hsh['parent'] << value.join(' ')
hsh['parent'] << value
else
hsh[key] = value.join(' ')
hsh[key] = value
end
end

hsh['message'] = data.collect {|line| line[indent..-1]}.join("\n") + "\n"
hsh['message'] = data.join("\n") + "\n"

return hsh
end

CAT_FILE_HEADER_LINE = /\A(?<key>\w+) (?<value>.*)\z/

def each_cat_file_header(data)
while (match = CAT_FILE_HEADER_LINE.match(data.shift))
key = match[:key]
value_lines = [match[:value]]

while data.first.start_with?(' ')
value_lines << data.shift.lstrip
end

yield key, value_lines.join("\n")
end
end

def tag_data(name)
sha = sha.to_s
tdata = command_lines('cat-file', 'tag', name)
process_tag_data(tdata, name, 0)
process_tag_data(tdata, name)
end

def process_tag_data(data, name, indent=4)
hsh = {
'name' => name,
'message' => ''
}

loop do
key, *value = data.shift.split

break if key.nil?
def process_tag_data(data, name)
hsh = { 'name' => name }

hsh[key] = value.join(' ')
each_cat_file_header(data) do |key, value|
hsh[key] = value
end

hsh['message'] = data.collect {|line| line[indent..-1]}.join("\n") + "\n"
hsh['message'] = data.join("\n") + "\n"

return hsh
end
Expand Down
1 change: 1 addition & 0 deletions tests/files/signed_commits/dot_git/HEAD
@@ -0,0 +1 @@
ref: refs/heads/main
7 changes: 7 additions & 0 deletions tests/files/signed_commits/dot_git/config
@@ -0,0 +1,7 @@
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
ignorecase = true
precomposeunicode = true
Binary file added tests/files/signed_commits/dot_git/index
Binary file not shown.
1 change: 1 addition & 0 deletions tests/files/signed_commits/dot_git/logs/HEAD
@@ -0,0 +1 @@
0000000000000000000000000000000000000000 a043c677c93d9f2b285771a29742cc73715e41ea Simon Coffey <simon.coffey@futurelearn.com> 1673868871 +0000 commit (initial): Signed commit
1 change: 1 addition & 0 deletions tests/files/signed_commits/dot_git/logs/refs/heads/main
@@ -0,0 +1 @@
0000000000000000000000000000000000000000 a043c677c93d9f2b285771a29742cc73715e41ea Simon Coffey <simon.coffey@futurelearn.com> 1673868871 +0000 commit (initial): Signed commit
Binary file not shown.
Binary file not shown.
Binary file not shown.
1 change: 1 addition & 0 deletions tests/files/signed_commits/dot_git/refs/heads/main
@@ -0,0 +1 @@
a043c677c93d9f2b285771a29742cc73715e41ea
1 change: 1 addition & 0 deletions tests/files/signed_commits/notes.txt
@@ -0,0 +1 @@
it is very important that changes to this file are verified
58 changes: 58 additions & 0 deletions tests/units/test_signed_commits.rb
@@ -0,0 +1,58 @@
#!/usr/bin/env ruby

require File.dirname(__FILE__) + '/../test_helper'
require "fileutils"

class TestSignedCommits < Test::Unit::TestCase
def git_working_dir
cwd = FileUtils.pwd
if File.directory?(File.join(cwd, 'files'))
test_dir = File.join(cwd, 'files')
elsif File.directory?(File.join(cwd, '..', 'files'))
test_dir = File.join(cwd, '..', 'files')
elsif File.directory?(File.join(cwd, 'tests', 'files'))
test_dir = File.join(cwd, 'tests', 'files')
end

create_temp_repo(File.expand_path(File.join(test_dir, 'signed_commits')))
end

def create_temp_repo(clone_path)
filename = 'git_test' + Time.now.to_i.to_s + rand(300).to_s.rjust(3, '0')
@tmp_path = File.join("/tmp/", filename)
FileUtils.mkdir_p(@tmp_path)
FileUtils.cp_r(clone_path, @tmp_path)
tmp_path = File.join(@tmp_path, File.basename(clone_path))
Dir.chdir(tmp_path) do
FileUtils.mv('dot_git', '.git')
end
tmp_path
end

def setup
@lib = Git.open(git_working_dir).lib
end

def test_commit_data
data = @lib.commit_data('a043c677c93d9f2b')

assert_equal('Simon Coffey <simon.coffey@futurelearn.com> 1673868871 +0000', data['author'])
assert_equal('92fd5b7c1aeb6a4e2602799f01608b3deebbad2d', data['tree'])
assert_equal(<<~EOS.chomp, data['gpgsig'])
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRmiEtd91BkbBpcgV2yCJ+VnJz/iQUCY8U2cgAKCRCyCJ+VnJz/
ibjyAP48dGdoFgWL2BjV3CnmebdVjEjTCQtF2QGUybJsyJhhcwEAwbzAAGt3YHfS
uuLNH9ki9Sqd+/CH+L8Q2dPM5F4l3gg=
=3ATn
-----END PGP SIGNATURE-----
EOS
assert_equal(<<~EOS, data['message'])
Signed commit
This will allow me to test commit data extraction for signed commits.
I'm making the message multiline to make sure that message extraction is
preserved.
EOS
end
end

0 comments on commit b6e031d

Please sign in to comment.