Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exp: plumbling/hasher, Add ObjectHasher and ImmutableHash #920

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

pjbgf
Copy link
Member

@pjbgf pjbgf commented Nov 10, 2023

The introduction of both new types are a stepping stone to enable SHA256 support concurrently - without the need for a build tag.

ImmutableHash provides a way to represent varied sized hashes (for SHA1 or SHA256) with a single type, while keeping similar performance of the existing plumbing.Hash.

The names were picked in order to provide a clearer distinction on what they do. ImmutableHash is the result of a hash operation and can't be changed once calculated. ObjectHasher, adds the Git object header before a normal hash operation that hash.Hash or plumbing/hash.Hash do.

The performance results shows that SHA1 operations could be slower, however for SHA256 it can be over 3 times faster:

~Hasher/hasher-sha1-100B-16         	            2202226	     574.7 ns/op	 174.00 MB/s     272 B/op     7 allocs/op
~Hasher/objecthash-sha1-100B-16     	            1511851	     772.6 ns/op	 129.43 MB/s     272 B/op     9 allocs/op
~Hasher/objecthash-sha256-100B-16   	            5057584	     247.4 ns/op	 404.21 MB/s      96 B/op     7 allocs/op
~Hasher/hasher-sha1-5000B-16        	              96476	     12523 ns/op	 399.25 MB/s    5536 B/op     7 allocs/op
~Hasher/objecthash-sha1-5000B-16    	             105376	     10983 ns/op	 455.27 MB/s     272 B/op     9 allocs/op
~Hasher/objecthash-sha256-5000B-16  	             420741	      2828 ns/op	1767.77 MB/s      96 B/op     7 allocs/op
~HashFromHex/hasher-parse-sha1-valid-16         	23243548     64.65 ns/op	 618.69 MB/s      48 B/op     1 allocs/op
~HashFromHex/objecthash-fromhex-sha1-valid-16   	18120699     79.62 ns/op	 502.36 MB/s      72 B/op     2 allocs/op
~HashFromHex/objecthash-fromhex-sha256-valid-16 	 8969871    124.2 ns/op	     515.22 MB/s      96 B/op     2 allocs/op

Relates to #229 #899


The other commits (together with the one described above) lay the foundations for a follow-up PR which introduces a exp/plumbing/format/idxfile which supports both SHA1 and SHA256.

Signed-off-by: Paulo Gomes <paulo.gomes@suse.com>
Signed-off-by: Paulo Gomes <paulo.gomes@suse.com>
Signed-off-by: Paulo Gomes <paulo.gomes@suse.com>
The support for SHA256 will require the mapping between ObjectFormat and the
correct hash.Hash to be used.

Signed-off-by: Paulo Gomes <paulo.gomes@suse.com>
The new exp package will hold only experimental code, which may eventually
make its way to other parts of go-git.
One example of how it would be used is to support the breaking up of
changes required for things like SHA256. That feature would require
several packages to be changed, and introducing such changes to plumbing
in one go would be too disruptive.

The new changes would now be staged in the new package, and as they
mature they would be either moved into their target destination
(e.g. plumbing/hash), or removed if no longer needed.

As per caveat on the documentation, this package API may introduce
breaking changes to its subpackages in minor go-git releases,
therefore beware when depending on them.

Signed-off-by: Paulo Gomes <paulo.gomes@suse.com>
The introduction of both new types are a stepping stone to enable
SHA256 support concurrently - without the need for a build tag.

ImmutableHash provides a way to represent varied sized hashes (for SHA1
or SHA256) with a single type, while keeping similar performance of the
existing plumbing.Hash.

The names were picked in order to provide a clearer distinction on what
they do. ImmutableHash is the result of a hash operation and can't be
changed once calculated. ObjectHasher, adds the Git object header before
a normal hash operation that hash.Hash or plumbing/hash.Hash do.

The performance results shows that SHA1 operations could be slower,
however for SHA256 it can be over 3 times faster:

~Hasher/hasher-sha1-100B-16         	            2202226	     574.7 ns/op	 174.00 MB/s     272 B/op     7 allocs/op
~Hasher/objecthash-sha1-100B-16     	            1511851	     772.6 ns/op	 129.43 MB/s     272 B/op     9 allocs/op
~Hasher/objecthash-sha256-100B-16   	            5057584	     247.4 ns/op	 404.21 MB/s      96 B/op     7 allocs/op
~Hasher/hasher-sha1-5000B-16        	              96476	     12523 ns/op	 399.25 MB/s    5536 B/op     7 allocs/op
~Hasher/objecthash-sha1-5000B-16    	             105376	     10983 ns/op	 455.27 MB/s     272 B/op     9 allocs/op
~Hasher/objecthash-sha256-5000B-16  	             420741	      2828 ns/op	1767.77 MB/s      96 B/op     7 allocs/op
~HashFromHex/hasher-parse-sha1-valid-16         	23243548     64.65 ns/op	 618.69 MB/s      48 B/op     1 allocs/op
~HashFromHex/objecthash-fromhex-sha1-valid-16   	18120699     79.62 ns/op	 502.36 MB/s      72 B/op     2 allocs/op
~HashFromHex/objecthash-fromhex-sha256-valid-16 	 8969871    124.2 ns/op	     515.22 MB/s      96 B/op     2 allocs/op

Signed-off-by: Paulo Gomes <paulo.gomes@suse.com>
@@ -21,6 +21,7 @@ require (
github.com/pjbgf/sha1cd v0.3.0
github.com/sergi/go-diff v1.1.0
github.com/skeema/knownhosts v1.2.1
github.com/stretchr/testify v1.8.4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a new testing library or can we use the existing gopkg.in/check.v1?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should not have two testing libraries. However, check.v1 is no longer maintained so we are better off removing it across the board. There is an issue already talking about it.

On go-git-fixtures, I already marked the check.v1 API as deprecated and on the new code for SHA256 the idea is to rely on either std Go or testify.

Comment on lines +16 to +19
SHA1_Size = 20
SHA1_HexSize = SHA1_Size * 2
SHA256_Size = 32
SHA256_HexSize = SHA256_Size * 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SHA1_Size = 20
SHA1_HexSize = SHA1_Size * 2
SHA256_Size = 32
SHA256_HexSize = SHA256_Size * 2
SHA1Size = 20
SHA1HexSize = SHA1Size * 2
SHA256Size = 32
SHA256HexSize = SHA256Size * 2

}
}

type objectHasherSHA256 struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both objectHasherSHA1 and objectHasherSHA256 share the same logic, can we combine both in one type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a single base type was my original design, which gave me a few options:
a) have two different sized arrays, to be used based on the hash at hand.
b) rely on a slice and size it based on the hash at hand.
c) use generics and rely on type constraints.

From the options above, the smallest impact in both performance and API changes was this. But I am not convinced this is it yet - hence why feedback on this is so important.

The original plumbing.Hash idea of using a fixed sized array as the base Hash type was really good in so many different ways - including performance. So whatever comes to replace it will need to be quite good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants