
concurrent hash


A simple two-level Merkle tree for hashing large files to ensure integrity. Fast hashing algorithms for large files exist, but they do not consider the entire file and therefore cannot guarantee integrity. Concurrently hashing blocks of a file and then hashing the resulting hashes is not a new idea; ZFS and Btrfs, for example, hash inodes all the way up the directory tree to ensure the filesystem is not corrupted.
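
The idea can be sketched in a few lines of Go. This is only an illustration of the two-level scheme over fixed-size blocks, not the library's implementation; the concurrency and streaming file I/O that the library adds are omitted here:

// Sketch of two-level hashing: hash fixed-size blocks (first level), then
// hash the concatenation of the block digests (second level).
package main

import (
	"crypto/sha256"
	"fmt"
)

func twoLevelHash(data []byte, blockSize int) []byte {
	top := sha256.New() // second level: hash over the block hashes
	for start := 0; start < len(data); start += blockSize {
		end := start + blockSize
		if end > len(data) {
			end = len(data)
		}
		blockSum := sha256.Sum256(data[start:end]) // first level: per-block hash
		top.Write(blockSum[:])
	}
	return top.Sum(nil)
}

func main() {
	fmt.Printf("%x\n", twoLevelHash([]byte("contents of a large file"), 8))
}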

Usage

Library

var ch = concurrenthash.NewConcurrentHash(context.Background(), 2, 2, sha512.New)
var hash, err = ch.HashFile(file)
if err != nil {
	log.Fatal(err)
}
fmt.Println(hash)

Some hash algorithms do not have constructors that return hash.Hash, so there are convenience wrappers you can use; you can also adapt such a constructor yourself, as sketched below.
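
For example, crc32.NewIEEE from the standard library returns hash.Hash32 rather than hash.Hash, so a small closure can adapt it. This assumes NewConcurrentHash expects a func() hash.Hash constructor, matching the sha512.New example above:

// crc32.NewIEEE (from the standard "hash/crc32" package) returns hash.Hash32,
// which cannot be passed directly where a func() hash.Hash is expected,
// so wrap it in a closure. hash.Hash32 embeds hash.Hash, so this is valid.
var newCRC32IEEE = func() hash.Hash { return crc32.NewIEEE() }
var ch = concurrenthash.NewConcurrentHash(context.Background(), 2, 2, newCRC32IEEE)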

CLI

./concurrenthash -file /path/to/large/file -threads 4 -block-size 1

Options:
 -algos                 print the available hash algorithms
 -file                  input file to hash
 -hash-func             hash algorithm to use, default: sha256
 -threads               number of concurrent hashing workers, default: 1
 -block-size            size of each block (leaf node) to hash, default: 1MB

Benchmarks

Time to hash a 10GB file of /dev/urandom data

algo               time (s)
adler32            8.02
crc32Castagnoli    6.60
crc32IEEE          4.99
crc32Koopman       6.18
crc64ECMA          5.54
crc64ISO           4.20
fnv32              4.15
fnv32a             4.14
fnv64              4.58
fnv64a             4.33
md5                19.01
murmur32           5.05
murmur64           5.36
sha1               13.07
sha256             18.14
sha512             18.10

Block size benchmarks

Raw data