This is the Go implementation of find-duplicate-files, which finds duplicate files in a directory. It does not handle symlinked directories.
This module walks the given directory tree, groups files by size (equal sizes indicate potential duplicate content), and then compares the hashes of the files within each group. The hash can be chunked by passing in a chunk argument: an initial hash is computed for a chunk of the file, and the full hash is computed only if the initial hashes match. This avoids computing expensive full hashes on large files that already differ in their first chunk.
- Go 1.14.2+
> go get -v github.com/hp310780/findduplicatefiles
To use as a Go package:
import "github.com/hp310780/findduplicatefiles"
// Args: path to the directory to search for duplicates;
//       chunk size for the initial hash: 1 indicates the full file, 2 is half, etc.
duplicates := find_duplicate_files.FindDuplicateFiles("/path/to/dir", 1)
To run the tests, use the following commands:
> cd <findduplicatefiles directory>
> go test -v
The test data provided takes the following form:
- test/test_data/test*.txt: Text files of equal length but differing content. test1 and test2 are identical; test3 is different.
- test/test_data/out.gif: Image file.
- Investigate performance and benchmarking for large files.
- Investigate how to resolve symlinks gracefully.