
Advise support The incremental hash #176

Open

ug802 opened this issue Jul 5, 2022 · 3 comments

Comments


ug802 commented Jul 5, 2022

I have a disk with 4 TB of files, many of them small.

When hashing, I would like RapidCRC to compare against an existing SFV file, verifying the files that are already listed, and append the hashes of any new files to the SFV.


ug802 (author) commented Jul 5, 2022

My English is poor; what I mean is this.
When hashing:
old files → check whether they exist in the old SFV file and whether their hash matches
new files → add their hashes to the SFV file
So a single hashing run accomplishes both goals.

ug802 changed the title from "can use" to "Advise support The incremental hash" on Jul 5, 2022

Thunderbolt32 commented Aug 30, 2022

See "Workaround" below if you don't care about the details.

  1. A checksum/hash file like *.sfv stores file paths and checksums; it usually records neither the folder for which the checksums were calculated nor whether specific files in that folder were cherry-picked. Users who cherry-pick the files covered by a checksum file would be angry if RapidCRC calculated checksums for far more files than needed.
  2. Is a "new file" new because it has a newly detected file path, or because it has a newly detected checksum result?
    • You can use the file path for identification and then detect checksum changes (e.g. via CRC32). (The usual way to deal with this.)
    • You can use a strong hash (e.g. BLAKE3) for identification and then detect file movements. (Identification by hash is used in content-addressable storage, as implemented by restic or kopia, but because of the birthday paradox it is only reasonable with strong hash algorithms, and it is still avoided in enterprise systems such as IBM's.)
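The first option above, identification by file path, can be sketched in a few lines of Python. This is only an illustration of the idea, not RapidCRC code: `parse_sfv`, `classify`, and the in-memory `files` mapping are made-up names, and real code would read file contents from disk.

```python
import zlib

def parse_sfv(text):
    """Parse SFV lines of the form '<path> <CRC32-hex>' into a dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):  # ';' starts a comment line in SFV
            continue
        path, _, crc = line.rpartition(" ")
        entries[path] = crc.lower()
    return entries

def classify(old_sfv_text, files):
    """Identify files by path, then detect checksum changes.

    `files` maps path -> bytes content (illustrative; real code reads disk).
    Returns three lists of paths: (new, changed, unchanged).
    """
    old = parse_sfv(old_sfv_text)
    new, changed, unchanged = [], [], []
    for path, data in files.items():
        crc = format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
        if path not in old:
            new.append(path)        # not in the old SFV: append its hash
        elif old[path] != crc:
            changed.append(path)    # listed, but the checksum mismatches
        else:
            unchanged.append(path)  # listed and verified
    return new, changed, unchanged
```

An incremental run would then verify the `unchanged`/`changed` sets and append entries for the `new` set to the SFV file.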

Since checksum and file change-detection can become complex, I think neither will be implemented. The only program I know that searches for additional files (MultiPar, for the PAR archive format) has problems when dealing with a lot of small files or with a huge amount of data.


| Checksum storage | Detects checksum mismatch? | Detects missing files? | Tells you which files have no recognized checksum? | Still works with randomly renamed files? | Comment |
| --- | --- | --- | --- | --- | --- |
| Central checksum file | ✔️ | ✔️ | | | The usual way |
| Decentral, in the file name (e.g. you always check all files of a folder) | ✔️ | | ✔️ | | Also common; works as long as you preserve the checksum on rename operations (so avoid automatic renaming tools) |
| Decentral and sticky: NTFS streams (e.g. you always check all files of a folder) | ✔️ | | ✔️ | ✔️ | NTFS streams only survive as long as you move/store files within NTFS volumes |

Note: The latter two decentral storage options are automatically recognized and checked if RapidCRC is not verifying a checksum file (i.e. when it is only calculating "new" checksums for files).

Workaround

You can calculate a new checksum file. Since it is only a text file, you can check that the lines of the new and old checksum files are order-synchronized (if not, sort all lines alphabetically with a tool), and then a text-comparison program of your choice will show you the added, removed, and differing lines, and thus the new, deleted, and mismatching files between the two files.
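The sort-and-compare step above can also be scripted. Here is a minimal Python sketch of the same idea; `sfv_diff` is a made-up helper name, and real use would read the two checksum files from disk instead of taking strings.

```python
def sfv_diff(old_text, new_text):
    """Compare two checksum files as plain sorted text (the workaround above).

    Each non-empty line is treated as one '<path> <crc>' entry. Lines only
    in the new file are added or re-hashed files; lines only in the old
    file are deleted or changed files. A path appearing on both sides with
    different checksums is a mismatch.
    """
    old = {line.strip() for line in old_text.splitlines() if line.strip()}
    new = {line.strip() for line in new_text.splitlines() if line.strip()}
    # Sorting makes the output order-synchronized, as the workaround suggests.
    return sorted(new - old), sorted(old - new)
```

Any line that shows up in both result lists with the same path points at a checksum mismatch; a path only in the first list is a new file, and a path only in the second list is a deleted one.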

OV2 (owner) commented Sep 22, 2022

Incremental hashing doesn't really fit that well into the concept of RapidCRC. I will most likely not add this.
