
Advise support The incremental hash #176

Open

ug802 opened this issue Jul 5, 2022 · 3 comments

Comments


ug802 commented Jul 5, 2022

I have a disk with 4 TB of files, many of them small.

When hashing, I would like RapidCRC to compare against an existing SFV file, verifying the files that are already listed, and append the hashes of any new files to the SFV.


ug802 (author) commented Jul 5, 2022

My English is poor; what I mean is this.
When hashing:
old files → check whether they exist in the old SFV file and whether their hash matches
new files → add their hashes to the SFV file
So a single hashing run accomplishes both goals.

ug802 changed the title from "can use" to "Advise support The incremental hash" on Jul 5, 2022

Thunderbolt32 commented Aug 30, 2022

See "Workaround" below if you don't care about the details.

  1. A checksum/hash file like *.sfv stores file paths and checksums; it usually records neither the folder for which the checksums were calculated nor whether specific files in that folder were cherry-picked. Users who cherry-pick the files covered by a checksum file would be angry if RapidCRC calculated checksums for far more files than needed.
  2. Is a "new file" new because it has a newly detected file path, or because it has a newly detected checksum result?
    • You can use the file path for identification and then detect checksum changes (e.g. via CRC32). (The usual way to deal with this.)
    • You can use a strong hash (e.g. BLAKE3) for identification and then detect file movements. (Identification by hash is used in content-addressable storage, as implemented by restic or kopia, but because of the birthday paradox it is only reasonable with strong hash algorithms, and it is still avoided in enterprise systems such as IBM's.)
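The first option above, identification by file path, can be sketched in a few lines of Python. This is only an illustration of the idea, not RapidCRC code: `parse_sfv`, `classify`, and the in-memory `files` mapping are made-up names, and real code would read file contents from disk.

```python
import zlib

def parse_sfv(text):
    """Parse SFV lines of the form '<path> <CRC32-hex>' into a dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):  # ';' starts a comment line in SFV
            continue
        path, _, crc = line.rpartition(" ")
        entries[path] = crc.lower()
    return entries

def classify(old_sfv_text, files):
    """Identify files by path, then detect checksum changes.

    `files` maps path -> bytes content (illustrative; real code reads disk).
    Returns three lists of paths: (new, changed, unchanged).
    """
    old = parse_sfv(old_sfv_text)
    new, changed, unchanged = [], [], []
    for path, data in files.items():
        crc = format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
        if path not in old:
            new.append(path)        # not in the old SFV: append its hash
        elif old[path] != crc:
            changed.append(path)    # listed, but the checksum mismatches
        else:
            unchanged.append(path)  # listed and verified
    return new, changed, unchanged
```

An incremental run would then verify the `unchanged`/`changed` sets and append entries for the `new` set to the SFV file.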

Since checksum and file change-detection can become complex, I think neither will be implemented. The only program I know that searches for additional files (MultiPar, for the PAR archive format) has problems when dealing with a lot of small files or with a huge amount of data.


| Checksum storage | Detects checksum mismatch? | Detects missing files? | Tells you which files have no recognized checksum? | Still works with randomly renamed files? | Comment |
| --- | --- | --- | --- | --- | --- |
| Central checksum file | ✔️ | ✔️ | | | The usual way |
| Decentral, in the file name (e.g. you always check all files of a folder) | ✔️ | | ✔️ | | Also common; works as long as you preserve the checksum on rename operations (so avoid automatic renaming tools) |
| Decentral and sticky: NTFS streams (e.g. you always check all files of a folder) | ✔️ | | ✔️ | ✔️ | NTFS streams only survive as long as you move/store files within NTFS volumes |

Note: The latter two decentral storage options are automatically recognized and checked if RapidCRC is not verifying a checksum file (i.e. when it is only calculating "new" checksums for files).

Workaround

You can calculate a new checksum file. Since it is only a text file, you can check that the lines of the new and old checksum files are order-synchronized (if not, sort all lines alphabetically with a tool), and then a text-comparison program of your choice will show you the added, removed, and differing lines, and thus the new, deleted, and mismatching files between the two files.
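The sort-and-compare step above can also be scripted. Here is a minimal Python sketch of the same idea; `sfv_diff` is a made-up helper name, and real use would read the two checksum files from disk instead of taking strings.

```python
def sfv_diff(old_text, new_text):
    """Compare two checksum files as plain sorted text (the workaround above).

    Each non-empty line is treated as one '<path> <crc>' entry. Lines only
    in the new file are added or re-hashed files; lines only in the old
    file are deleted or changed files. A path appearing on both sides with
    different checksums is a mismatch.
    """
    old = {line.strip() for line in old_text.splitlines() if line.strip()}
    new = {line.strip() for line in new_text.splitlines() if line.strip()}
    # Sorting makes the output order-synchronized, as the workaround suggests.
    return sorted(new - old), sorted(old - new)
```

Any line that shows up in both result lists with the same path points at a checksum mismatch; a path only in the first list is a new file, and a path only in the second list is a deleted one.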

OV2 (owner) commented Sep 22, 2022

Incremental hashing doesn't really fit that well into the concept of RapidCRC. I will most likely not add this.
