Add Fingerprint interfaces #30

Open
JacobHayes opened this issue Apr 13, 2021 · 0 comments

Still need to think through whether fingerprints should be calculated from View (in-memory) or Storage (on-disk) data.

Considerations:

  • Portability: View
    • View Fingerprints can be stable across Format/Storage changes (thus avoiding downstream invalidation)
    • With care, fingerprints can also be kept stable across different Views of the same data
  • Flexibility: View
    • Many storage systems expose little beyond mtime or their own custom checksum (e.g. MD5 or CRC32C for GCS/HDFS, mtime only in BQ or on local disks)
  • Cost: Storage
    • Storage-based fingerprints will often be just a metadata lookup, whereas a View fingerprint requires a full read
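
To make the portability point concrete, here is a minimal sketch (not the project's API) assuming a "View" is simply a list of dicts and a "Storage" fingerprint is a hash of the raw bytes. The helpers `view_fingerprint`, `storage_fingerprint`, `read_json`, and `read_csv` are hypothetical names invented for this illustration.

```python
import csv
import hashlib
import io
import json


def view_fingerprint(rows):
    """Hash a canonical encoding of the in-memory data (hypothetical helper)."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def storage_fingerprint(raw):
    """Hash the bytes exactly as they sit in storage (hypothetical helper)."""
    return hashlib.sha256(raw).hexdigest()


def read_json(raw):
    return json.loads(raw)


def read_csv(raw):
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    # CSV loses type information, so restore the int column explicitly.
    return [{"id": int(r["id"]), "name": r["name"]} for r in reader]


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# The same logical data written in two different formats.
as_json = json.dumps(rows).encode("utf-8")
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name"])
writer.writeheader()
writer.writerows(rows)
as_csv = buf.getvalue().encode("utf-8")

# Storage fingerprints differ per format; View fingerprints agree.
print(storage_fingerprint(as_json) == storage_fingerprint(as_csv))
print(view_fingerprint(read_json(as_json)) == view_fingerprint(read_csv(as_csv)))
```

The trade-off is visible even here: the View fingerprint needs a full read and a careful canonical encoding (note the explicit `int` conversion), while the Storage fingerprint only needs the bytes or, in a real system, a metadata lookup.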

Perhaps if we always track mtime+fingerprint, we can default to assuming mtime-based immutability and only compute the real fingerprint on first write or when the mtime changes (i.e. when something outside the system mutates the data). For things with skewed mtimes (e.g. a directory of files), we'd probably track the latest one.
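
One way the mtime+fingerprint idea could look for local files, as a rough sketch under stated assumptions: `Tracked`, `latest_mtime_ns`, `content_fingerprint`, and `refresh` are hypothetical names, SHA-256 stands in for whatever hash is eventually chosen, and only local paths are handled.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class Tracked:
    """The (mtime, fingerprint) pair recorded for a path (hypothetical)."""
    mtime_ns: int
    fingerprint: str


def latest_mtime_ns(path: Path) -> int:
    """For a directory, use the newest file's mtime; otherwise the file's own."""
    if path.is_dir():
        mtimes = [p.stat().st_mtime_ns for p in path.rglob("*") if p.is_file()]
        return max(mtimes, default=path.stat().st_mtime_ns)
    return path.stat().st_mtime_ns


def content_fingerprint(path: Path) -> str:
    """Full read: hash file contents (and relative names inside a directory)."""
    digest = hashlib.sha256()
    if path.is_dir():
        for f in sorted(p for p in path.rglob("*") if p.is_file()):
            digest.update(f.relative_to(path).as_posix().encode("utf-8"))
            digest.update(f.read_bytes())
    else:
        digest.update(path.read_bytes())
    return digest.hexdigest()


def refresh(path: Path, previous: Optional[Tracked]) -> Tracked:
    """Reuse the stored fingerprint unless the mtime moved."""
    mtime = latest_mtime_ns(path)
    if previous is not None and previous.mtime_ns == mtime:
        return previous  # assume immutable: cheap metadata lookup only
    return Tracked(mtime, content_fingerprint(path))  # first write or external change
```

A caveat with this sketch: tracking only the newest file's mtime would miss a deletion inside a directory, so a real implementation would likely need to account for the file listing as well.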
