
How to: Use a file's digest (e.g. MD5, SHA-1) as the file path

PikachuEXE edited this page Nov 12, 2012 · 1 revision

Reason

This page is about the file path (the directory), not the filename.
A directory can hold only a limited number of files and subdirectories; the exact limit depends on the file system.
Deriving the path from each file's digest spreads files across many directories, so no single directory reaches that limit.
I believe this is called directory partitioning (search for "carrierwave directory partitioning").
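
As a rough illustration of the idea, here is a minimal sketch in plain Ruby that splits a hex digest into nested directory segments. The helper name `partitioned_path` and its `prefix_lengths` parameter are hypothetical and not part of CarrierWave; the 2/2/rest split simply mirrors the `store_dir` shown later on this page.

```ruby
require 'digest'

# Hypothetical helper (not a CarrierWave API): split a hex digest into
# nested directory segments, e.g. "abcdef0123" -> "ab/cd/ef0123".
def partitioned_path(hex_digest, prefix_lengths = [2, 2])
  segments = []
  offset = 0
  prefix_lengths.each do |len|
    segments << hex_digest[offset, len] # take `len` characters for this level
    offset += len
  end
  segments << hex_digest[offset..-1]    # the rest of the digest is the last segment
  segments.join('/')
end

digest = Digest::SHA1.hexdigest('hello world')
puts partitioned_path(digest)
```

With two hex characters per level, each of the first two levels can have at most 256 entries, which keeps every directory small no matter how many files are stored.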

Environment

  • Images are hosted on S3
    This approach might not work if images are stored on the application server's local disk
  • The Avatar model has a nullable string column named digest (null: true)

Code

avatar_uploader.rb:

def store_dir
  # model.digest is used here so that it can be read after saving (not just when we have the cache file)
  digest = model.digest
  "uploads/#{model.class.to_s.underscore}/#{model.id}/#{digest[0..1]}/#{digest[2..3]}/#{digest[4..-1]}"
end

avatar.rb:

class Avatar < ActiveRecord::Base
  mount_uploader :image, AvatarUploader

  ### Digest
  #
  # Assume the model has `digest` column
  #
  # When Reading image, the local file does NOT exist,
  # And this method should return the digest from model
  #
  # When Writing image (upload), the local file DOES exist,
  # And this method should return digest calculated from file content
  def digest
    # Check whether a LOCAL file exists, i.e. an upload is in progress
    # 1. File not changed = just read the digest from the record
    # 2. File not present = there is no content to generate a digest from
    # 3. File has no path = same as #2
    # 4. File is not local = the file has already been uploaded (it is not being uploaded now);
    #    this check does not work if files are stored locally, though
    if image_changed? &&
        image.file.present? &&
        image.file.respond_to?(:path) &&
        File.exist?(image.file.path) # File.exists? is deprecated and removed in Ruby 3.2
      DigestGenerator.digest_for_file_at(image.file.path)
    else # Reading image
      self[:digest]
    end
  end

  before_save :update_digest

  protected

  def update_digest
    self.digest = digest if image_changed?
  end
end
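
The read-versus-write branch in `digest` can be tried outside Rails. The following is only a sketch under stated assumptions: `resolve_digest` and its keyword arguments are hypothetical stand-ins for `image_changed?`, `image.file.path` and `self[:digest]`, and SHA-1 is used because it is the default in this page's generator.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical stand-in for Avatar#digest, without ActiveRecord or CarrierWave.
#   stored_digest: the value already saved in the digest column
#   changed:       whether a new file was assigned (image_changed? in the model)
#   local_path:    path of the local file, or nil once the file only lives remotely
def resolve_digest(stored_digest:, changed:, local_path:)
  if changed && local_path && File.exist?(local_path)
    # Writing: a local file exists, so compute the digest from its content
    Digest::SHA1.file(local_path).hexdigest
  else
    # Reading: no local file, so fall back to the stored value
    stored_digest
  end
end

Tempfile.create('avatar') do |f|
  f.write('image bytes')
  f.flush
  puts resolve_digest(stored_digest: 'old', changed: true, local_path: f.path)
  puts resolve_digest(stored_digest: 'old', changed: false, local_path: f.path)
end
```

The first call prints a freshly computed digest, while the second prints the stored value because nothing changed.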

digest_generator.rb:

require 'digest'

module DigestGenerator
  ##
  # Create a hash based on the content of the file at +file_path+
  # I hope it is not too slow
  #
  # === Params
  # +file_path+: the path of the file, which is assumed to exist
  # +digest_klass+: Optional, the digest class you want to use
  #
  # === Returns
  #
  # digest from file content
  def self.digest_for_file_at(file_path, digest_klass=Digest::SHA1)
    digest = digest_klass.new

    digest.file(file_path).hexdigest
  end
end
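
To try the generator outside Rails, here is a minimal usage sketch. It assumes only Ruby's standard `digest` and `tempfile` libraries; the method is redefined at the top level so the snippet stands alone, and it uses the class-level `Digest::SHA1.file` shortcut in place of `new` followed by `file`, which should give the same result.

```ruby
require 'digest'
require 'tempfile'

# Same technique as DigestGenerator.digest_for_file_at, inlined for a standalone run
def digest_for_file_at(file_path, digest_klass = Digest::SHA1)
  digest_klass.file(file_path).hexdigest
end

Tempfile.create('avatar') do |f|
  f.write('hello world')
  f.flush # make sure the content is on disk before hashing

  sha1 = digest_for_file_at(f.path)
  md5  = digest_for_file_at(f.path, Digest::MD5)

  puts "SHA-1: #{sha1} (#{sha1.length} hex characters)"
  puts "MD5:   #{md5} (#{md5.length} hex characters)"
end
```

Whichever digest class is chosen, the partitioning in `store_dir` only relies on the digest being at least five characters long, so both MD5 and SHA-1 fit.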