Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Storage#download #331

Merged
merged 1 commit into from Dec 11, 2018
Merged

Remove Storage#download #331

merged 1 commit into from Dec 11, 2018

Conversation

janko
Copy link
Member

@janko janko commented Dec 2, 2018

When refresh_metadata plugin is used, UploadedFile#download using Storage#download causes bad performance when UploadedFile#download is called multiple times during metadata extraction, because in that case the same file will be downloaded from the storage multiple times.

add_metadata :foo do |io, context|
  io.download { ... } # downloaded from the storage (1st time)
end

add_metadata :bar do |io, context|
  io.download { ... } # downloaded from the storage (2nd time)
end

Moreover, the download that UploadedFile#refresh_metadata! started at the beginning of metadata extraction won't be reused. It will be reused for determine_mime_type and store_dimensions plugins, but for any custom metadata that uses UploadedFile#download (either directly or implicitly via Shrine.with_file) a whole new download will be started.

Storage#download existed for two reasons:

  1. If Storage#open uses Down::ChunkedIO, calling UploadedFile#download the first time on an UploadedFile object would copy the remote content to two Tempfiles: the Tempfile that's returned and Down::ChunkedIO's internal Tempfile. Storage#download allows storage implementers to avoid the additional copy.

  2. Some SDKs such as aws-sdk-s3 implement automatic retries only if the download destination is a Tempfile (like in S3#download), but not if it's a block (like in S3#open).

The 1st issue can be addressed by explicitly passing rewindable: false for storages that use Down::ChunkedIO and support :rewindable. Though it's not really a big deal anyway.

The 2nd issue should be fixed in the SDK. google-api-client already supports resumable downloads for any download destination, and fix for aws-sdk-s3 is in the works (aws/aws-sdk-ruby#1617)

I no longer believe Storage#download provides enough benefits to keep around. The performance issues with refresh_metadata plugin and custom metadata are a great reason to throw it away. An added benefit is that storages now have one less method to implement.

Partially addresses #329

When refresh_metadata plugin is used, UploadedFile#download using
Storage#download causes bad performance when UploadedFile#download is
called multiple times during metadata extraction, because in that case
the same file will be downloaded from the storage multiple times.

  add_metadata :foo do |io, context|
    io.download { ... } # downloaded from the storage (1st time)
  end

  add_metadata :bar do |io, context|
    io.download { ... } # downloaded from the storage (2nd time)
  end

Moreover, the download that `UploadedFile#refresh_metadata!` started at
the beginning of metadata extraction won't be reused. It will be reused
for determine_mime_type and store_dimensions plugins, but for any custom
metadata that uses UploadedFile#download (either directly or implicitly
via Shrine.with_file) a whole new download will be started.

Storage#download existed for two reasons:

1) If Storage#open uses Down::ChunkedIO, calling UploadedFile#download
   the first time on an UploadedFile object would copy the remote
   content to two Tempfiles: the Tempfile that's returned and
   Down::ChunkedIO's internal Tempfile. Storage#download allows storage
   implementers to avoid the additional copy.

2) Some SDKs such as aws-sdk-s3 implement automatic retries only if the
   download destination is a Tempfile (like in S3#download), but not if
   it's a block (like in S3#open).

The 1st issue can be addressed by explicitly passing `rewindable: false`
for storages that use Down::ChunkedIO and support :rewindable. Though
it's not really a big deal anyway.

The 2nd issue should be fixed in the SDK. google-api-client already
supports resumable downloads for any download destination, and
fix for aws-sdk-s3 is in the works.

I no longer believe Storage#download provides enough benefits to keep
around. The performance issues with refresh_metadata plugin and custom
metadata are a great reason to throw it away. An added benefit is that
storages now have one less method to implement.
@janko janko merged commit 950a120 into master Dec 11, 2018
@janko janko deleted the remove-storage-download branch December 11, 2018 08:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant