Skip to content

Commit

Permalink
Remove Storage#download
Browse files Browse the repository at this point in the history
When refresh_metadata plugin is used, UploadedFile#download using
Storage#download causes bad performance when UploadedFile#download is
called multiple times during metadata extraction, because in that case
the same file will be downloaded from the storage multiple times.

  add_metadata :foo do |io, context|
    io.download { ... } # downloaded from the storage (1st time)
  end

  add_metadata :bar do |io, context|
    io.download { ... } # downloaded from the storage (2nd time)
  end

Moreover, the download that `UploadedFile#refresh_metadata!` started at
the beginning of metadata extraction won't be reused. It will be reused
for determine_mime_type and store_dimensions plugins, but for any custom
metadata that uses UploadedFile#download (either directly or implicitly
via Shrine.with_file) a whole new download will be started.

Storage#download existed for two reasons:

1) If Storage#open uses Down::ChunkedIO, calling UploadedFile#download
   the first time on an UploadedFile object would copy the remote
   content to two Tempfiles: the Tempfile that's returned and
   Down::ChunkedIO's internal Tempfile. Storage#download allows storage
   implementers to avoid the additional copy.

2) Some SDKs such as aws-sdk-s3 implement automatic retries only if the
   download destination is a Tempfile (like in S3#download), but not if
   it's a block (like in S3#open).

The 1st issue can be addressed by explicitly passing `rewindable: false`
for storages that use Down::ChunkedIO and support :rewindable. Though
it's not really a big deal anyway.

The 2nd issue should be fixed in the SDK. google-api-client already
supports resumable downloads for any download destination, and
fix for aws-sdk-s3 is in the works.

I no longer believe Storage#download provides enough benefits to keep
around. The performance issues with refresh_metadata plugin and custom
metadata are a great reason to throw it away. An added benefit is that
storages now have one less method to implement.
  • Loading branch information
janko committed Dec 2, 2018
1 parent 9977435 commit dcb5b2a
Show file tree
Hide file tree
Showing 11 changed files with 172 additions and 249 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,11 @@
## HEAD

* Deprecate `Storage::S3#download` (@janko-m)

* Stop using `Storage#download` in `UploadedFile#download` for peformance (@janko-m)

* Remove `#download` from the Shrine storage specification (@janko-m)

* Strip query params in upload location when re-uploading from `shrine-url` storage (@jrochkind)

* Inline Base plugin into core classes, extract them to separate files (@printercu)
Expand Down
21 changes: 0 additions & 21 deletions doc/creating_storages.md
Original file line number Diff line number Diff line change
Expand Up @@ -122,27 +122,6 @@ The storage can support additional options to customize how the file will be
opened, `Shrine::UploadedFile#open` and `Shrine::UploadedFile#download` will
forward any given options to `#open`.

## Download

`Shrine::UploadedFile#download` by default uses the `#open` storage method to
stream file content to a Tempfile. However, if you would like to use your own
custom way of downloading to a file, you can define `#download` on the storage
and `Shrine::UploadedFile#download` will automatically call that instead.

```rb
class MyStorage
# ...
def download(id, **options)
# download the uploaded file to a Tempfile
end
# ...
end
```

The storage can support additional options to customize how the file will be
downloaded, `Shrine::UploadedFile#download` will forward any given options to
`#download`.

## Url

The `#url` storage method is called by `Shrine::UploadedFile#url`, it accepts a
Expand Down
1 change: 1 addition & 0 deletions doc/design.md
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,7 @@ uploaded_file.size
# storage methods
uploaded_file.url
uploaded_file.exists?
uploaded_file.open
uploaded_file.download
uploaded_file.delete
# ...
Expand Down
26 changes: 16 additions & 10 deletions lib/shrine/storage/file_system.rb
Original file line number Diff line number Diff line change
Expand Up @@ -207,17 +207,13 @@ def path(id)
end

# Catches the deprecated `#download` method.
def method_missing(name, *args)
def method_missing(name, *args, &block)
case name
when :download then deprecated_download(*args, &block)
else
super
end
if name == :download
begin
Shrine.deprecation("Shrine::Storage::FileSystem#download is deprecated and will be removed in Shrine 3.")
tempfile = Tempfile.new(["shrine-filesystem", File.extname(args[0])], binmode: true)
open(*args) { |file| IO.copy_stream(file, tempfile) }
tempfile.tap(&:open)
rescue
tempfile.close! if tempfile
raise
end
else
super
end
Expand Down Expand Up @@ -256,6 +252,16 @@ def relative_path(id)
def relative(path)
path.sub(%r{^/}, "")
end

def deprecated_download(id, **options)
Shrine.deprecation("Shrine::Storage::FileSystem#download is deprecated and will be removed in Shrine 3.")
tempfile = Tempfile.new(["shrine-filesystem", File.extname(id)], binmode: true)
open(id, **options) { |file| IO.copy_stream(file, tempfile) }
tempfile.tap(&:open)
rescue
tempfile.close! if tempfile
raise
end
end
end
end
7 changes: 0 additions & 7 deletions lib/shrine/storage/linter.rb
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,6 @@ def initialize(storage, action: :error)
def call(io_factory = default_io_factory)
storage.upload(io_factory.call, id = "foo".dup, {})

lint_download(id) if storage.respond_to?(:download)
lint_open(id)
lint_exists(id)
lint_url(id)
Expand All @@ -59,12 +58,6 @@ def call(io_factory = default_io_factory)
end
end

def lint_download(id)
downloaded = storage.download(id)
error :download, "doesn't return a Tempfile" if !downloaded.is_a?(Tempfile)
error :download, "returns an empty IO object" if downloaded.read.empty?
end

def lint_open(id)
opened = storage.open(id)
error :open, "doesn't return a valid IO object" if !io?(opened)
Expand Down
54 changes: 28 additions & 26 deletions lib/shrine/storage/s3.rb
Original file line number Diff line number Diff line change
Expand Up @@ -273,11 +273,6 @@ module Storage
# [serve private content via CloudFront]: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
# [`Aws::CloudFront::UrlSigner`]: https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/CloudFront/UrlSigner.html
class S3
# Tempfile with content_type accessor which represents downloaded files.
class Tempfile < ::Tempfile
attr_accessor :content_type
end

attr_reader :client, :bucket, :prefix, :host, :upload_options, :signer, :public

# Initializes a storage for uploading to S3. All options are forwarded to
Expand Down Expand Up @@ -373,21 +368,6 @@ def upload(io, id, shrine_metadata: {}, **upload_options)
end
end

# Downloads the file from S3 and returns a `Tempfile`. The download will
# be automatically retried up to 3 times. Any additional options are
# forwarded to [`Aws::S3::Object#get`].
#
# [`Aws::S3::Object#get`]: http://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Object.html#get-instance_method
def download(id, **options)
tempfile = Tempfile.new(["shrine-s3", File.extname(id)], binmode: true)
data = object(id).get(response_target: tempfile, **options)
tempfile.content_type = data.content_type
tempfile.tap(&:open)
rescue
tempfile.close! if tempfile
raise
end

# Returns a `Down::ChunkedIO` object that downloads S3 object content
# on-demand. By default, read content will be cached onto disk so that
# it can be rewinded, but if you don't need that you can pass
Expand Down Expand Up @@ -531,12 +511,11 @@ def object(id)
bucket.object([*prefix, id].join("/"))
end

# Catches the deprecated `#stream` method.
def method_missing(name, *args)
if name == :stream
Shrine.deprecation("Shrine::Storage::S3#stream is deprecated over calling #each_chunk on S3#open.")
object = object(*args)
object.get { |chunk| yield chunk, object.content_length }
# Catches the deprecated `#download` and `#stream` methods.
def method_missing(name, *args, &block)
case name
when :stream then deprecated_stream(*args, &block)
when :download then deprecated_download(*args, &block)
else
super
end
Expand Down Expand Up @@ -634,6 +613,29 @@ def encode_content_disposition(content_disposition)
CGI.escape(filename).gsub("+", " ")
end
end

def deprecated_stream(id)
Shrine.deprecation("Shrine::Storage::S3#stream is deprecated over calling #each_chunk on S3#open.")
object = object(id)
object.get { |chunk| yield chunk, object.content_length }
end

def deprecated_download(id, **options)
Shrine.deprecation("Shrine::Storage::S3#download is deprecated over S3#open.")

tempfile = Tempfile.new(["shrine-s3", File.extname(id)], binmode: true)
data = object(id).get(response_target: tempfile, **options)
tempfile.content_type = data.content_type
tempfile.tap(&:open)
rescue
tempfile.close! if tempfile
raise
end

# Tempfile with #content_type accessor which represents downloaded files.
class Tempfile < ::Tempfile
attr_accessor :content_type
end
end
end
end
13 changes: 4 additions & 9 deletions lib/shrine/uploaded_file.rb
Original file line number Diff line number Diff line change
Expand Up @@ -105,8 +105,7 @@ def open(*args)
end
end

# Calls `#download` on the storage if the storage implements it,
# otherwise streams content into a newly created Tempfile.
# Streams content into a newly created Tempfile and returns it.
#
# If a block is given, the opened Tempfile object is yielded to the
# block, and at the end of the block it's automatically closed and
Expand All @@ -122,13 +121,9 @@ def open(*args)
#
# uploaded_file.download { |tempfile| tempfile.read } # tempfile is deleted
def download(*args)
if storage.respond_to?(:download)
tempfile = storage.download(id, *args)
else
tempfile = Tempfile.new(["shrine", ".#{extension}"], binmode: true)
stream(tempfile, *args)
tempfile.open
end
tempfile = Tempfile.new(["shrine", ".#{extension}"], binmode: true)
stream(tempfile, *args)
tempfile.open

block_given? ? yield(tempfile) : tempfile
ensure
Expand Down
21 changes: 2 additions & 19 deletions test/linter_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -23,31 +23,14 @@
end
end

describe "download" do
it "tests that returned object is a Tempfile" do
@storage.instance_eval { def download(id); File.open(__FILE__); end }
assert_raises(Shrine::LintError) { @linter.call }
end

it "tests that returned IO is not empty" do
@storage.instance_eval { def download(id); Tempfile.new(""); end }
assert_raises(Shrine::LintError) { @linter.call }
end

it "doesn't require #download to be defined" do
@storage.instance_eval { undef download }
@linter.call
end
end

describe "open" do
it "tests that returned object is an IO" do
@storage.instance_eval { def download(id); StringIO.new("foo").tap { |io| io.instance_eval{undef rewind} }; end }
@storage.instance_eval { def open(id); StringIO.new("foo").tap { |io| io.instance_eval{undef rewind} }; end }
assert_raises(Shrine::LintError) { @linter.call }
end

it "tests that returned IO is not empty" do
@storage.instance_eval { def download(id); StringIO.new; end }
@storage.instance_eval { def open(id); StringIO.new; end }
assert_raises(Shrine::LintError) { @linter.call }
end
end
Expand Down
106 changes: 54 additions & 52 deletions test/s3_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -375,54 +375,6 @@ def s3(**options)
end
end

describe "#download" do
it "downloads the object to a Tempfile" do
@s3.client.stub_responses(:get_object, body: "content")
tempfile = @s3.download("foo")
assert_kind_of Tempfile, tempfile
assert_equal "content", tempfile.read
end

it "opens the Tempfile in binary mode" do
tempfile = @s3.download("foo")
assert tempfile.binmode?
end

it "makes a single #get_object API call" do
@s3.download("foo")
assert_equal 1, @s3.client.api_requests.count
assert_equal :get_object, @s3.client.api_requests[0][:operation_name]
assert_equal "foo", @s3.client.api_requests[0][:params][:key]
end

it "forwards additional options to #get_object" do
@s3.download("foo", range: "bytes=0-100")
assert_equal :get_object, @s3.client.api_requests[0][:operation_name]
assert_equal "bytes=0-100", @s3.client.api_requests[0][:params][:range]
end

it "respects :prefix" do
@s3 = s3(prefix: "prefix")
@s3.download("foo")
assert_equal :get_object, @s3.client.api_requests[0][:operation_name]
assert_equal "prefix/foo", @s3.client.api_requests[0][:params][:key]
end

it "deletes the Tempfile if an error occurs while retrieving file contents" do
@s3.client.stub_responses(:get_object, "NetworkingError")
tempfile = Tempfile.new("")
Tempfile.stubs(:new).returns(tempfile)
assert_raises(Aws::S3::Errors::NetworkingError) { @s3.download("foo") }
assert tempfile.closed?
assert_nil tempfile.path
end

it "propagates failures in creating tempfiles" do
Tempfile.stubs(:new).raises(Errno::EMFILE) # too many open files
assert_raises(Errno::EMFILE) { @s3.download("foo") }
end
end

describe "#open" do
it "returns a Down::ChunkedIO which downloads the object" do
@s3.client.stub_responses(:get_object, body: "content")
Expand Down Expand Up @@ -697,10 +649,60 @@ def s3(**options)
end

describe "#method_missing" do
deprecated "implements #stream" do
@s3.client.stub_responses(:head_object, content_length: 7)
@s3.client.stub_responses(:get_object, body: "content")
assert_equal [["content", 7]], @s3.enum_for(:stream, "foo").to_a
describe "#stream" do
deprecated "yields downloaded content" do
@s3.client.stub_responses(:head_object, content_length: 7)
@s3.client.stub_responses(:get_object, body: "content")
assert_equal [["content", 7]], @s3.enum_for(:stream, "foo").to_a
end
end

describe "#download" do
deprecated "downloads the object to a Tempfile" do
@s3.client.stub_responses(:get_object, body: "content")
tempfile = @s3.download("foo")
assert_kind_of Tempfile, tempfile
assert_equal "content", tempfile.read
end

deprecated "opens the Tempfile in binary mode" do
tempfile = @s3.download("foo")
assert tempfile.binmode?
end

deprecated "makes a single #get_object API call" do
@s3.download("foo")
assert_equal 1, @s3.client.api_requests.count
assert_equal :get_object, @s3.client.api_requests[0][:operation_name]
assert_equal "foo", @s3.client.api_requests[0][:params][:key]
end

deprecated "forwards additional options to #get_object" do
@s3.download("foo", range: "bytes=0-100")
assert_equal :get_object, @s3.client.api_requests[0][:operation_name]
assert_equal "bytes=0-100", @s3.client.api_requests[0][:params][:range]
end

deprecated "respects :prefix" do
@s3 = s3(prefix: "prefix")
@s3.download("foo")
assert_equal :get_object, @s3.client.api_requests[0][:operation_name]
assert_equal "prefix/foo", @s3.client.api_requests[0][:params][:key]
end

deprecated "deletes the Tempfile if an error occurs while retrieving file contents" do
@s3.client.stub_responses(:get_object, "NetworkingError")
tempfile = Tempfile.new("")
Tempfile.stubs(:new).returns(tempfile)
assert_raises(Aws::S3::Errors::NetworkingError) { @s3.download("foo") }
assert tempfile.closed?
assert_nil tempfile.path
end

deprecated "propagates failures in creating tempfiles" do
Tempfile.stubs(:new).raises(Errno::EMFILE) # too many open files
assert_raises(Errno::EMFILE) { @s3.download("foo") }
end
end

it "calls super for other methods" do
Expand Down
7 changes: 0 additions & 7 deletions test/support/test_storage.rb
Original file line number Diff line number Diff line change
@@ -1,15 +1,8 @@
require "shrine/storage/memory"
require "tempfile"

class Shrine
module Storage
class Test < Memory
def download(id)
tempfile = Tempfile.new(["shrine", File.extname(id)], binmode: true)
IO.copy_stream(open(id), tempfile)
tempfile.tap(&:open)
end

def move(io, id, **options)
store[id] = io.storage.delete(io.id)
end
Expand Down

0 comments on commit dcb5b2a

Please sign in to comment.