Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add presigned_url method. #35

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Add presigned_url method. #35

wants to merge 2 commits into from

Conversation

danielroseman
Copy link
Contributor

For GCS, return an actual presigned URL for the file. For disk, return a
standard file:// url. For inmemory, return the original URL, for the
sake of completeness.

Copy link
Contributor

@JoeSouthan JoeSouthan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved codewise might be worth checking that we want this at all

@@ -53,6 +53,12 @@ def delete!(bucket:, key:)
true
end

# rubocop: disable Lint/UnusedMethodArgument
def presigned_url(bucket:, key:, expiry:)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def presigned_url(bucket:, key:, expiry:)
def presigned_url(bucket:, key:, **args)

Does this work? Might be able to get rid of the rubocop comments

@ivgiuliani
Copy link
Contributor

ivgiuliani commented May 10, 2021

I'm trying to figure out the main use cases for this from a caller perspective. I imagine we need to make these two things easy-ish:

  • generate a valid url so that external users can download the file. This needs to take into account a certain number of options (e.g. expiration, authentication, ...) which will likely be specific to the provider (we currently only support GCS but if we ever add support for S3 or others this could be an issue). GCS has a fair number of these options.
  • make it easy for callers to test that a part of their process is generating a bunch of valid-looking urls (e.g. expect( thing.generate_presigned_urls).to match_array('http://bucket/url.txt', 'http://bucket/url2.txt')). I think in this case callers will not (normally) want to check that the content of the file is correct as that will require them executing an http request or similar in any case, so in that sense I reckon returning dummy values for adapters that don't support this natively could be ok

If we don't implement presigned urls in here, I think clients can still do it (for GCS, at least) with something like this, which is not great but not too awful either:

FileStorage.for("gs://bucket/url.txt").tap do |ks|
  gcs = Google::Cloud::Storage.new # we don't explicitly configure auth via FileStorage, for now
  gcs.bucket(ks.bucket, skip_lookup: true).file(ks.key).signed_url(...)
end

Just wondering what options do we have available, off the top of my head:

  1. fog solves this by exposing a connection object, which is the internal Aws::S3/Google::Cloud::Storage instance, we could do something similar which could help callers in other use cases as well, but we'll still have some problems with the inmemory and disk adapters and doesn't seem to make things that much better anyway:
FileStorage.for("gs://bucket/url.txt").tap do |ks|
  ks.connection.bucket(ks.bucket, skip_lookup: true).file(ks.key).signed_url(...)
end
  1. only support a restricted number of options (e.g. expiry time), and if you need something more complex then you can use the solution above (largely what this PR does). For things like inmemory we can only return dummy values but I reckon this is fine as the main tests you'd want to write in this case is that we get something that looks like a URL:
expect { URI.parse(FileStorage.for("inmemory://bucket/url.txt").presigned_url(expiry: 3.years)) }.to_not raise_error
  1. accept that individual calls to presigned_url will need to account for provider-specific options though we can make some effort in standardising the naming of the options. This breaks the abstraction a bit, but it's probably not too bad overall and is very similar to what this PR is doing (the main difference is that the responsibility of the argument parsing is largely left to the adapter):
FileStorage.for("gs://bucket/url.txt").presigned_url(expiry: 1.year, authentication: ...)
FileStorage.for("s3://bucket/url.txt").presigned_url(expiry: 1.year, authentication: ...) # crash if `authentication` is not supported

# disk and inmemory will ignore most options as they don't make sense for them while still returning a valid-looking URI
FileStorage.for("inmemory://bucket/url.txt").presigned_url(expiry: 1.year, authentication: ...)
FileStorage.for("disk://bucket/url.txt").presigned_url(expiry: 1.year, authentication: ...)

I'm leaning more towards option 3 in this case which is not a huge change from what this PR is currently doing (I reckon we need to just proxy all the arguments to presigned_url on KeyStorage to the adapters) but leaves enough flexibility for callers to do whatever they need to, at the cost of breaking the abstraction a bit. Thoughts?

For GCS, return an actual presigned URL for the file. For disk, return a
standard `file://` url. For inmemory, return the original URL, for the
sake of completeness.
@ivgiuliani
Copy link
Contributor

Rebased this against master, I'll try to revive it next week

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants