
Don't retry streaming requests with blocks after data has been received #1617

Merged
merged 7 commits into aws:master from janko:retry-streaming-s3-object-downloads on Jun 11, 2020

Conversation

@janko (Contributor) commented Sep 19, 2017

When downloading S3 objects with a block, aws-sdk-s3 internally creates a BlockIO to be the writable response target.

object.get do |chunk|
  # streaming
end

Normally aws-sdk-s3 retries failed downloads in case of network errors, but not for streaming downloads. This is because BlockIO is not truncatable, unlike response targets such as StringIO or Tempfile, and aws-sdk-s3 needs the target to be truncatable because it retries the whole download from the start.
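
For context, a block-backed response target conceptually looks something like the sketch below (illustrative only, not the SDK's actual BlockIO class): chunks can only be pushed forward into the caller's block, so there is no way to rewind or truncate what has already been yielded.

# Rough sketch of a block-backed response target (names are illustrative,
# not the SDK's exact BlockIO implementation).
class BlockTarget
  attr_reader :size

  def initialize(&block)
    @block = block
    @size = 0
  end

  # Each chunk is handed straight to the caller's block. Once yielded it
  # cannot be taken back, which is why this target can't be truncated the
  # way a StringIO or Tempfile can.
  def write(chunk)
    @block.call(chunk)
    @size += chunk.bytesize
  end
end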

In Shrine and tus-ruby-server I'm using aws-sdk-s3's streaming download feature, mainly for streaming the S3 object through a web application. It would be great if these downloads were also retried in case of network errors, especially for tus-ruby-server, where the S3 objects will typically be very large.

This PR adds this functionality. It does so by remembering how many bytes of the content have been "written" to the response target so far; when the request is retried, the part of the response body that has already been written is simply skipped, until the download reaches the part that hasn't been written yet.
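
A minimal sketch of that skipping idea (names are illustrative, not the PR's exact code): the handler remembers how many bytes the target already holds and drops that prefix from the re-downloaded body before writing again.

# Illustrative sketch of skipping already-written bytes on a retried
# download (not the PR's exact implementation). `target` is the response
# target (e.g. a BlockIO) kept from the previous attempt.
def stream_with_skip(http_response, target)
  bytes_written = target.size           # bytes the target already holds

  http_response.on_data do |chunk|
    if bytes_written > 0
      skip = [bytes_written, chunk.bytesize].min
      chunk = chunk.byteslice(skip..-1)  # drop the part we already wrote
      bytes_written -= skip
    end
    target.write(chunk) unless chunk.empty?
  end
end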

After this PR I plan to add support for range requests, so that retried requests resume from the last byte offset. But for a start I wanted to add this, because I didn't know whether all S3 endpoints support range requests, so I thought we would still need this behaviour as a safety net.
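
For reference, S3's GetObject already accepts a :range parameter, so resuming from an offset could eventually look roughly like this (a sketch, not part of this PR; s3 and offset are assumed to be set up by the caller):

# Sketch of resuming a download from a byte offset using GetObject's
# existing :range parameter (not part of this PR).
s3.get_object(bucket: 'my-bucket', key: 'my-key', range: "bytes=#{offset}-") do |chunk|
  # continue streaming from where the previous attempt stopped
end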

@janko (Contributor, Author) commented Sep 19, 2017

This is not yet ready for review; there is a complication with TruncatedBodyError.

@janko force-pushed the retry-streaming-s3-object-downloads branch from 07f1f5e to d099329 on September 20, 2017 00:36
@janko (Contributor, Author) commented Sep 20, 2017

Ready for review.

I changed the code to truncate the response target only in case of TruncatedBodyError, which is raised when the number of bytes received doesn't match Content-Length. In that case we don't know which bytes are missing, so when streaming we have to raise an error (as is the current behaviour).

This PR also makes the assumption that the response target will always respond to #size, so I needed to add it to IODecrypter. This means that IO objects like Ruby pipes (the result of IO.pipe) can't be used as the :response_target anymore, because they don't respond to #size. Let me know if this assumption is not something we want, and I can revert to tracking the bytes written in a separate context variable, as I did initially.
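
The #size addition could be as simple as delegating to the underlying IO, along these lines (a sketch of an IODecrypter-like wrapper, not the PR's exact diff):

# Sketch only: an IODecrypter-like wrapper exposing #size by delegating
# to the IO it writes decrypted data into.
class DecryptingTarget
  def initialize(cipher, io)
    @cipher = cipher
    @io = io
  end

  def write(chunk)
    @io.write(@cipher.update(chunk))
  end

  def size
    @io.size   # lets the retry logic ask how many bytes have been written
  end
end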

@janko (Contributor, Author) commented Sep 22, 2017

Note that this addresses the feature request discussed in #1535 (that is, it will be fully addressed once Range header is added on retries).

@janko force-pushed the retry-streaming-s3-object-downloads branch from d099329 to 9c746d0 on April 29, 2018 10:32
@awood45 (Member) left a comment

Generally looks good, just need some context to finalize.

Review thread on gems/aws-sdk-core/lib/seahorse/client/net_http/handler.rb (outdated, resolved)
@awood45 (Member) commented Jun 28, 2018

Additionally, administrivia, can you confirm: "By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license."

@janko (Contributor, Author) commented Jul 6, 2018

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@janko force-pushed the retry-streaming-s3-object-downloads branch from 9c746d0 to 90c5d9b on January 6, 2019 18:00
@mullermp (Contributor) commented Oct 9, 2019

@janko Is this PR still desirable? If so I will do a review and get this in.

@janko (Contributor, Author) commented Oct 10, 2019

Yes, I think it would still be useful to have that capability. Personally I haven't had experience with downloading large S3 objects, so I don't know how often these network errors might happen. But considering that the SDK has default retries in normal cases, it seems like it does happen.

I ideally wanted to later combine it with Range requests, so that we don't need to download content that's already "written". The desired logic would be similar to what google-api-client does.

But I couldn't figure out how to make it work with client-side encryption. I guess I will try to find a way to skip that behaviour when client-side encryption is used. Anyway, that would be in a separate PR.

@mullermp added the pr/needs-review label (This PR needs a review from a Member.) and removed the work-in-progress label on Oct 10, 2019
@mullermp (Contributor) commented:

Thanks for the info. The code looks good to me. I would gladly merge it in.

There is however a broken test:

Aws::S3::Encryption::Client encryption methods #get_object decrypts the object with response target under retry

Could you update this PR to fix that?

@mullermp added the needs-tests label and removed the pr/needs-review label on Oct 10, 2019
@mullermp (Contributor) commented:

I attempted this but I hit a roadblock as well. It seems the chunk size is 64 while the response body is 48. Could this be from cipher padding?

next_chunk 16 # in if condition
prev_chunk 48 # in if condition
chunk size: 64 # before signal data
final receive: 16 # before complete_response
      decrypts the object with response target under retry (FAILED - 1)

@janko (Contributor, Author) commented Nov 21, 2019

Thanks a lot for working on this 😃 Yes, the client-side encryption is definitely difficult to handle; I spent a lot of time trying to figure it out before. I think I would skip the retry mechanism when client-side encryption is used, if that's possible to detect.

@mullermp (Contributor) commented Nov 22, 2019

No problem. I wanted to waste my afternoon debugging this! (just kidding of course...)

I found that it can certainly be skipped with:

if Aws::S3::Encryption::IODecrypter != resp.body.class && bytes_received < resp.body.size

We can do this as a last resort but where's the fun in that? If the chunk size is 64, but the response body size is 48, what are the missing 16 bytes? I feel like we're closer now.

Edit: I have to set this aside for now. If you're feeling inspired, please take a look again!
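
One possible explanation for the 64 vs. 48 mismatch, assuming the encryption client uses AES-CBC with PKCS#7 padding: 48 bytes of plaintext is already a multiple of the 16-byte block size, so the cipher appends a full padding block and the ciphertext ends up 16 bytes longer than the plaintext. A quick check:

# Assumes AES-CBC with PKCS#7 padding as a possible explanation for the
# 64-byte chunk vs. 48-byte body seen above.
require 'openssl'

cipher = OpenSSL::Cipher.new('AES-256-CBC').encrypt
cipher.key = cipher.random_key
cipher.iv  = cipher.random_iv

ciphertext = cipher.update('a' * 48) + cipher.final
puts ciphertext.bytesize   # => 64 (48 bytes of plaintext + a 16-byte padding block)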

@alextwoods (Contributor) commented:

I've been looking into fixes for #2311 and this seems related.

I've updated this PR with a fix for the S3 encryption client. The issue is that the underlying body that the IODecrypter wraps is NOT reset between retries (The IODecrypter is recreated with the previous StringIO on the new request).

There are, I think, a few additional issues that I've tried to solve:

  1. When a request is retried, a new BlockIO would be created, so updating the size and then checking it wouldn't be meaningful. On 2XX headers, I changed the code to only create a new response_target IO if the current body is a StringIO (see the sketch after this list).
  2. When an error is encountered, the old code would set the body to StringIO.new (unless the body responded to :io). This was first added here to keep us from writing errors to either the block or the provided file, and was later modified to add a check for the IODecrypter (which manages errors itself). If we reset the body to a new StringIO object on retryable errors, the next retry will create a new BlockIO with size 0. However, this change now opens us up to the possibility of yielding errors to the passed block.
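
A sketch of the idea in point 1 (illustrative only, not the actual response_target plugin code; build_target stands in for whatever constructs the BlockIO, file, or StringIO target): the on-headers callback keeps an existing non-StringIO body, so a BlockIO created on an earlier attempt survives the retry along with its recorded size.

# Illustrative sketch of the 2XX on-headers decision described in point 1
# (not the plugin's actual code). `build_target` is a hypothetical helper.
context.http_response.on_headers(200..299) do
  body = context.http_response.body
  if body.is_a?(StringIO)
    # default placeholder body: swap in the real response target
    context.http_response.body = build_target(context)
  end
  # otherwise keep the existing target (e.g. a BlockIO from a previous
  # attempt), so its recorded size stays meaningful on retry
end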

Additionally, this change will not work for the S3::Encryption::Client when used with a block, because we need to create a new cipher on retry.

@alextwoods (Contributor) commented:

There are a few issues with specs from the merge with master, but I think we can resolve those fairly easily (mostly, the changes need to be applied to the adaptive retry logic as well).

However, there are still 2 main issues I mentioned above. Specifically:

  1. This does not work for client-side encryption when using a block (the current code results in incorrect decryption because the cipher is reset between retries).
  2. We would yield an error body to the provided block (a consequence of removing the code that sets the body to a new StringIO, which is required so that the BlockIO with its stored size can be re-used in the next request when retried).

@janko - Given these I think we have a few options and I wanted to get your take.

  1. We can move forward with this PR. For client-side encryption, we drop support for retrying with a block after a 200 OK response (i.e., we would still retry if we get, say, a throttling exception, but once we start receiving data in the body and a block is used as the response_target, a failed request is not retried). To fix the error body being yielded to the block, we can revert my change but store the previous body object (i.e., the BlockIO) on the context, and then, in the on-headers callback of the response_target plugin, re-use that previous body object from the context if it exists.
  2. We drop this PR and don't provide support for retries after a 200 OK for generic streaming operations when a block is used. However, we can then create a plugin specific to S3 get_object which takes advantage of the range parameter to avoid re-downloading/re-requesting data that the client has already received. get_object is currently the only streaming operation that supports the range parameter, so we'd need operation-specific behavior to support this regardless.

At this point I'd lean towards option 2: I'd prefer to reduce the complexity of the response_target and retry code for generic streaming operations and instead provide the best support for retries of interrupted downloads in the get_object operation. What do you think?
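
To illustrate option 2 from a caller's perspective (a sketch under assumed names, not the plugin that was eventually built): a get_object-specific retry can re-request only the missing tail using the range parameter.

# Sketch of option 2 at the caller level (assumed names `s3` and `io`,
# not the eventual plugin implementation): retry get_object with a Range
# that starts at the number of bytes already received.
received = 0
attempts = 0
begin
  params = { bucket: 'my-bucket', key: 'my-key' }
  params[:range] = "bytes=#{received}-" if received > 0
  s3.get_object(params) do |chunk|
    received += chunk.bytesize
    io.write(chunk)            # `io` is wherever the caller streams the data
  end
rescue Seahorse::Client::NetworkingError
  attempts += 1
  retry if attempts < 3        # resume from `received` on the next attempt
  raise
end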

@janko (Contributor, Author) commented Jun 10, 2020

@alextwoods Thank you for picking this up, option 2️⃣ sounds good to me 👍

@alextwoods (Contributor) commented:

I've created #2326 as a feature request for adding the S3 plugin for retry with range. I've updated this PR to keep a few things that will be useful for that request and to fix #2311. If you're able to pitch in on the get_object-specific retry with range (#2326), that would be great; otherwise, I'm hoping to have some time next week to pick it up.

@mullermp (Contributor) left a comment

Looks good; however, the merge from master looks to be wrong.

@alextwoods changed the title from "Retry streaming S3 object downloads" to "Don't retry streaming requests with blocks after data has been received" on Jun 11, 2020
@alextwoods merged commit 9c05213 into aws:master on Jun 11, 2020
@alextwoods (Contributor) commented:

@janko - I've created a draft PR for adding retry support using the range header for S3 get_object: #2343. Since get_object is the only streaming operation that supports range, it's limited to that operation. Additionally, because of the way we do retries, the code gets a bit messy. Lots of special cases.
