
Don't retry streaming requests with blocks after data has been received #1617

Merged
merged 7 commits into aws:master from janko:retry-streaming-s3-object-downloads on Jun 11, 2020

Conversation

@janko (Contributor) commented Sep 19, 2017

When downloading S3 objects with a block, aws-sdk-s3 internally creates a BlockIO to be the writable response target.

object.get do |chunk|
  # streaming
end

Normally aws-sdk-s3 retries failed downloads in case of network errors, but not for streaming downloads. This is because BlockIO is not truncatable, unlike response targets such as StringIO or Tempfile, and aws-sdk-s3 needs the target to be truncatable because it retries the whole download from the start.
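
For context, a block-backed response target conceptually looks something like the sketch below (illustrative only, not the SDK's actual BlockIO class): chunks can only be pushed forward into the caller's block, so there is no way to rewind or truncate what has already been yielded.

# Rough sketch of a block-backed response target (names are illustrative,
# not the SDK's exact BlockIO implementation).
class BlockTarget
  attr_reader :size

  def initialize(&block)
    @block = block
    @size = 0
  end

  # Each chunk is handed straight to the caller's block. Once yielded it
  # cannot be taken back, which is why this target can't be truncated the
  # way a StringIO or Tempfile can.
  def write(chunk)
    @block.call(chunk)
    @size += chunk.bytesize
  end
end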

In Shrine and tus-ruby-server I'm using aws-sdk-s3's streaming download feature, mainly for streaming the S3 object through a web application. It would be great if these downloads were also retried in case of network errors, especially for tus-ruby-server, where the S3 objects will typically be very large.

This PR adds this functionality. It does so by remembering how many bytes of the content have been "written" to the response target so far; when the request is retried, the part of the response body that has already been written is simply skipped, until the download reaches the part that hasn't been written yet.
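
A minimal sketch of that skipping idea (names are illustrative, not the PR's exact code): the handler remembers how many bytes the target already holds and drops that prefix from the re-downloaded body before writing again.

# Illustrative sketch of skipping already-written bytes on a retried
# download (not the PR's exact implementation). `target` is the response
# target (e.g. a BlockIO) kept from the previous attempt.
def stream_with_skip(http_response, target)
  bytes_written = target.size           # bytes the target already holds

  http_response.on_data do |chunk|
    if bytes_written > 0
      skip = [bytes_written, chunk.bytesize].min
      chunk = chunk.byteslice(skip..-1)  # drop the part we already wrote
      bytes_written -= skip
    end
    target.write(chunk) unless chunk.empty?
  end
end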

After this PR I plan to add support for range requests, so that retried requests resume from the last byte offset. But for a start I wanted to add this, because I didn't know whether all S3 endpoints support range requests, so I thought we would still need this behaviour as a safety net.
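
For reference, S3's GetObject already accepts a :range parameter, so resuming from an offset could eventually look roughly like this (a sketch, not part of this PR; s3 and offset are assumed to be set up by the caller):

# Sketch of resuming a download from a byte offset using GetObject's
# existing :range parameter (not part of this PR).
s3.get_object(bucket: 'my-bucket', key: 'my-key', range: "bytes=#{offset}-") do |chunk|
  # continue streaming from where the previous attempt stopped
end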

@janko (Contributor, Author) commented Sep 19, 2017

This is not yet ready for review; there is a complication with TruncatedBodyError.

@janko force-pushed the retry-streaming-s3-object-downloads branch from 07f1f5e to d099329 on September 20, 2017 00:36
@janko (Contributor, Author) commented Sep 20, 2017

Ready for review.

I changed the code to truncate the response target only in case of TruncatedBodyError, which is raised when the number of bytes received doesn't match Content-Length. In that case we don't know which bytes are missing, so when streaming we have to raise an error (as is the current behaviour).

This PR also makes the assumption that the response target will always respond to #size, so I needed to add it to IODecrypter. This means that IO objects like Ruby pipes (the result of IO.pipe) can't be used as the :response_target anymore, because they don't respond to #size. Let me know if this assumption is not something we want, and I can revert to tracking the bytes written in a separate context variable, as I did initially.
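
The #size addition could be as simple as delegating to the underlying IO, along these lines (a sketch of an IODecrypter-like wrapper, not the PR's exact diff):

# Sketch only: an IODecrypter-like wrapper exposing #size by delegating
# to the IO it writes decrypted data into.
class DecryptingTarget
  def initialize(cipher, io)
    @cipher = cipher
    @io = io
  end

  def write(chunk)
    @io.write(@cipher.update(chunk))
  end

  def size
    @io.size   # lets the retry logic ask how many bytes have been written
  end
end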

@janko (Contributor, Author) commented Sep 22, 2017

Note that this addresses the feature request discussed in #1535 (that is, it will be fully addressed once Range header is added on retries).

@janko force-pushed the retry-streaming-s3-object-downloads branch from d099329 to 9c746d0 on April 29, 2018 10:32
@awood45 (Member) left a comment

Generally looks good, just need some context to finalize.

Review thread on gems/aws-sdk-core/lib/seahorse/client/net_http/handler.rb (outdated, resolved)
@awood45 (Member) commented Jun 28, 2018

Additionally, administrivia, can you confirm: "By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license."

@janko (Contributor, Author) commented Jul 6, 2018

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@janko force-pushed the retry-streaming-s3-object-downloads branch from 9c746d0 to 90c5d9b on January 6, 2019 18:00
@mullermp (Contributor) commented Oct 9, 2019

@janko Is this PR still desirable? If so I will do a review and get this in.

@janko (Contributor, Author) commented Oct 10, 2019

Yes, I think it would still be useful to have that capability. Personally I haven't had experience with downloading large S3 objects, so I don't know how often these network errors might happen. But considering that the SDK has default retries in normal cases, it seems like it does happen.

I ideally wanted to later combine it with Range requests, so that we don't need to download content that's already "written". The desired logic would be similar to what google-api-client does.

But I couldn't figure out how to make it work with client-side encryption. I guess I will try to find a way to skip that behaviour when client-side encryption is used. Anyway, that would be in a separate PR.

@mullermp added the pr/needs-review label (This PR needs a review from a Member.) and removed the work-in-progress label on Oct 10, 2019
@mullermp (Contributor) commented:

Thanks for the info. The code looks good to me. I would gladly merge it in.

There is however a broken test:

Aws::S3::Encryption::Client encryption methods #get_object decrypts the object with response target under retry

Could you update this PR to fix that?

@mullermp added the needs-tests label and removed the pr/needs-review label on Oct 10, 2019
@mullermp (Contributor) commented:

I attempted this but I hit a roadblock as well. It seems the chunk size is 64 while the response body is 48. Could this be from cipher padding?

next_chunk 16 # in if condition
prev_chunk 48 # in if condition
chunk size: 64 # before signal data
final receive: 16 # before complete_response
      decrypts the object with response target under retry (FAILED - 1)

@janko (Contributor, Author) commented Nov 21, 2019

Thanks a lot for working on this 😃 Yes, the client-side encryption is definitely difficult to handle; I spent a lot of time trying to figure it out before. I think I would skip the retry mechanism when client-side encryption is used, if that's possible to detect.

@mullermp (Contributor) commented Nov 22, 2019

No problem. I wanted to waste my afternoon debugging this! (just kidding of course...)

I found that it can certainly be skipped with:

if Aws::S3::Encryption::IODecrypter != resp.body.class && bytes_received < resp.body.size

We can do this as a last resort but where's the fun in that? If the chunk size is 64, but the response body size is 48, what are the missing 16 bytes? I feel like we're closer now.

Edit: I have to set this aside for now. If you're feeling inspired, please take a look again!
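
One possible explanation for the 64 vs. 48 mismatch, assuming the encryption client uses AES-CBC with PKCS#7 padding: 48 bytes of plaintext is already a multiple of the 16-byte block size, so the cipher appends a full padding block and the ciphertext ends up 16 bytes longer than the plaintext. A quick check:

# Assumes AES-CBC with PKCS#7 padding as a possible explanation for the
# 64-byte chunk vs. 48-byte body seen above.
require 'openssl'

cipher = OpenSSL::Cipher.new('AES-256-CBC').encrypt
cipher.key = cipher.random_key
cipher.iv  = cipher.random_iv

ciphertext = cipher.update('a' * 48) + cipher.final
puts ciphertext.bytesize   # => 64 (48 bytes of plaintext + a 16-byte padding block)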

@alextwoods (Contributor) commented:

I've been looking into fixes for #2311 and this seems related.

I've updated this PR with a fix for the S3 encryption client. The issue is that the underlying body that the IODecrypter wraps is NOT reset between retries (The IODecrypter is recreated with the previous StringIO on the new request).

There are, I think, a few additional issues that I've tried to solve:

  1. When a request is retried, a new BlockIO would be created, so updating the size and then checking it wouldn't be meaningful. On 2XX headers, I changed the code to only create a new response_target IO if the current body is a StringIO (see the sketch after this list).
  2. When an error is encountered, the old code would set the body to StringIO.new (unless the body responded to :io). This was first added here to keep us from writing errors to either the block or the provided file, and was later modified to add a check for the IODecrypter (which manages errors itself). If we reset the body to a new StringIO object on retryable errors, the next retry will create a new BlockIO with size 0. However, this change now opens us up to the possibility of yielding errors to the passed block.
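
A sketch of the idea in point 1 (illustrative only, not the actual response_target plugin code; build_target stands in for whatever constructs the BlockIO, file, or StringIO target): the on-headers callback keeps an existing non-StringIO body, so a BlockIO created on an earlier attempt survives the retry along with its recorded size.

# Illustrative sketch of the 2XX on-headers decision described in point 1
# (not the plugin's actual code). `build_target` is a hypothetical helper.
context.http_response.on_headers(200..299) do
  body = context.http_response.body
  if body.is_a?(StringIO)
    # default placeholder body: swap in the real response target
    context.http_response.body = build_target(context)
  end
  # otherwise keep the existing target (e.g. a BlockIO from a previous
  # attempt), so its recorded size stays meaningful on retry
end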

Additionally, this change will not work for the S3::Encryption::Client when used with a block, because we need to create a new cipher on retry.

@alextwoods (Contributor) commented:

There are a few issues with specs from the merge with master, but I think we can resolve those fairly easily (mostly, the changes need to be applied to the adaptive retry logic as well).

However, there are still 2 main issues I mentioned above. Specifically:

  1. This does not work for client-side encryption when using a block (the current code results in incorrect decryption because the cipher is reset between retries).
  2. We would yield an error body to the provided block (a consequence of removing the code that sets the body to a new StringIO, which is required so that the BlockIO with its stored size can be re-used in the next request when retried).

@janko - Given these I think we have a few options and I wanted to get your take.

  1. We can move forward with this PR. For client-side encryption, we drop support for retrying with a block after a 200 OK response (i.e., we would still retry if we get, say, a throttling exception, but once we start receiving data in the body and a block is used as the response_target, a failed request is not retried). To fix the error body being yielded to the block, we can revert my change but store the previous body object (i.e., the BlockIO) on the context, and then, in the on-headers callback of the response_target plugin, re-use that previous body object from the context if it exists.
  2. We drop this PR and don't provide support for retries after a 200 OK for generic streaming operations when a block is used. However, we can then create a plugin specific to S3 get_object which takes advantage of the range parameter to avoid re-downloading/re-requesting data that the client has already received. get_object is currently the only streaming operation that supports the range parameter, so we'd need operation-specific behavior to support this regardless.

At this point I'd lean towards option 2: I'd prefer to reduce the complexity of the response_target and retry code for generic streaming operations and instead provide the best support for retries of interrupted downloads in the get_object operation. What do you think?
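
To illustrate option 2 from a caller's perspective (a sketch under assumed names, not the plugin that was eventually built): a get_object-specific retry can re-request only the missing tail using the range parameter.

# Sketch of option 2 at the caller level (assumed names `s3` and `io`,
# not the eventual plugin implementation): retry get_object with a Range
# that starts at the number of bytes already received.
received = 0
attempts = 0
begin
  params = { bucket: 'my-bucket', key: 'my-key' }
  params[:range] = "bytes=#{received}-" if received > 0
  s3.get_object(params) do |chunk|
    received += chunk.bytesize
    io.write(chunk)            # `io` is wherever the caller streams the data
  end
rescue Seahorse::Client::NetworkingError
  attempts += 1
  retry if attempts < 3        # resume from `received` on the next attempt
  raise
end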

@janko (Contributor, Author) commented Jun 10, 2020

@alextwoods Thank you for picking this up, option 2️⃣ sounds good to me 👍

@alextwoods (Contributor) commented:

I've created #2326 as a feature request for adding the S3 plugin for retry with range. I've updated this PR to keep a few things that will be useful for that request and to fix #2311. If you're able to pitch in on the get_object-specific retry with range (#2326), that would be great; otherwise, I'm hoping to have some time next week to pick it up.

@mullermp (Contributor) left a comment

Looks good; however, the merge from master looks to be wrong.

@alextwoods changed the title from "Retry streaming S3 object downloads" to "Don't retry streaming requests with blocks after data has been received" on Jun 11, 2020
@alextwoods merged commit 9c05213 into aws:master on Jun 11, 2020
@alextwoods (Contributor) commented:

@janko - I've created a draft PR for adding retry support using the range header for S3 get_object: #2343. Since get_object is the only streaming operation that supports range, it's limited to that operation. Additionally, because of the way we do retries, the code gets a bit messy. Lots of special cases.
