Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse InputStream for ResourceRegionHttpMessageConverter #24214

Conversation

randomnicode
Copy link

@randomnicode randomnicode commented Dec 15, 2019

Dynamically generated input streams cannot be closed and reopened
multiple times. This fix reuses the same InputStream across multiple
ResourceRegion writes.

This solution does not presume anything about the underlying Resources
across different ResourceRegions. It will attempt to reuse an InputStream
if it's already open for a given Resource. If the range request is out of order,
then the solution closes the existing open stream and attempts to open a
new one. This is, at worst (all ranges are out of order or overlapping),
consistent with the existing solution, and at best (all ranges are non
overlapping and in order), reuses the same InputStream to help matters.

This is a proposal for and closes #24210

Dynamically generated input streams cannot be closed and reopened
multiple times. This fix reuses the same InputStream across multiple
ResourceRegion writes.

This is a proposal for and closes spring-projects#24210
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Dec 15, 2019
@rstoyanchev rstoyanchev self-assigned this Dec 16, 2019
@rstoyanchev rstoyanchev added in: web Issues in web modules (web, webmvc, webflux, websocket) type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Dec 16, 2019
@rstoyanchev rstoyanchev added this to the 5.2.3 milestone Dec 16, 2019
Copy link
Contributor

@rstoyanchev rstoyanchev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, it's reasonable to assume 1 request serves (regions for) 1 resource. This is codified in HttpRange#toResourceRegions which creates regions given List<HttpRange> and a single Resource. So there doesn't have to be a Map.

A real problem with this solution however is that start positions for ranges are relative to the beginning of the InputStream but after the first region, the InputStream is no longer at the beginning. This is confirmed by ResourceRegionHttpMessageConverterTests failing to pass partialContentMultipleByteRanges.

One could keep a current index of where the InputStream is at and adjust start positions accordingly. However two more issues would still remain. It isn't illegal for ranges to overlap. It is also not illegal for them to be out of order. The spec says:

   A server that supports range requests MAY ignore or reject a Range
   header field that consists of more than two overlapping ranges, or a
   set of many small ranges that are not listed in ascending order,
   since both are indications of either a broken client or a deliberate
   denial-of-service attack (Section 6.1).  A client SHOULD NOT request
   multiple ranges that are inherently less efficient to process and
   transfer than a single range that encompasses the same data.

   A client that is requesting multiple ranges SHOULD list those ranges
   in ascending order (the order in which they would typically be
   received in a complete representation) unless there is a specific
   need to request a later part earlier.  For example, a user agent
   processing a large representation with an internal catalog of parts
   might need to request later parts first, particularly if the
   representation consists of pages stored in reverse order and the user
   agent wishes to transfer one page at a time.

The only option for a solution I can think of then is to check whether regions are in ascending order and do not overlap, and if so allow re-use of the InputStream.

Feel free to update the solution.

@rstoyanchev rstoyanchev removed this from the 5.2.3 milestone Dec 16, 2019
@rstoyanchev rstoyanchev added status: waiting-for-feedback We need additional information before we can continue status: waiting-for-triage An issue we've not yet triaged or decided on labels Dec 16, 2019
@randomnicode
Copy link
Author

randomnicode commented Dec 16, 2019

Thanks for the feedback!

While 1 Resource per request is a reasonable expectation, (and indeed my original PR did not have a Map), I reconsidered it because the issue is this: the method argument gets a Collection<ResourceRegion> with no guarantees about the underlying Resource for each of those objects. Extracting the Resource from the first ResourceRegion of the collection, and thereafter using it for all other ResourceRegion (even though they have their own Resource) feels like code smell to me (how is this method supposed to know that the underlying Resource for all the ResourceRegions provided is the same?). Thus in my issue (#24210), I suggested rearchitecting ResourceRegion. It seemed more reasonable to me to have a ResourceRegions class that consisted of 1 Resource and a collection of ranges, rather than have a collection of ResourceRegions each of which has a Resource object (that actually ends up being the same Resource object anyway). This then causes the downstream issues and necessitates assumptions such as what we're trying to make. If I'm allowed to, I can try rewriting those classes and submitting that solution, but it'll be a longer one (as it seems the ResourceRegion class is used elsewhere too).

The issue about the stream index being out of place is a good one and easier to mitigate with the aforementioned rewrite, when a ResourceRegion itself contains a list of ranges and one Resource, but I can try and fix it here within this scope.

Lastly, about the range order. I had pegged that point in an earlier revision of #24210, noting that we'd likely need to sort (and merge) ranges for this to be a viable option but I wasn't sure what the spec said. Is this still an assumption we could make? We could just sort and merge the ranges and respond with the sorted (and merged) order. The client wouldn't get the ranges in the order they requested, but as the spec mentions it would likely be a broken client or a DoS attack anyway. If someone actually wants out-of-order ranges, then they could just submit multiple requests (which they would realistically anyway, for instance for a zip directory: get the last part, process it, get a portion of the earlier parts based on the processing result)?

Part of my assumption comes from this piece of code: https://github.com/spring-projects/spring-framework/blob/master/spring-web/src/main/java/org/springframework/http/HttpRange.java#L191-L194 which implies that even though overlapping ranges are allowed by spec, we de facto don't allow it or at least don't expect it (if there are overlapping ranges, then by definition the total byte count for some requests would be greater than the size of the Resource, which we don't allow). Consider a byte range request of 0-499,0-499 for a 500 byte resource as an example, which is a valid range, but total byte count of 1000 for the ranges > resource size of 500 would throw an exception.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Dec 16, 2019
@randomnicode
Copy link
Author

randomnicode commented Dec 16, 2019

I've pushed out a fix addressing the issues and updated the PR description. The new solution should be consistent with the existing solution for out-of-order (and overlapping) requests.

@rstoyanchev
Copy link
Contributor

the method argument gets a Collection with no guarantees about the underlying Resource for each of those objects

In practice however ResourceRegion is only ever instantiated through HttpRangeToResourceRegions. This could be verified through an assertion, or more likely a fallback that simply re-opens the stream if it isn't the same Resource, which should never happen.

Thus in my issue (#24210), I suggested rearchitecting ResourceRegion

Not an option at this time. The downsides of such refactoring, in a framework at least, outweigh any benefits.

Part of my assumption comes from this piece of code: https://github.com/spring-projects/spring-framework/blob/master/spring-web/src/main/java/org/springframework/http/HttpRange.java#L191-L194 which implies that even though overlapping ranges are allowed by spec, we de facto don't allow it or at least don't expect it

That rules is meant to put a boundary on the amount of content that can be sent. It does not prevent overlap, e.g. 0-10,5-15 out of 100.

The framework tests do not pass with changes from the PR. However, I'm taking a closer look and based on the proposal will implement a similar solution.

@rstoyanchev rstoyanchev removed status: feedback-provided Feedback has been provided status: waiting-for-triage An issue we've not yet triaged or decided on labels Dec 18, 2019
@rstoyanchev rstoyanchev added this to the 5.2.3 milestone Dec 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) type: enhancement A general enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ResourceRegions converter opens and closes InputStream multiple times for multiple HttpRanges
3 participants