Add support for processing requests with StreamContent to AddStandardHedgingHandler() #5112

adamhammond · 2024-04-16T23:05:16Z

Add support for processing HttpRequestMessage objects containing StreamContent to the AddStandardHedgingHandler() resilience API.

This change does not update any public API contracts. It updates internal and private API contracts only.

Microsoft Reviewers: Open in CodeFlow

Add support for HttpRequestMessage objects containing StreamContent to the AddStandardHedgingHandler() resilience API. This change does not update any public API contracts. It updates internal and private API contracts only. Link to issue: dotnet#5105

Add support for HttpRequestMessage objects containing StreamContent to the AddStandardHedgingHandler() resilience API. This change does not update any public API contracts. It updates internal and private API contracts only. This is a small commit that is part of a larger PR. See the PR and its corresponding initial commit for the full set of changes.

Add support for HttpRequestMessage objects containing StreamContent to the AddStandardHedgingHandler() resilience API. This change does not update any public API contracts. It updates internal and private API contracts only. This is a small commit to update the ConfigureAwait arg on an async

amcasey · 2024-04-17T20:34:26Z

Apparently, the code coverage for M.E.Http.Resilience has dropped from 97% to 96.75% and that's why CI failed.

...icrosoft.Extensions.Http.Resilience/Hedging/ResilienceHttpClientBuilderExtensions.Hedging.cs

src/Libraries/Microsoft.Extensions.Http.Resilience/Internal/RequestMessageSnapshot.cs

joperezr · 2024-04-18T16:29:37Z

cc: @iliar-turdushev can you take a peek?

Add support for HttpRequestMessage objects containing StreamContent to the AddStandardHedgingHandler() resilience API. This change does not update any public API contracts. It updates internal and private API contracts only. This is a small commit to resolve comments made on the PR

Add support for HttpRequestMessage objects containing StreamContent to the AddStandardHedgingHandler() resilience API. This change does not update any public API contracts. It updates internal and private API contracts only. This is a small commit to resolve comments made on the PR.

adamhammond · 2024-04-19T18:15:20Z

Apparently, the code coverage for M.E.Http.Resilience has dropped from 97% to 96.75% and that's why CI failed.

@amcasey is code coverage calculated for the entire package or just for the lines I touched? I have added full test coverage for this change with the exception of the catch-block in two of the files where I catch IOException. Test coverage can't be added for this without doing something awkward like extending the StreamContent class with virtual methods so that they can be mocked. That would also require adding a certain flag on the assembly to permit this AFAIK. It doesn't seem like this is worth the trouble and added complexity. Thoughts?

martintmk · 2024-04-22T00:36:28Z

src/Libraries/Microsoft.Extensions.Http.Resilience/Internal/RequestMessageSnapshot.cs

    {
-        if (request.Content is StreamContent)
+        HttpContent? clonedContent = null;
+        if (content is not null)


This introduces performance regression for non-streamed content. Create an extra branch for StreamContent where this logic belongs.

@martintmk can you expand on why you believe this will cause a performance regression?

The cloning of the HttpContent (i.e. this method) only executes if the request.Content is StreamContent. See the other places in this class where this CloneContentAsync method is called for more context. You will see that each call is wrapped in a if (request.Content is StreamContent) check. Therefore, I am performing the branching that you are suggesting (pending that I'm understanding your usage of the term "branch" to mean logic branch is correct). Note: I did this specifically to prevent any performance regression for existing non-Streamed content users of the AddStandardHedgingHandler API.

Also, if you are referring to the if (request.Content is StreamContent) check as being the culprit of a new performance regression, please observe that this check already existed before my change. The only difference is that, prior to this change, if request.Content is StreamContent is true, then the RequestMessageSnapshot methods throw an InvalidOperationException. Therefore, perf should remain the same for existing, non-Streamed content users.

I see, thanks for detailed explanation. Can you still try to avoid the extra branches by condensing the code into a single PrepareContentAsync method
It would also reduce the number of mutants you have to kill.

Ahh I see what you mean. I understand where you are coming from; however, I don't think encapsulating the content operations behind a single method will achieve the additional clarity or eliminate the required mutation tests as you are suggesting. Below are my thoughts:

The mutation tests still need to be accounted for regardless.

I don't believe it would add any additional clarity. In fact, I believe it might do the opposite and here's why: the method would need to operate in both directions (i.e. cloning a passed-in request object into the given RequestMessageSnapshot instance fields and cloning the given RequestMessageSnapshot instance into a new request object). I don't see any clear way to represent that behind a single method without it creating more confusion. It even seems like it would be a code smell.

martintmk · 2024-04-22T00:38:26Z

src/Libraries/Microsoft.Extensions.Http.Resilience/Internal/RequestMessageSnapshot.cs

+            HttpContent originalContent = content;
+            Stream originalRequestBody = await content.ReadAsStreamAsync().ConfigureAwait(false);
+            MemoryStream clonedRequestBody = new MemoryStream();
+            await originalRequestBody.CopyToAsync(clonedRequestBody).ConfigureAwait(false);


The streamed content can point to a large file which can have > GBs in size. Since we are copying to memory we should constraint it. Let's say < 10MB

I'm not sure if it's a good idea to impose a size limit. For example, I don't think the framework imposes any such limit when cloning request content for redirected requests. IMO, we should allow this to be limitless, and require users to enforce their own request content size limits, if they have them, via server configurations and/or their own custom handlers, filters, middleware, etc. Thoughts?

This change can buffer potentially endless stream into memory and crash the process. I think it's something we should guard against.

This thing should be used for relatively small streamed payloads.

Btw, you can try to call LoadIntoBufferAsync on the content to see how it behaves.

I do understand the concern; however, the max size that a given stream can grow before it crashes the process is bound only by the memory resources available on the given server that it's executing on. Therefore, by choosing a static max limit, we are creating a cap that might unnecessarily hinder users of the API that have both requirements and the necessary server resources available to process requests with exceptionally large stream content. Further, RequestMessageSnapshot is only cloning content from existing request objects that contain existing StreamContent. Therefore, a HttpRequestMessage object has already been created with StreamContent of the given size before any of the RequestMessageSnapshot operations are executed. If the size of the StreamContent was in fact too large, it would have already crashed the process.

I feel strongly that we should not impose a size limit when cloning the StreamContent.

If the size of the StreamContent was in fact too large, it would have already crashed the process.

This is not true, the StreamContent can point to file stream and it consumes minimal memory even as you read the whole file. What you are doing here is materializing the whole stream into the memory. This is what concerns me.

The only way what you are saying is true is if the content is a FileStream and is read into a buffer, correct? This is somewhat of a remarkable edge case since we are talking about streams in the context of HTTP request content. It would help me understand this better if you could provide an example scenario where a developer would need to support this. Also, it is universally known that a given request must be deep cloned in order to support hedging. I believe the general assumption is that deep cloning will be handled via cloning to a MemoryStream as opposed to any alternative stream, which would require the entire stream to be copied to memory. Therefore, I return to my previous stance that users of the AddStandardHedgingHandler extension method should add their own handler, middleware, etc. to impose a size limit on request content if they have one. My biggest concern is that we don't have enough information to know what the optimal max limit value should be that would support the most use cases effectively.

src/Libraries/Microsoft.Extensions.Http.Resilience/Internal/RequestMessageSnapshot.cs

Add support for HttpRequestMessage objects containing StreamContent to the AddStandardHedgingHandler() resilience API. This change does not update any public API contracts. It updates internal and private API contracts only. This is a small commit to resolve comments made on the PR.

adamhammond · 2024-04-22T22:59:37Z

Apparently, the code coverage for M.E.Http.Resilience has dropped from 97% to 96.75% and that's why CI failed.

is code coverage calculated for the entire package or just for the lines I touched?

The coverage is calculated for the whole package.

@RussKie I was able to add code coverage for the catch-block. I had to create a helper that extended MemoryStream with overloaded stream operation-methods to throw IOException in order to emulate the needed behavior. After adding the additional coverage, all pipeline checks passed 👍

iliar-turdushev · 2024-04-25T15:25:44Z

...braries/Microsoft.Extensions.Http.Resilience.Tests/Resilience/RequestMessageSnapshotTests.cs

+        AddContentHeaders(_requestMessage!.Content);
+        using RequestMessageSnapshot snapshot = await RequestMessageSnapshot.CreateAsync(_requestMessage).ConfigureAwait(false);
+        ((IResettable)snapshot).TryReset();
+        _ = await Assert.ThrowsAsync<InvalidOperationException>(async () => await snapshot.CreateRequestMessageAsync().ConfigureAwait(false));


Suggested change

_ = await Assert.ThrowsAsync<InvalidOperationException>(async () => await snapshot.CreateRequestMessageAsync().ConfigureAwait(false));

_ = await Assert.ThrowsAsync<InvalidOperationException>(() => snapshot.CreateRequestMessageAsync());

nit: This and other similar assert statements can be simplified.

iliar-turdushev

The PR looks good to me.

martintmk · 2024-04-26T06:45:27Z

@adamhammond

I have alternative proposal that should enable your scenarios and won't cause potential issues.

In your code-base you add the following handler:

public class BufferingHandler : DelegatingHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        await request.Content.LoadIntoBufferAsync();
        return await base.SendAsync(request, cancellationToken);
    }
}

Now hedging will support StreamContent if the stream is seekable. It also gets the buffer and replicates it across hedged requests, so there are no multiple duplications. Something like:

        if (request.Content is StreamContent streamContent && await streamContent.ReadAsStreamAsync(cancellationToken) is Stream { CanSeek: true } stream)
        {
            byte[] buffer;

            if (stream is MemoryStream memoryStream)
            {
                // but buffer asside
                buffer = memoryStream.GetBuffer();
            }
            else
            {
                var memory = new MemoryStream();
                await stream.CopyToAsync(memory);
                buffer = memory.GetBuffer();
            }

            request.Content = new ByteArrayContent(buffer);
        }

This delegates the responsibility of buffering to the caller. In other words it's an explicit actions that tells us: "I have buffered the streamed content and I know what I am doing"

The only extra code on your part is that minimal BufferingHandler that you need to add before standard hedging.

WDYT?

mobratil · 2024-04-26T08:20:15Z

@martintmk I like your proposal because it makes the decisions about storing the stream in memory explicit.
The developer should explicitly say that some potentially large allocation is going to happen.
With the current PR the allocation might happen without developer's intend.

martintmk · 2024-04-26T08:35:21Z

Just to reiterate why I am pushing back on on this. The standard hedging is/will be used broadly across Microsoft most important services. My intuition tells me that allowing unlimited or default buffering of large streams into memory by default with eventually cause problems or outages.

So we need to do it opt-in with the option to limit the size that will be buffered.

adamhammond · 2024-05-01T19:53:16Z

@adamhammond

Now hedging will support StreamContent if the stream is seekable. It also gets the buffer and replicates it across hedged requests, so there are no multiple duplications. Something like:

        if (request.Content is StreamContent streamContent && await streamContent.ReadAsStreamAsync(cancellationToken) is Stream { CanSeek: true } stream)
        {
            byte[] buffer;

            if (stream is MemoryStream memoryStream)
            {
                // but buffer asside
                buffer = memoryStream.GetBuffer();
            }
            else
            {
                var memory = new MemoryStream();
                await stream.CopyToAsync(memory);
                buffer = memory.GetBuffer();
            }

            request.Content = new ByteArrayContent(buffer);
        }

WDYT?

I do like the idea of utilizing a buffer if one is available; however, this unfortunately is not possible (or rather, is not safe) to do in the context of hedging. It violates the "snapshot" pattern, breaking the implicit contract of the RequestMessageSnapshot class because reusing the buffer to clone content means that any subsequent changes made to said content in the buffer by the user after a snapshot is created will also be reflected in the snapshot.

To expand, recall that the entire purpose of RequestMessageSnapshot class is to clone HttpRequestMessage objects for the purpose of hedging. The AddStandardHedgingHandler extension method essentially adds a DelegatingHandler to the given HttpClient's handler pipeline in which it creates a resilience pipeline and executes said pipeline for the given request. One of the resilience strategies within that pipeline clones the given HttpRequestMessage into a RequestMessageSnapshot and stores it in the resilience context. The hedging strategy then uses this RequestMessageSnapshot instance to clone a new HttpRequestMessage object from the RequestMessageSnapshot object pulled from the resilience context for every subsequent (i.e. hedged) request. The important thing to note about this is that users can still add other DelegatingHandlers to the HttpClient after calling the AddStandardHedgingHandler extension method which will force said handlers to execute after the resilience handler (i.e. after the hedging strategy) in the HttpClient's handler pipeline.

To illustrate, let's say that a user added a custom DelegatingHandler that manipulates the request content in some way before sending the request over the wire. If RequestMessageSnapshot reuses the content buffer when cloning the request, then the changes made to the request content by the custom handler would also be making these changes to the snapshot's content. Let's say the request gets hedged. An HttpRequestMessage would be cloned from the snapshot; however, it wouldn't be a clone of the request prior to sending, it would be a clone of the request prior to being sent over the wire. This might result in the user making duplicate or incorrect edits when the rest of the handler pipeline executes for each hedged request.

For hedging to work properly, the RequestMessageSnapshot must be a true snapshot.

martintmk · 2024-05-02T08:13:33Z

For hedging to work properly, the RequestMessageSnapshot must be a true snapshot.

I agree here, let's find a way to make this work opt-in. How about adding a MaxRequestBodyBufferLength to HttpHedgingStrategyOptions or similar if the name is too much. Set it to some sensible value, 5_000_000 for example or -1 for disabled.

Then the snapshot can read this value a use it for buffering, throwing exception of buffer is exceeded.

Another alternative is to add ConfigureRequestBodyBuffering extension for IStandardHedgingBuilder. The letter being more appropriate IMHO.

martintmk · 2024-05-07T13:10:54Z

...icrosoft.Extensions.Http.Resilience/Hedging/ResilienceHttpClientBuilderExtensions.Hedging.cs

+                        requestMessage.SetResilienceContext(args.ActionContext);
+
+                        // replace the request message
+                        args.ActionContext.Properties.Set(ResilienceKeys.RequestMessage, requestMessage);


Suggested change

args.ActionContext.Properties.Set(ResilienceKeys.RequestMessage, requestMessage);

args.ActionContext.Properties.Set(ResilienceKeys.RequestMessage, requestMessage);

There is a breaking change here :/

The OnHedging callback does not have access to request message anymore. This is because OnHedging is called after the action is created and before it is invoked.

https://github.com/App-vNext/Polly/blob/f85029c6d14ad20fd36e4fcdde7a32f33409137a/src/Polly.Core/Hedging/Controller/TaskExecution.cs#L127

Fixes dotnet#5105

51c0c53

Add support for HttpRequestMessage objects containing StreamContent to the AddStandardHedgingHandler() resilience API. This change does not update any public API contracts. It updates internal and private API contracts only. Link to issue: dotnet#5105

dotnet-policy-service bot assigned adamhammond Apr 16, 2024

Adam Hammond added 2 commits April 16, 2024 16:57

RussKie reviewed Apr 17, 2024

View reviewed changes

...icrosoft.Extensions.Http.Resilience/Hedging/ResilienceHttpClientBuilderExtensions.Hedging.cs Outdated Show resolved Hide resolved

src/Libraries/Microsoft.Extensions.Http.Resilience/Internal/RequestMessageSnapshot.cs Outdated Show resolved Hide resolved

Adam Hammond added 2 commits April 18, 2024 14:17

This comment was marked as resolved.

Sign in to view

martintmk reviewed Apr 22, 2024

View reviewed changes

src/Libraries/Microsoft.Extensions.Http.Resilience/Internal/RequestMessageSnapshot.cs Outdated Show resolved Hide resolved

martintmk reviewed Apr 22, 2024

View reviewed changes

src/Libraries/Microsoft.Extensions.Http.Resilience/Internal/RequestMessageSnapshot.cs Show resolved Hide resolved

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for processing requests with StreamContent to AddStandardHedgingHandler() #5112

Add support for processing requests with StreamContent to AddStandardHedgingHandler() #5112

adamhammond commented Apr 16, 2024 •

edited by dotnet-policy-service bot

amcasey commented Apr 17, 2024

joperezr commented Apr 18, 2024

adamhammond commented Apr 19, 2024 •

edited

This comment was marked as resolved.

martintmk Apr 22, 2024

adamhammond Apr 22, 2024 •

edited

martintmk Apr 23, 2024

adamhammond Apr 23, 2024 •

edited

martintmk Apr 22, 2024

adamhammond Apr 22, 2024

martintmk Apr 23, 2024

adamhammond Apr 23, 2024

martintmk Apr 25, 2024

adamhammond May 1, 2024 •

edited

This comment was marked as outdated.

This comment was marked as outdated.

adamhammond commented Apr 22, 2024

This comment was marked as resolved.

iliar-turdushev Apr 25, 2024

iliar-turdushev left a comment

martintmk commented Apr 26, 2024

mobratil commented Apr 26, 2024 •

edited

martintmk commented Apr 26, 2024

adamhammond commented May 1, 2024

martintmk commented May 2, 2024

martintmk May 7, 2024

	_ = await Assert.ThrowsAsync<InvalidOperationException>(async () => await snapshot.CreateRequestMessageAsync().ConfigureAwait(false));
	_ = await Assert.ThrowsAsync<InvalidOperationException>(() => snapshot.CreateRequestMessageAsync());

	args.ActionContext.Properties.Set(ResilienceKeys.RequestMessage, requestMessage);
	args.ActionContext.Properties.Set(ResilienceKeys.RequestMessage, requestMessage);

Add support for processing requests with StreamContent to AddStandardHedgingHandler() #5112

Are you sure you want to change the base?

Add support for processing requests with StreamContent to AddStandardHedgingHandler() #5112

Conversation

adamhammond commented Apr 16, 2024 • edited by dotnet-policy-service bot

Microsoft Reviewers: Open in CodeFlow

amcasey commented Apr 17, 2024

joperezr commented Apr 18, 2024

adamhammond commented Apr 19, 2024 • edited

This comment was marked as resolved.

Choose a reason for hiding this comment

adamhammond Apr 22, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adamhammond Apr 23, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adamhammond May 1, 2024 • edited

Choose a reason for hiding this comment

This comment was marked as outdated.

This comment was marked as outdated.

adamhammond commented Apr 22, 2024

This comment was marked as resolved.

Choose a reason for hiding this comment

iliar-turdushev left a comment

Choose a reason for hiding this comment

martintmk commented Apr 26, 2024

mobratil commented Apr 26, 2024 • edited

martintmk commented Apr 26, 2024

adamhammond commented May 1, 2024

martintmk commented May 2, 2024

Choose a reason for hiding this comment

adamhammond commented Apr 16, 2024 •

edited by dotnet-policy-service bot

adamhammond commented Apr 19, 2024 •

edited

adamhammond Apr 22, 2024 •

edited

adamhammond Apr 23, 2024 •

edited

adamhammond May 1, 2024 •

edited

mobratil commented Apr 26, 2024 •

edited