
ShallowEtagHeaderFilter should use a more efficiently dynamically resizing buffer than ResizableByteArrayOutputStream [SPR-12081] #16697

Closed
spring-projects-issues opened this issue Aug 14, 2014 · 5 comments
Labels: in: web (Issues in web modules: web, webmvc, webflux, websocket), type: enhancement (A general enhancement)

spring-projects-issues commented Aug 14, 2014

Craig opened SPR-12081 and commented

ShallowEtagHeaderFilter buffers the response in a ResizableByteArrayOutputStream. When it needs to grow the buffer, it calls ResizableByteArrayOutputStream.resize(int):
https://github.com/spring-projects/spring-framework/blob/v4.0.6.RELEASE/spring-web/src/main/java/org/springframework/web/filter/ShallowEtagHeaderFilter.java#L245
https://github.com/spring-projects/spring-framework/blob/v4.0.6.RELEASE/spring-core/src/main/java/org/springframework/util/ResizableByteArrayOutputStream.java#L64

ResizableByteArrayOutputStream then creates a new buffer, copies the old buffer into the new buffer, then (implicitly) releases the old buffer.

It would be more efficient to use a linked list of buffers. The resize operation would create a new empty buffer and add it to the linked list of buffers. This approach results in less copying and lower overall memory use, reducing the frequency of problems such as #15482 that are caused by buffer resizing.

For an example of this approach in use today, see Grails' StreamCharBuffer: https://github.com/grails/grails-core/blob/355a031bfacbdc94c60b4a8fe4131a500c8833cb/grails-encoder/src/main/groovy/org/grails/buffer/StreamCharBuffer.java

Note that this approach is also used in MyFaces; see https://issues.apache.org/jira/browse/MYFACES-3450
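The proposed strategy can be sketched roughly as follows (an illustrative toy class, not Grails' or Spring's actual implementation): growing the buffer appends a fresh chunk to a list instead of reallocating and copying everything written so far, so the only full copy happens once, when the final byte[] is assembled.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the "list of buffers" idea: no data is ever
// copied on growth, only on final assembly.
class ChunkedBuffer {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private byte[] current;
    private int pos;       // write position within the current chunk
    private int totalSize; // bytes written overall

    ChunkedBuffer(int chunkSize) {
        this.chunkSize = chunkSize;
        this.current = new byte[chunkSize];
    }

    void write(byte[] data) {
        int offset = 0;
        while (offset < data.length) {
            if (pos == current.length) {
                // "Resize": retire the full chunk and start a new one,
                // instead of copying into a bigger array.
                chunks.add(current);
                current = new byte[chunkSize];
                pos = 0;
            }
            int n = Math.min(data.length - offset, current.length - pos);
            System.arraycopy(data, offset, current, pos, n);
            pos += n;
            offset += n;
            totalSize += n;
        }
    }

    int size() {
        return totalSize;
    }

    // The single full copy happens here, when the result is assembled.
    byte[] toByteArray() {
        byte[] result = new byte[totalSize];
        int dest = 0;
        for (byte[] chunk : chunks) {
            System.arraycopy(chunk, 0, result, dest, chunk.length);
            dest += chunk.length;
        }
        System.arraycopy(current, 0, result, dest, pos);
        return result;
    }
}
```

Compared with copy-on-resize, writing N bytes touches each byte once on write and once on assembly, rather than re-copying the whole prefix on every growth step.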


Affects: 4.0.6

Issue Links:

Craig commented

#653

Juergen Hoeller commented

I suppose this is only really inefficient when the content length has not been specified? Since with a specified content length, we'll use an appropriately sized buffer in the first place...
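To illustrate the point (a hypothetical sketch, not Spring's actual code — `bufferFor` and the 1024-byte default are assumptions): when the declared content length is known, the buffer can be allocated at full size up front and no resize or copy ever happens; with an unknown length the wrapper must start from a small default and grow.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper: pick the initial buffer capacity from the
// declared content length; fall back to a small assumed default (1024)
// when the length is unknown (-1), which forces later resizes.
class PresizeDemo {
    static ByteArrayOutputStream bufferFor(int contentLength) {
        return new ByteArrayOutputStream(Math.max(contentLength, 1024));
    }

    static byte[] buffer(byte[] body, int declaredLength) {
        ByteArrayOutputStream out = bufferFor(declaredLength);
        // No intermediate copy occurs if declaredLength >= body.length.
        out.write(body, 0, body.length);
        return out.toByteArray();
    }
}
```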

I'm just wondering how important this is for the 4.1.x line. We'll start 4.2 work pretty soon and this kind of rearchitecting in the details would be a fine fit there from my perspective.

Juergen

Craig commented

Agreed - this is only important when the content length is not specified. However, for most responses the content length isn't known ahead of time. The length is typically known only when serving static content; for dynamic responses (GSP, JSP, Velocity, FreeMarker, raw response.write, etc.) there's no way to know it up front, so a lot of resizing happens. These responses are also the least cacheable by caching proxies, so they're the most likely to be processed by the application server, which makes this issue that much more important.

I'd love to have the non-API change parts made for 4.1.x. But if that's not possible, I'll still be more than happy to see the improvement made for 4.2.x. :-)

Brian Clozel commented

Hi Craig

This is something we can work on, for sure.
I've written a quick JMH benchmark to check what kind of improvement we could get here.

Initial results show that the current implementation performs equally to, or better than, the proposal.

For 25 KB responses

Benchmark                                                      Mode  Samples       Score      Error  Units
o.s.s.ResponseWrapperBenchmark.byteArrayWrapperFixed          thrpt        5  182496,538 ± 4991,415  ops/s
o.s.s.ResponseWrapperBenchmark.byteArrayWrapperUnknown        thrpt        5  181891,086 ± 4292,277  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperFixed      thrpt        5  186418,003 ± 2680,324  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperUnknown    thrpt        5  121349,954 ± 3360,872  ops/s

For 100 KB responses

Benchmark                                                      Mode  Samples      Score     Error  Units
o.s.s.ResponseWrapperBenchmark.byteArrayWrapperFixed          thrpt        5  44293,692 ± 437,824  ops/s
o.s.s.ResponseWrapperBenchmark.byteArrayWrapperUnknown        thrpt        5  44691,418 ± 651,204  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperFixed      thrpt        5  44758,725 ± 488,173  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperUnknown    thrpt        5  28252,114 ± 404,786  ops/s

This benchmark (like many benchmarks) may be flawed, so don't hesitate to point out problems with it.
Note that it measures throughput, not memory use.

Craig commented

Brian Clozel sorry for the delayed response - it took me a while to figure out why your test got such different results than what I expected. Turns out it's a rather small thing that makes a measurable difference.

In your test, you do:

bh.consume(baos.toByteArray());

which causes FastByteArrayOutputStream to produce a "safe" byte array: it creates a new byte array, copies all the buffered data into it, releases all the previously buffered data, uses the newly created byte array as the new buffer, then copies that buffer once more and returns the copy. This copy-then-return is the slow part; toByteArrayUnsafe() returns the internal buffer instead of copying it. If you use baos.toByteArrayUnsafe() you'll get different results, with FastByteArrayOutputStream beating ResizableByteArrayOutputStream for a fixed size and roughly tying for the unknown size.

In my pull request, toByteArray() is only used in the first commit, in order to be safe (we can't rely on classes that extend ShallowEtagHeaderFilter and override isEligibleForEtag or generateETagHeaderValue not to modify the byte[]). So the lower performance your JMH test highlights would be present if only the first commit is merged.

However, the second commit eliminates this problem. toByteArray() is never called; the OutputStream is read as a stream (instead of a byte[]). So it's both safe and fast.
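The streaming approach described here could look roughly like this (an illustrative sketch, not the actual pull-request code; the quote-wrapped "0"-prefixed ETag format mimics ShallowEtagHeaderFilter's output but is an assumption here): the digest consumes the buffered content chunk by chunk from an InputStream, so the internal buffer never needs to be copied into a standalone byte[].

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch: compute an ETag by digesting the content as a
// stream, avoiding any defensive full copy of the buffer.
class StreamingEtag {
    static String etag(InputStream body) throws IOException {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every JVM", e);
        }
        byte[] chunk = new byte[4096];
        int read;
        while ((read = body.read(chunk)) != -1) {
            md5.update(chunk, 0, read); // digest in place, no full copy
        }
        // Format as a quoted hex string (assumed format, "0" prefix).
        StringBuilder sb = new StringBuilder("\"0");
        for (byte b : md5.digest()) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.append('"').toString();
    }
}
```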

Finally, I think that it's much more realistic for a response to be written as many small chunks rather than as one large write. So I added these benchmarks:

@Benchmark
public void byteArrayWrapperUnknown100times(ResponseData data, Blackhole bh) throws IOException {
	ResizableByteArrayOutputStream baos = new ResizableByteArrayOutputStream(1024);
	for (int i = 0; i < 100; i++) {
		baos.write(data.bytes);
	}
	bh.consume(baos.toByteArray());
}

@Benchmark
public void fastByteArrayWrapperUnknown100times(ResponseData data, Blackhole bh) throws IOException {
	FastByteArrayOutputStream baos = new FastByteArrayOutputStream(1024);
	for (int i = 0; i < 100; i++) {
		baos.write(data.bytes);
	}
	bh.consume(baos.toByteArray());
}

@Benchmark
public void fastByteArrayWrapperUnknown100timesUnsafe(ResponseData data, Blackhole bh) throws IOException {
	FastByteArrayOutputStream baos = new FastByteArrayOutputStream(1024);
	for (int i = 0; i < 100; i++) {
		baos.write(data.bytes);
	}
	bh.consume(baos.toByteArrayUnsafe());
}

Using these tests, FastByteArrayOutputStream beats ResizableByteArrayOutputStream by ~50%. It even wins when using toByteArray() instead of toByteArrayUnsafe().

Benchmark                                                                    Mode  Samples       Score       Error  Units
o.s.s.ResponseWrapperBenchmark.byteArrayWrapperFixed                        thrpt        5  153125.521 ±  8547.369  ops/s
o.s.s.ResponseWrapperBenchmark.byteArrayWrapperUnknown                      thrpt        5  152464.681 ±  7378.370  ops/s
o.s.s.ResponseWrapperBenchmark.byteArrayWrapperUnknown100times              thrpt        5     509.781 ±    37.328  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperFixed                    thrpt        5  146966.577 ± 18102.712  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperFixedUnsafe              thrpt        5  293713.313 ± 26438.443  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperUnknown                  thrpt        5   91566.986 ±  5903.223  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperUnknown100times          thrpt        5     531.457 ±    78.856  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperUnknown100timesUnsafe    thrpt        5     771.595 ±    98.998  ops/s
o.s.s.ResponseWrapperBenchmark.fastByteArrayWrapperUnknownUnsafe            thrpt        5  130245.591 ± 19222.186  ops/s
