Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing test: Reentrant close in HttpRequestDecoder #13730

Open
wants to merge 3 commits into
base: 4.1
Choose a base branch
from

Conversation

yawkat
Copy link
Contributor

@yawkat yawkat commented Dec 15, 2023

Discovered by fuzzing, but I'm not sure whether this can actually happen outside EmbeddedChannel.

Normally, HttpRequestDecoder will strictly follow the pattern (HttpRequest HttpContent* LastHttpContent)* for downstream handlers. However, when a downstream handler closes the channel inside a channelRead for a HttpRequest, this may reorder inputs, eg HttpRequest HttpRequest LastHttpContent LastHttpContent. This is mostly ByteToMessageDecoder's fault, and I'm not sure how to fix it in a performance-neutral way. Basically, HttpRequestDecoder.channelInactive is called while HttpRequestDecoder.channelRead is still running, so the next request gets inserted mid-request.

If this can only happen in EmbeddedChannel (which I'm not sure of), then we should just fix EmbeddedChannel instead.

Discovered by fuzzing, but I'm not sure whether this can actually happen outside EmbeddedChannel.

Normally, HttpRequestDecoder will strictly follow the pattern `(HttpRequest HttpContent* LastHttpContent)*` for downstream handlers. However, when a downstream handler closes the channel inside a `channelRead` for a `HttpRequest`, this may reorder inputs, eg `HttpRequest HttpRequest LastHttpContent LastHttpContent`. This is mostly ByteToMessageDecoder's fault, and I'm not sure how to fix it in a performance-neutral way. Basically, `HttpRequestDecoder.channelInactive` is called while `HttpRequestDecoder.channelRead` is still running, so the next request gets inserted mid-request.

If this can only happen in EmbeddedChannel (which I'm not sure of), then we should just fix EmbeddedChannel instead.
@yawkat
Copy link
Contributor Author

yawkat commented Dec 15, 2023

Hmm it seems AbstractChannel guards against exactly this:

// As a user may call deregister() from within any method while doing processing in the ChannelPipeline,
// we need to ensure we do the actual deregister operation later. This is needed as for example,
// we may be in the ByteToMessageDecoder.callDecode(...) method and so still try to do processing in
// the old EventLoop while the user already registered the Channel to a new EventLoop. Without delay,
// the deregister operation this could lead to have a handler invoked by different EventLoop and so
// threads.
//
// See:
// https://github.com/netty/netty/issues/4435
invokeLater(new Runnable() {
@Override
public void run() {
try {
doDeregister();
} catch (Throwable t) {
logger.warn("Unexpected exception occurred while deregistering a channel.", t);
} finally {
if (fireChannelInactive) {
pipeline.fireChannelInactive();
}
// Some transports like local and AIO does not allow the deregistration of
// an open channel. Their doDeregister() calls close(). Consequently,
// close() calls deregister() again - no need to fire channelUnregistered, so check
// if it was registered.
if (registered) {
registered = false;
pipeline.fireChannelUnregistered();
}
safeSetSuccess(promise);
}
}
});

However EmbeddedChannel.close runs pending tasks, even if we are already running event loop stuff. So the invokeLater is not effective.

yawkat added a commit to micronaut-projects/micronaut-core that referenced this pull request Dec 15, 2023
There are two bugs fixed here:

(1) Empty URI handling in HateoasErrorResponseProcessor

```
10:18:18.775 [main] ERROR i.m.h.s.netty.RoutingInBoundHandler - Micronaut Server Error - No request state present. Cause: URI cannot be empty
java.lang.IllegalArgumentException: URI cannot be empty
	at io.micronaut.http.hateoas.DefaultLink.<init>(DefaultLink.java:49)
	at io.micronaut.http.hateoas.Link.of(Link.java:115)
	at io.micronaut.http.server.exceptions.response.HateoasErrorResponseProcessor.processResponse(HateoasErrorResponseProcessor.java:69)
	at io.micronaut.http.server.RouteExecutor.createDefaultErrorResponse(RouteExecutor.java:214)
	at io.micronaut.http.server.netty.RoutingInBoundHandler.writeResponse(RoutingInBoundHandler.java:229)
	at io.micronaut.http.server.netty.NettyRequestLifecycle.lambda$handleException$2(NettyRequestLifecycle.java:147)
	at io.micronaut.core.execution.ImperativeExecutionFlowImpl.onComplete(ImperativeExecutionFlowImpl.java:132)
	at io.micronaut.http.server.netty.NettyRequestLifecycle.handleException(NettyRequestLifecycle.java:147)
	at io.micronaut.http.server.netty.NettyRequestLifecycle.handleNormal(NettyRequestLifecycle.java:89)
	at io.micronaut.http.server.netty.RoutingInBoundHandler.accept(RoutingInBoundHandler.java:220)
[...]
```

Test input added to FuzzyInputSpec. FuzzyInputSpec has been adjusted to recognize logged errors.

(2) Bad sorting in MediaType.orderedOf

```
11:48:58.716 [41560@yawkat-oracle main] ERROR i.m.http.server.RouteExecutor - Unexpected error occurred: Comparison method violates its general contract!
java.lang.IllegalArgumentException: Comparison method violates its general contract!
	at java.base/java.util.TimSort.mergeLo(TimSort.java:781)
	at java.base/java.util.TimSort.mergeAt(TimSort.java:518)
	at java.base/java.util.TimSort.mergeForceCollapse(TimSort.java:461)
	at java.base/java.util.TimSort.sort(TimSort.java:254)
	at java.base/java.util.Arrays.sort(Arrays.java:1307)
	at java.base/java.util.ArrayList.sort(ArrayList.java:1721)
	at io.micronaut.http.MediaType.orderedOf(MediaType.java:870)
	at io.micronaut.http.netty.NettyHttpHeaders.accept(NettyHttpHeaders.java:301)
```

Created a new MediaTypeFuzzTest that hits this issue. The sample input (crash-22df7f6e72bba86bdb1fdbd4bf92372fd4fa6bbe) reproduces the issue and will be run as part of the normal micronaut-http test suite now. Setting the env variable JAZZER_FUZZ=1 will enable exploratory fuzzing.

(3) Added workaround for netty/netty#13730

This does not appear to be an issue in the real world, only with EmbeddedChannel. I've added a workaround for the issue so that it doesn't trigger my fuzz tests anymore.

---

None of these bugs appear security-relevant, (3) is not applicable outside a test env, (1) and (2) only produce additional error logs.
sdelamo pushed a commit to micronaut-projects/micronaut-core that referenced this pull request Dec 15, 2023
* Fix logged errors reported by fuzzing
There are two bugs fixed here:

(1) Empty URI handling in HateoasErrorResponseProcessor

```
10:18:18.775 [main] ERROR i.m.h.s.netty.RoutingInBoundHandler - Micronaut Server Error - No request state present. Cause: URI cannot be empty
java.lang.IllegalArgumentException: URI cannot be empty
	at io.micronaut.http.hateoas.DefaultLink.<init>(DefaultLink.java:49)
	at io.micronaut.http.hateoas.Link.of(Link.java:115)
	at io.micronaut.http.server.exceptions.response.HateoasErrorResponseProcessor.processResponse(HateoasErrorResponseProcessor.java:69)
	at io.micronaut.http.server.RouteExecutor.createDefaultErrorResponse(RouteExecutor.java:214)
	at io.micronaut.http.server.netty.RoutingInBoundHandler.writeResponse(RoutingInBoundHandler.java:229)
	at io.micronaut.http.server.netty.NettyRequestLifecycle.lambda$handleException$2(NettyRequestLifecycle.java:147)
	at io.micronaut.core.execution.ImperativeExecutionFlowImpl.onComplete(ImperativeExecutionFlowImpl.java:132)
	at io.micronaut.http.server.netty.NettyRequestLifecycle.handleException(NettyRequestLifecycle.java:147)
	at io.micronaut.http.server.netty.NettyRequestLifecycle.handleNormal(NettyRequestLifecycle.java:89)
	at io.micronaut.http.server.netty.RoutingInBoundHandler.accept(RoutingInBoundHandler.java:220)
[...]
```

Test input added to FuzzyInputSpec. FuzzyInputSpec has been adjusted to recognize logged errors.

(2) Bad sorting in MediaType.orderedOf

```
11:48:58.716 [41560@yawkat-oracle main] ERROR i.m.http.server.RouteExecutor - Unexpected error occurred: Comparison method violates its general contract!
java.lang.IllegalArgumentException: Comparison method violates its general contract!
	at java.base/java.util.TimSort.mergeLo(TimSort.java:781)
	at java.base/java.util.TimSort.mergeAt(TimSort.java:518)
	at java.base/java.util.TimSort.mergeForceCollapse(TimSort.java:461)
	at java.base/java.util.TimSort.sort(TimSort.java:254)
	at java.base/java.util.Arrays.sort(Arrays.java:1307)
	at java.base/java.util.ArrayList.sort(ArrayList.java:1721)
	at io.micronaut.http.MediaType.orderedOf(MediaType.java:870)
	at io.micronaut.http.netty.NettyHttpHeaders.accept(NettyHttpHeaders.java:301)
```

Created a new MediaTypeFuzzTest that hits this issue. The sample input (crash-22df7f6e72bba86bdb1fdbd4bf92372fd4fa6bbe) reproduces the issue and will be run as part of the normal micronaut-http test suite now. Setting the env variable JAZZER_FUZZ=1 will enable exploratory fuzzing.

(3) Added workaround for netty/netty#13730

This does not appear to be an issue in the real world, only with EmbeddedChannel. I've added a workaround for the issue so that it doesn't trigger my fuzz tests anymore.

---

None of these bugs appear security-relevant, (3) is not applicable outside a test env, (1) and (2) only produce additional error logs.

* annotation
yawkat added a commit to micronaut-projects/micronaut-core that referenced this pull request Jan 4, 2024
Decompression errors should be treated as DEBUG to prevent log spam.

I've also copied over some other fuzz tests from #10131. None of them were an issue except one caused by a bug in netty EmbeddedChannel ( netty/netty#13730 ).
yawkat added a commit to micronaut-projects/micronaut-core that referenced this pull request Jan 4, 2024
Decompression errors should be treated as DEBUG to prevent log spam.

I've also copied over some other fuzz tests from #10131. None of them were an issue except one caused by a bug in netty EmbeddedChannel ( netty/netty#13730 ).
sdelamo added a commit to micronaut-projects/micronaut-core that referenced this pull request Jan 5, 2024
* fix(deps): update dependency ch.qos.logback:logback-classic to v1.4.14 (#10191)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Update test (#10187)

* KSP: Resolved type arguments shouldn't be marked as a type variable (#10193)

* Optimize overrides-method check (#10195)

* Fix error with duplicate builder methods (#10226)

* fix(deps): update dependency io.netty.incubator:netty-incubator-codec-http3 to v0.0.23.final (#10222)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Update ROADMAP.adoc (#10213)

* Update ROADMAP.adoc

Close #10205

* Update ROADMAP.adoc

* fix(deps): update dependency com.github.javaparser:javaparser-symbol-solver-core to v3.25.7 (#10207)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* imp: don't send Content-length header for GET requests (#10218)

* Allow context-path to be an empty String (#10231)

* Allow context-path to be an empty String

* Update ConfigurableUriNamingStrategySpec

* Fix concurrent retrieval of expression resolvers (#10176) (#10229)

* Change to public so that java doc is generated for Configuration Reference link in manual. (#10233)

closes #10202

* KSP: Fix enum class values (#10240)

* docs: add reactor context propagation example (#10220)

* docs: add reactor context propagation example

closes #10219

* fix test failures

* add clarification about modified propagation context

* improve example

* fix code smell

* refactor to fix code smell

* change example to use custom context element

* refine example

* include example imports

* KSP: Exclude static properties (#10245)

* test: remove javax.persistence.Transient from Mapper (#10113)

@transient exclusion was removed in @transient since https://github.com/micronaut-projects/micronaut-core/pull/8072/files

* Update breaking changes for Micronaut 4  (#10248)

* Update breaks.adoc

update to include breaking library changes

* Update breaks.adoc

* fix(deps): update dependency io.micronaut.aws:micronaut-aws-bom to v4.1.2 (#10249)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* KSP: Workaround for inner anonymous classes (#10247)

* fix(deps): update dependency io.micronaut.rxjava2:micronaut-rxjava2-bom to v2.2.0 (#10241)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix(deps): update dependency io.micronaut.reactor:micronaut-reactor-bom to v3.2.0 (#10234)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Add links to guides for Micronaut Core (#10269)

* Allow Refreshable scope under test for Env.FUNCTION and ANDROID (#10264)

It was seen in the Aws module that `@MockLambdaTest` and `@MockBean` did not work as expected.

micronaut-projects/micronaut-aws#1870

This is because:

1. `@MockBean` uses the Refreshable scope
2. `@MockLambdaTest` adds Environment.FUNCTION to the test context
3. The Refresh scope is disabled for the FUNCTION environment

After this PR, RefreshScope is still disabled for Environment.FUNCTION...

...*UNLESS* Environment.TEST is also enabled.

* test. ThreadSelection accept event loop in the handler in blocking and auto scenarios (#10104)

* change comment copy

* test. ThreadSelection accept event loop in the handler in blocking and auto scenarios

The test is really flaky:

https://ge.micronaut.io/scans/tests?search.timeZoneId=Europe%2FMadrid&tests.container=io.micronaut.http.server.netty.threading.ThreadSelectionSpec

* disable test

---------

Co-authored-by: yawkat <jonas.konrad@oracle.com>

* [bug] EL fails when using primitive method arguments in expression (#10149)

* Reproducer

* Add Kotlin reproducer as well

* Unbox primitive

---------

Co-authored-by: Denis Stepanov <denis.s.stepanov@oracle.com>

* fix(deps): update netty monorepo to v4.1.103.final (#10270)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix(deps): update managed.ksp to v1.9.21-1.0.16 (#10268)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* [skip ci] Release v4.2.2

* Back to 4.2.3-SNAPSHOT

* Fix logged errors reported by fuzzing (#10273)

* Fix logged errors reported by fuzzing
There are two bugs fixed here:

(1) Empty URI handling in HateoasErrorResponseProcessor

```
10:18:18.775 [main] ERROR i.m.h.s.netty.RoutingInBoundHandler - Micronaut Server Error - No request state present. Cause: URI cannot be empty
java.lang.IllegalArgumentException: URI cannot be empty
	at io.micronaut.http.hateoas.DefaultLink.<init>(DefaultLink.java:49)
	at io.micronaut.http.hateoas.Link.of(Link.java:115)
	at io.micronaut.http.server.exceptions.response.HateoasErrorResponseProcessor.processResponse(HateoasErrorResponseProcessor.java:69)
	at io.micronaut.http.server.RouteExecutor.createDefaultErrorResponse(RouteExecutor.java:214)
	at io.micronaut.http.server.netty.RoutingInBoundHandler.writeResponse(RoutingInBoundHandler.java:229)
	at io.micronaut.http.server.netty.NettyRequestLifecycle.lambda$handleException$2(NettyRequestLifecycle.java:147)
	at io.micronaut.core.execution.ImperativeExecutionFlowImpl.onComplete(ImperativeExecutionFlowImpl.java:132)
	at io.micronaut.http.server.netty.NettyRequestLifecycle.handleException(NettyRequestLifecycle.java:147)
	at io.micronaut.http.server.netty.NettyRequestLifecycle.handleNormal(NettyRequestLifecycle.java:89)
	at io.micronaut.http.server.netty.RoutingInBoundHandler.accept(RoutingInBoundHandler.java:220)
[...]
```

Test input added to FuzzyInputSpec. FuzzyInputSpec has been adjusted to recognize logged errors.

(2) Bad sorting in MediaType.orderedOf

```
11:48:58.716 [41560@yawkat-oracle main] ERROR i.m.http.server.RouteExecutor - Unexpected error occurred: Comparison method violates its general contract!
java.lang.IllegalArgumentException: Comparison method violates its general contract!
	at java.base/java.util.TimSort.mergeLo(TimSort.java:781)
	at java.base/java.util.TimSort.mergeAt(TimSort.java:518)
	at java.base/java.util.TimSort.mergeForceCollapse(TimSort.java:461)
	at java.base/java.util.TimSort.sort(TimSort.java:254)
	at java.base/java.util.Arrays.sort(Arrays.java:1307)
	at java.base/java.util.ArrayList.sort(ArrayList.java:1721)
	at io.micronaut.http.MediaType.orderedOf(MediaType.java:870)
	at io.micronaut.http.netty.NettyHttpHeaders.accept(NettyHttpHeaders.java:301)
```

Created a new MediaTypeFuzzTest that hits this issue. The sample input (crash-22df7f6e72bba86bdb1fdbd4bf92372fd4fa6bbe) reproduces the issue and will be run as part of the normal micronaut-http test suite now. Setting the env variable JAZZER_FUZZ=1 will enable exploratory fuzzing.

(3) Added workaround for netty/netty#13730

This does not appear to be an issue in the real world, only with EmbeddedChannel. I've added a workaround for the issue so that it doesn't trigger my fuzz tests anymore.

---

None of these bugs appear security-relevant, (3) is not applicable outside a test env, (1) and (2) only produce additional error logs.

* annotation

* KSP: Fix `@Generated` processing (#10282)

* Initialize classes at build time that can deadlock graal compiler (#10283)

A class loading deadlock can occur when ConversionContext is initialised concurrently since it has static final fields that are sub interfaces which are also initialised causing a loop.

This doesn't manifest on JIT likely because these classes are never initialised concurrently, but a deadlock can occur in the Graal compiler.

A real fix would to be alter the code but unfortunately there is no way to do this in a backwards compatible way.

A workaround in Graal is to explicitly initialise these classes at build time which happens very early, avoiding the concurrent access that causes the deadlock.

* KSP: Do not store default empty string annotation value (#10284)

* fix(deps): update netty monorepo to v4.1.104.final (#10276)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* chore(deps): update gradle/gradle-build-action action to v2.11.0 (#10252)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Ensure responseWritten is called on discard (#10288)

FullOutboundHandler needs to call responseWritten on discard, otherwise RoutingInboundHandler won't clean up request body. This can happen if the channel is closed while it's not writable, so there's a response backlog.

* Javac: Extract type variable name using the element (#10293)

* Publish service ready / stopped in consistent order (#10325)

* Publish service ready / stopped in consistent order

* Update test-suite/src/test/groovy/io/micronaut/EventListenerSpec.groovy

Co-authored-by: Sergio del Amo <sergio.delamo@softamo.com>

* Update test-suite/src/test/groovy/io/micronaut/EventListenerSpec.groovy

Co-authored-by: Sergio del Amo <sergio.delamo@softamo.com>

---------

Co-authored-by: Sergio del Amo <sergio.delamo@softamo.com>

* [skip ci] Release v4.2.3

* Back to 4.2.4-SNAPSHOT

* test: StreamPressureSpec ignore on macOs

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Denis Stepanov <denis.s.stepanov@oracle.com>
Co-authored-by: Tim Yates <tim.yates@gmail.com>
Co-authored-by: Jeremy Grelle <grellej@unityfoundation.io>
Co-authored-by: rorueda <rorueda@gmail.com>
Co-authored-by: Dean Wette <wetted@objectcomputing.com>
Co-authored-by: hrothwell <46227616+hrothwell@users.noreply.github.com>
Co-authored-by: yawkat <jonas.konrad@oracle.com>
Co-authored-by: micronaut-build <micronaut-build-account@grails.org>
Co-authored-by: Graeme Rocher <graeme.rocher@oracle.com>
…te something else

Motivation:

We need to ensure we only try to run tasks that were submitted for exectuion when we are not currently run other code in the EmbeddedChannel. We failed to do so and so coudl end up executing tasks directly while these should have been only run once the current execution is complete.

Modifications:

- Keep track of how deep our "execution stack" is and only run the pending tasks if we are sure it will not cause out of order problems
- Testcase

Result:

Correctly respect ordering
@normanmaurer
Copy link
Member

@yawkat thanks for reporting... it took me a while but I just pushed a fix for this into your branch. PTAL.

@normanmaurer normanmaurer marked this pull request as ready for review April 27, 2024 12:02
@normanmaurer normanmaurer added this to the 4.1.110.Final milestone Apr 27, 2024
@chrisvest
Copy link
Contributor

The FlushConsolidationHandlerTest is consistently failing.

@yawkat
Copy link
Contributor Author

yawkat commented Apr 29, 2024

a bit of a semantic change, we now run tasks after more operations. but LGTM

@normanmaurer
Copy link
Member

The FlushConsolidationHandlerTest is consistently failing.

Will check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants