Fix another NullPointerException in FixedStreamMessage #4667

Merged 10 commits into line:master from fixed-stream-onError on Feb 14, 2023

ikhoon
Contributor

@ikhoon ikhoon commented Feb 10, 2023

A NullPointerException caused by a race condition between collect() and abort() was fixed in #4652. However, we got another report from the Slack community: https://line-armeria.slack.com/archives/C1NGPBUH2/p1675994120153789

2023-02-09T02:08:55,526 [armeria-common-worker-epoll-3-3] WARN  com.linecorp.armeria.internal.common.stream.FixedStreamMessage - Subscriber.onError() should not raise an exception. subscriber: null
com.linecorp.armeria.common.util.CompositeException: 2 exceptions occurred.
    at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.onError0(FixedStreamMessage.java:247) ~[armeria-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.onError(FixedStreamMessage.java:237) ~[armeria-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.abort1(FixedStreamMessage.java:342) ~[armeria-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.abort0(FixedStreamMessage.java:328) ~[armeria-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.abort(FixedStreamMessage.java:308) ~[armeria-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.internal.common.stream.OneElementFixedStreamMessage.abort(OneElementFixedStreamMessage.java:112) ~[armeria-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.server.grpc.AbstractServerCall.closeListener(AbstractServerCall.java:287) ~[armeria-grpc-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.server.grpc.AbstractServerCall.closeListener(AbstractServerCall.java:264) ~[armeria-grpc-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.server.grpc.AbstractServerCall.doClose(AbstractServerCall.java:239) ~[armeria-grpc-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.server.grpc.AbstractServerCall.close(AbstractServerCall.java:222) ~[armeria-grpc-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.server.grpc.AbstractServerCall.close(AbstractServerCall.java:217) ~[armeria-grpc-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.server.grpc.FramedGrpcService.lambda$startCall$4(FramedGrpcService.java:318) ~[armeria-grpc-1.21.1-SNAPSHOT.jar:?]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) ~[?:?]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911) ~[?:?]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
    at com.linecorp.armeria.common.util.UnmodifiableFuture.doComplete(UnmodifiableFuture.java:164) ~[armeria-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.internal.common.CancellationScheduler$CancellationFuture.doComplete(CancellationScheduler.java:521) ~[armeria-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.internal.common.CancellationScheduler.invokeTask(CancellationScheduler.java:477) ~[armeria-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.internal.common.CancellationScheduler.finishNow0(CancellationScheduler.java:322) ~[armeria-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.internal.common.CancellationScheduler.finishNow(CancellationScheduler.java:306) ~[armeria-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.internal.server.DefaultServiceRequestContext.cancel(DefaultServiceRequestContext.java:327) ~[armeria-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.server.StreamingDecodedHttpRequest.abortResponse(StreamingDecodedHttpRequest.java:181) ~[armeria-1.21.1-SNAPSHOT.jar:?]
    at com.linecorp.armeria.server.Http2RequestDecoder.onRstStreamRead(Http2RequestDecoder.java:356) ~[armeria-1.21.1-SNAPSHOT.jar:?]
    at io.netty.handler.codec.http2.Http2FrameListenerDecorator.onRstStreamRead(Http2FrameListenerDecorator.java:59) ~[netty-codec-http2-4.1.86.Final.jar:4.1.86.Final]
    at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:442) ~[netty-codec-http2-4.1.86.Final.jar:4.1.86.Final]
    at io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80) ~[netty-codec-http2-4.1.86.Final.jar:4.1.86.Final]
    at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:509) ~[netty-codec-http2-4.1.86.Final.jar:4.1.86.Final]
    ...
Caused by: com.linecorp.armeria.common.util.CompositeException$ExceptionOverview: Multiple exceptions (2)
|-- java.lang.NullPointerException: Cannot invoke "org.reactivestreams.Subscriber.onError(java.lang.Throwable)" because "this.subscriber" is null
    at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.onError0(FixedStreamMessage.java:242)
    at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.onError(FixedStreamMessage.java:237)
    at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.abort1(FixedStreamMessage.java:342)
    at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.abort0(FixedStreamMessage.java:328)
    at com.linecorp.armeria.internal.common.stream.FixedStreamMessage.abort(FixedStreamMessage.java:308)
    at com.linecorp.armeria.internal.common.stream.OneElementFixedStreamMessage.abort(OneElementFixedStreamMessage.java:112)
    at com.linecorp.armeria.server.grpc.AbstractServerCall.closeListener(AbstractServerCall.java:287)
    at com.linecorp.armeria.server.grpc.AbstractServerCall.closeListener(AbstractServerCall.java:264)
    at com.linecorp.armeria.server.grpc.AbstractServerCall.doClose(AbstractServerCall.java:239)
    at com.linecorp.armeria.server.grpc.AbstractServerCall.close(AbstractServerCall.java:222)
    at com.linecorp.armeria.server.grpc.AbstractServerCall.close(AbstractServerCall.java:217)
    at com.linecorp.armeria.server.grpc.FramedGrpcService.lambda$startCall$4(FramedGrpcService.java:318)
    ...

Modifications:

  • Check if a stream is aborted while subscribe0() or collect() is in the pending queue of an event executor.
    • If it is aborted, abort the subscriber or the collection future.
  • Check if a stream is subscribed while abort1() is in the pending queue of an event executor.
    • If it is subscribed, delegate to subscribe0() to signal abortCause via onError().
  • Test possible race conditions by switching the execution order of tasks in an event executor.
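
The checks listed above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the code in this PR: `AbortRaceSketch`, `ManualExecutor`, `SketchStream`, `ErrorListener`, and `countOnErrorCalls()` are hypothetical names, the manually drained queue stands in for an event executor, and `collect()` is left out. It shows that `onError()` is delivered exactly once, and never on a null subscriber, whichever of the two pending tasks runs first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

// Hypothetical, simplified model of the subscribe/abort race.
// This is NOT Armeria's FixedStreamMessage; all names are illustrative.
final class AbortRaceSketch {

    /** Stand-in for org.reactivestreams.Subscriber, reduced to onError(). */
    interface ErrorListener {
        void onError(Throwable cause);
    }

    /** A manually drained task queue standing in for an event executor. */
    static final class ManualExecutor {
        private final Queue<Runnable> tasks = new ArrayDeque<>();

        void execute(Runnable task) {
            tasks.add(task);
        }

        /** Runs pending tasks in submission order, or reversed to simulate the other interleaving. */
        void runAll(boolean reversed) {
            final List<Runnable> pending = new ArrayList<>(tasks);
            tasks.clear();
            if (reversed) {
                Collections.reverse(pending);
            }
            pending.forEach(Runnable::run);
        }
    }

    static final class SketchStream {
        private final ManualExecutor executor;
        private volatile Throwable abortCause; // set by abort() before its task is queued
        private ErrorListener subscriber;      // touched only by executor tasks
        private boolean completed;             // touched only by executor tasks

        SketchStream(ManualExecutor executor) {
            this.executor = executor;
        }

        void subscribe(ErrorListener listener) {
            executor.execute(() -> subscribe0(listener));
        }

        void abort(Throwable cause) {
            abortCause = cause;
            executor.execute(() -> abort1(cause));
        }

        private void subscribe0(ErrorListener listener) {
            subscriber = listener;
            final Throwable cause = abortCause;
            if (cause != null) {
                // The stream was aborted while this task was pending.
                signal(cause);
            }
        }

        private void abort1(Throwable cause) {
            if (subscriber == null) {
                // Not subscribed yet: leave it to subscribe0() to signal abortCause.
                return;
            }
            signal(cause);
        }

        private void signal(Throwable cause) {
            if (completed) {
                return; // onError() has already been delivered once.
            }
            completed = true;
            subscriber.onError(cause);
        }
    }

    /** Subscribes, aborts, drains the queue, and returns how many times onError() fired. */
    static int countOnErrorCalls(boolean reversed) {
        final ManualExecutor executor = new ManualExecutor();
        final SketchStream stream = new SketchStream(executor);
        final List<Throwable> errors = new ArrayList<>();
        stream.subscribe(errors::add);
        stream.abort(new IllegalStateException("aborted"));
        executor.runAll(reversed);
        return errors.size();
    }

    public static void main(String[] args) {
        System.out.println("in order: " + countOnErrorCalls(false));
        System.out.println("reversed: " + countOnErrorCalls(true));
    }
}
```

Reversing the queue is a deliberately crude way to force both interleavings deterministically. The actual tests in this PR may switch the execution order differently.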

Result:

You no longer see a NullPointerException when a stream is aborted.

@ikhoon ikhoon added the defect label Feb 10, 2023
@ikhoon ikhoon added this to the 1.22.1 milestone Feb 10, 2023
@ikhoon ikhoon changed the title from "Fix another NullPointerException on FixedStreamMessage" to "Fix another NullPointerException in FixedStreamMessage" Feb 10, 2023
@dlvenable

Thank you @ikhoon. I ran some tests on my side using this branch and have not been able to reproduce the issue anymore. It looks like it is working for me.

@ikhoon
Contributor Author

ikhoon commented Feb 13, 2023

It's a relief. Thanks for testing this fix. 🙇‍♂️

Contributor

@jrhee17 jrhee17 left a comment

Left a super nit comment. Thanks for the fix @ikhoon 👍 🙇 🚀

@@ -336,10 +359,19 @@ private void abort1(Throwable cause, boolean subscribed) {
        if (completed) {
            return;
        }
        completed = true;
Contributor

Doesn't have to be handled in this PR:

nit: I think it would be easier to reason about if this path were only called from the event loop.

I understand that functionally there is no difference when subscribed = false, since completed isn't used and the targets to clean up are null anyway.
Having said this, I'm wondering if it's just easier to set completionFuture when the executor is set during abortion:

        if (executorUpdater.compareAndSet(this, null, ImmediateEventExecutor.INSTANCE)) {
            // No subscription was made.
            completionFuture.completeExceptionally(finalCause);
        } else {

Contributor Author

> Having said this, I'm wondering if it's just easier to set completionFuture when the executor is set during abortion:

Did you think of inlining the code? For example:

if (executorUpdater.compareAndSet(this, null, ImmediateEventExecutor.INSTANCE)) {
    cleanupObjects(cause);
    completionFuture.completeExceptionally(cause);
}

Contributor

> Did you think of inlining the code? For example:

I missed that we still need to call cleanupObjects, but yes, this is what I had in mind.

Contributor Author

Setting completed to true could be useful to prevent double abortion, although the result without completed is the same.

Contributor

> prevent double abortion

I see, I missed this intention. I think it's fine to leave as-is then 👍

        if (executor.inEventLoop()) {
            if (executor == ImmediateEventExecutor.INSTANCE) {
                // Double abortion
                abort1(finalCause, false);
Member

Shouldn't we also consider the case that a user subscribes with ImmediateEventExecutor.INSTANCE?
How about preventing double abortion by setting the abortCause only once?

Contributor Author

Good idea.
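
One way to read the "set the abortCause only once" suggestion is a compare-and-set on the field, so that only the first abort wins regardless of which executor the subscriber used. The following is a minimal sketch under that assumption. `AbortOnceSketch` and `tryAbort()` are hypothetical names, and the code merged in this PR may be structured differently.

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Hypothetical sketch: record the abort cause at most once.
// Not taken from Armeria; the class and method names are illustrative.
final class AbortOnceSketch {

    private static final AtomicReferenceFieldUpdater<AbortOnceSketch, Throwable> abortCauseUpdater =
            AtomicReferenceFieldUpdater.newUpdater(AbortOnceSketch.class, Throwable.class, "abortCause");

    // Must be volatile and non-static for AtomicReferenceFieldUpdater.
    private volatile Throwable abortCause;

    /** Returns true only for the first caller; later aborts are ignored. */
    boolean tryAbort(Throwable cause) {
        return abortCauseUpdater.compareAndSet(this, null, cause);
    }

    Throwable abortCause() {
        return abortCause;
    }

    public static void main(String[] args) {
        final AbortOnceSketch stream = new AbortOnceSketch();
        System.out.println(stream.tryAbort(new IllegalStateException("first")));
        System.out.println(stream.tryAbort(new IllegalStateException("second")));
        System.out.println(stream.abortCause().getMessage());
    }
}
```

With this guard, detecting a double abortion no longer depends on comparing the executor against `ImmediateEventExecutor.INSTANCE`, which is the case the comment above raises.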

Member

@trustin trustin left a comment

👍

Member

@minwoox minwoox left a comment

Thanks a lot for fixing this! 😄

@jrhee17 jrhee17 added this pull request to the merge queue Feb 14, 2023
Merged via the queue into line:master with commit 33d1c5e Feb 14, 2023
@ikhoon ikhoon deleted the fixed-stream-onError branch May 25, 2023 10:02

5 participants