WebSocket + Role-based authentication stopped working with Quarkus 3.9.x: Security Identity is not available #40307

lyrst · 2024-04-26T12:48:07Z

Describe the bug

I hava a WebSocket-endpoint that is secured with the "RolesAllowed"-annotation.
In order to secure the websocket endpint I followed the approach that is described here #29919

The issue that the Test testFoo1Endpoint does not even run with a NullPointerException.

As soon as I switch to Quarkus 3.8.4 the tests fails with an assertion-related error.

Expected behavior

The testFoo1Endpoint-tests is able to run without the NullPointerException.

Actual behavior

The testFoo1Endpoint-tests is fails with a NullPointerException.

How to Reproduce?

Run the following example code.

quarkus-sec-issue.zip

Output of `uname -a` or `ver`

Darwin Kernel Version 23.4.0: Fri Mar 15 00:12:25 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6030 arm64

Output of `java -version`

No response

Quarkus version or git rev

3.9.4

Build tool (ie. output of `mvnw --version` or `gradlew --version`)

mvn

Additional information

No response

The text was updated successfully, but these errors were encountered:

quarkus-bot · 2024-04-26T12:48:11Z

/cc @sberyozkin (security)

sberyozkin · 2024-04-26T13:51:00Z

@lyrst Thanks for the reproducer
@mkouba @michalvavrik Are you aware of what might have changed that causes Websockets authenticator and configurator in the attached reproducer failing to provide the identity info with the token ? WebSockets next is an independent project AFAIK so should not have impacted the code which was running in 3.8.4

Thanks

sberyozkin · 2024-04-26T13:53:07Z

I don't recall us providing any doc-ed recommendations along those lines, so may be we can recommend to switch to WebSockets Next to achieve the expected RBAC ?

mkouba · 2024-04-26T13:58:40Z

Are you aware of what might have changed that causes Websockets authenticator and configurator in the attached reproducer failing to provide the identity info with the token ?

I have no idea.

WebSockets next is an independent project AFAIK so should not have impacted the code which was running in 3.8.4

Yes, I've created #40312 to track our security-related efforts for WebSockets Next.

lyrst · 2024-04-26T14:06:09Z

I tried to use the new WebSocketNext extension as I thought that the RolesAllowed-annotation would work out of the box as it is based on the HttpServer (quote from documentation "The WebSocket handling reuses the main HTTP server.") but this does not seem to work ;-) But I guess I was lacking more information and as @mkouba created a new issue, it seems that more code changes are needed so make "WebSocketNext" work with the @RolesAllowed

mkouba · 2024-04-26T14:11:12Z

I tried to use the new WebSocketNext extension as I thought that the RolesAllowed-annotation would work out of the box as it is based on the HttpServer (quote from documentation "The WebSocket handling reuses the main HTTP server.") but this does not seem to work ;) But I guess I was lacking more information ;) and as @mkouba created a new issue, it seems that more code changes are needed so make "WebSocketNext" work with the @RolesAllowed

Yes, the integration with the security extension is currently WIP.

sberyozkin · 2024-04-26T14:53:57Z

Hi Martin, @mkouba, as far as this issue is concerned, can you please have a quick look at the #29919, as you helped there as well, and see if something rings a bell with respect to possible changes to the code referenced in #29919, between 3.8 and 3.9, np if nothing obvious springs to mind

michalvavrik · 2024-04-26T15:28:37Z

NPE is thrown in Quarkus REST handler, I'll have a look.

michalvavrik · 2024-04-26T18:11:15Z

I'm afraid this will take a CDI expert, at the very least to give me a hint. I'll write you an email.

mkouba · 2024-04-29T07:03:30Z

Hm, so if I understand it correctly the failing test (ExampleResourceTest) does not involve a WebSocket connection at all. It's merely testing the JAX-RS resource, right?

michalvavrik · 2024-04-29T07:07:14Z

Hm, so if I understand it correctly the failing test (ExampleResourceTest) does not involve a WebSocket connection at all. It's merely testing the JAX-RS resource, right?

That's correct. The HTTP request comes into the WebSocket handler, but leaves it as it does not detect the WS header.

michalvavrik · 2024-04-29T11:00:14Z

When Reactive Routes are used together with Quarkus REST and some @RouteFilter is applied before the Quarkus REST begin processing, CDI request context comes active to the Quarkus REST. However Quarkus REST creates a new CDI request context state with io.quarkus.arc.ManagedContext#activate(). I verified that when invoked on the same thread, the CDI request context state keeps changing. Here are some of my debug logging, not sure if useful for anyone else:

io.quarkus.vertx.runtime.VertxCurrentContextFactory is fallback
io.quarkus.vertx.runtime.VertxCurrentContextFactory is duplicate
io.quarkus.vertx.runtime.VertxCurrentContextFactory old state (null) is not a next state (io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7)
io.quarkus.vertx.runtime.VertxCurrentContextFactory setting next state to io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7
TestIdentityAssociation.setIdentity deferred - thread 124 context state io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7
///// routing context  /hello/foo1
|||||||||||||||| end rt ctx
io.quarkus.vertx.runtime.VertxCurrentContextFactory is duplicate
cdi request scope activation in REST thread 124 context state io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7
io.quarkus.vertx.runtime.VertxCurrentContextFactory is duplicate
io.quarkus.vertx.runtime.VertxCurrentContextFactory old state (io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7) is not a next state (io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c)
io.quarkus.vertx.runtime.VertxCurrentContextFactory setting next state to io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c
TestIdentityAssociation.setIdentity sync - thread 124 context state io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c
io.quarkus.vertx.runtime.VertxCurrentContextFactory is duplicate
io.quarkus.vertx.runtime.VertxCurrentContextFactory old state (io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c) is not a next state (io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7)
io.quarkus.vertx.runtime.VertxCurrentContextFactory setting next state to io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7
TestIdentityAssociation.getDeferredIdentity thread 124 context state io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7
io.quarkus.vertx.runtime.VertxCurrentContextFactory is duplicate
io.quarkus.vertx.runtime.VertxCurrentContextFactory old state (io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7) is not a next state (io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c)
io.quarkus.vertx.runtime.VertxCurrentContextFactory setting next state to io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c
io.quarkus.vertx.runtime.VertxCurrentContextFactory is duplicate
io.quarkus.vertx.runtime.VertxCurrentContextFactory old state (io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c) is not a next state (io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7)
io.quarkus.vertx.runtime.VertxCurrentContextFactory setting next state to io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7
io.quarkus.vertx.runtime.VertxCurrentContextFactory is duplicate
io.quarkus.vertx.runtime.VertxCurrentContextFactory old state (io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7) is not a next state (io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c)
io.quarkus.vertx.runtime.VertxCurrentContextFactory setting next state to io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c

Interesting is also why the state has changed, I added printing of a stack trace, however it's too long to put here. See the attachment.

What fixes (or works around?) the issue is for example this: bf7880ab3
test-log.txt
65734ebe7aaa265dae724cd0010cd20

I found other ways to work around the issue, but it really requires a feedback from a CDI experts to determine the one that should be used. Use reproducer from the description. Thank you

cc @mkouba @manovotn @Ladicek @geoand

michalvavrik · 2024-04-29T11:15:11Z

My simplistic understanding is that the context state changes when subscribed Uni is created when the old context state is still in place, even though subscription is happening when the new context state is in place.

manovotn · 2024-04-29T17:14:37Z

It seems like AbstractResteasyReactiveContext#requireCDIRequestScope() completely omits the case where you already have active req. context that you didn't start up yourself and instead always tries to enforce new, blank instance through ThreadSetupAction#activateInitial.
While I do not claim I understand the whole chain happening there (not even close in fact), this seems like an issue.

What fixes (or works around?) the issue is for example this: bf7880ab3

Yes, my debugging attempts landed at pretty much the same thing to resolve it - manovotn@7923d22

That being said, I am still unsure about:

Is it expected that you can get a req, context that's already active?
- Seems like yes, because in this case, it's Vertx RouteHandler that does the activation and that seems perfectly valid?
Given that it is permitted scenario, should we have some special handling for its deactivation given that someone else activated it and might want to deactive/destroy later?
- From my observations in this reproducer, the deactivation and destruction of context is handled by Vertx RouteHandler prior to the logic implemented in AbstractResteasyReactiveContext which then does no-op in this regard

michalvavrik · 2024-04-29T19:06:03Z

Very nice @manovotn , thank you for looking at it! I might not be a good adept to answer the questions raised, maybe someone else will, but I'll try.

Yes, my debugging attempts landed at pretty much the same thing to resolve it - manovotn@7923d22

lgtm, but tests will be needed. I can prepare them.

Is it expected that you can get a req, context that's already active?
Seems like yes, because in this case, it's Vertx RouteHandler that does the activation and that seems perfectly valid?

I was asking myself a same thing, but considering that Reactive Routes are "finished product" (the way I see it; it's removed from RHBQ at least and recently dropped from SR JWT) I don't want to propose important changes there.

Given that it is permitted scenario, should we have some special handling for its deactivation given that someone else activated it and might want to deactive/destroy later?

My thinking was that we should not accept that users activate CDI request context. If user code or extensions activate CDI request context prior to Quarkus REST processing, they should destroy it as well. This is very early in processing and user code often needs Quarkus REST to prepare the request context for example with current identity and RoutingContext. We know how to handle and detect case when Reactive Routes activate it, but we can't really keep checking if someone destroyed active context during the processing inside Quarkus REST, can we? Wouldn't it require loop constantly checking as it can be async process?

So my thinking was that we allow it for Reactive Routes, but consider it "illegal state" when it comes in other cases. What should be done I'm not sure. Can we destroy the activate request context ourself and activate a new one? It is enough for cases where concrete state is kept, but not enough when someone simply calls requestContext.terminate(). It should be simply illegal to activate request context without termination so early. WDYT @geoand ?

From my observations in this reproducer, the deactivation and destruction of context is handled by Vertx RouteHandler prior to the login implemented in AbstractResteasyReactiveContext which then does no-op in this regard

+1

manovotn · 2024-04-29T20:01:11Z

lgtm, but tests will be needed. I can prepare them.

That would be awesome, thank you.
I didn't mean to propose this as a final fix, just what I came up with while discovering what's going on :)

My thinking was that we should not accept that users activate CDI request context.

There are CDI APIs allowing you to programmatically activate req. context and even more of them in Arc itself. However, I don't think you'd be able to get that far without first activating it from within some other Quarkus part.

... but we can't really keep checking if someone destroyed active context during the processing inside Quarkus REST, can we? Wouldn't it require loop constantly checking as it can be async process?

Yea, if you go on tempering with context activation and destruction forcefully, you are IMO on your own.
I don't think we should assume it's being manipulated with; doing so would break user app expectations anyway (i.e. state of existing user defined beans for example).

So my thinking was that we allow it for Reactive Routes, but consider it "illegal state" when it comes in other cases. What should be done I'm not sure. Can we destroy the activate request context ourself and activate a new one? It is enough for cases where concrete state is kept, but not enough when someone simply calls requestContext.terminate(). It should be simply illegal to activate request context without termination so early. WDYT @geoand ?

I'd say it is generally easier to respect if the context is already active when your framework/integration needs it and just reuse that one. Whoever creates the context is also responsible for its destruction.
My original question was more along the lines of whether we can be sure that the context that we get here will outlast the window in which we need it to work.

geoand · 2024-04-30T05:43:25Z

Let me try and summarize so as to check if I have understood the issue correctly:

There is some piece of code that runs in a Vertx route that runs before Quarkus REST and that route activates the CDI context which then Quarkus REST completely ignores and starts a new one?
Also, I have not understood from the discussion what has changed between 3.8 and 3.9.

michalvavrik · 2024-04-30T06:52:36Z

(in short :-)) @manovotn I agree with your conclusions / answers. Thank you.

My original question was more along the lines of whether we can be sure that the context that we get here will outlast the window in which we need it to work.

So far I believe we can in case of Reactive Routes.

There is some piece of code that runs in a Vertx route that runs before Quarkus REST and that route activates the CDI context which then Quarkus REST completely ignores and starts a new one?

yes

Also, I have not understood from the discussion what has changed between 3.8 and 3.9.

I think it comes from refactoring of the EagerSecurityHandler as if you move request context activation few lines above, it works. Problem is that we can't just accept that is the cause. There is a race. It would make for a fragile code that noone can touch. That's why I asked for help from CDI experts to find a root cause.

geoand · 2024-04-30T07:11:31Z

It's definitely possible to use the existing request scoped state if it exists, but is that really something we want to do in all cases?

manovotn · 2024-04-30T14:37:20Z

It's definitely possible to use the existing request scoped state if it exists, but is that really something we want to do in all cases?

Can you think of a case where that would be an issue?
I would put it the other way and rather say that we shouldn't forcibly destroy previous context and put up a new one. That can easily break expectations of other code (be it user or extension) that activated it.

geoand · 2024-05-01T05:37:21Z

Can you think of a case where that would be an issue?

Not off hand

geoand · 2024-05-01T05:50:12Z

What I think can become problematic is how we handle the end of the request processing in Quarkus REST.

Currently as we always start the context, we always terminate it as well. If however now we move to not starting the context, what is the expectation going to be about closing it?

Relates to: quarkusio#40307

geoand · 2024-05-01T05:51:50Z

#40388 is a draft that reuses the request context which you folks can use to test.

manovotn · 2024-05-01T08:46:08Z

What I think can become problematic is how we handle the end of the request processing in Quarkus REST.

Currently as we always start the context, we always terminate it as well. If however now we move to not starting the context, what is the expectation going to be about closing it?

That it is handled by whoever started it. Which is the case that I observed in this reproducer as well (see last bit of #40307 (comment))

#40388 is a draft that reuses the request context which you folks can use to test.

+1, this is very similar to what I proposed earlier (manovotn@7923d22) and to what Michal has (bf7880ab3).
Although I'd prefer if the activateInitial method actually does unconditional context activation but that's just minor comment :)

geoand · 2024-05-01T08:49:37Z

That it is handled by whoever started it

Theoretically yes, but is that actually the case in practice.
If so, we also need to update the patch to not destroy the context

manovotn · 2024-05-01T09:12:28Z

That it is handled by whoever started it

Theoretically yes, but is that actually the case in practice. If so, we also need to update the patch to not destroy the context

A repeated invocation of context destruction is a no-op (as is the case in the reproducer).
I see your point though; we could instead alter it so that if we detect an active context, we don't even attempt to destroy it.
We should be able to do that through the requestScopeActivated boolean attribute with some additional checks in place.
I could take a look tomorrow as I am not in front of my laptop now (bank holiday here).

geoand · 2024-05-01T09:35:21Z

Right, that's my point.

I'll update my draft PR soon

geoand · 2024-05-01T12:10:19Z

I see your point though; we could instead alter it so that if we detect an active context, we don't even attempt to destroy it.
We should be able to do that through the requestScopeActivated boolean attribute with some additional checks in place

I didn't really find a way to do that, because I don't see how I can attach anything to the RequestContextState

manovotn · 2024-05-02T04:36:35Z

I see your point though; we could instead alter it so that if we detect an active context, we don't even attempt to destroy it.
We should be able to do that through the requestScopeActivated boolean attribute with some additional checks in place

I didn't really find a way to do that, because I don't see how I can attach anything to the RequestContextState

@geoand you cannot attach anything to it.
We should be able to use the property inside AbstractResteasyReactiveContext to track whether the context originated here or elsewhere and perform cleanup conditionally.
Here's what I mean - #40408

geoand · 2024-05-02T04:42:18Z

We can do that I guess

lyrst added the kind/bug Something isn't working label Apr 26, 2024

quarkus-bot bot added the area/security label Apr 26, 2024

sberyozkin added the area/websockets label Apr 26, 2024

mkouba added the area/undertow-websockets label Apr 26, 2024

geoand added a commit to geoand/quarkus that referenced this issue May 1, 2024

Reuse request context in Quarkus REST if it exists

70e466c

Relates to: quarkusio#40307

geoand mentioned this issue May 1, 2024

Reuse request context in Quarkus REST if it exists #40388

Closed

manovotn mentioned this issue May 2, 2024

Quarkus REST - reuse CDI request context if it exists #40408

Merged

manovotn linked a pull request May 2, 2024 that will close this issue

Quarkus REST - reuse CDI request context if it exists #40408

Merged

sberyozkin mentioned this issue May 2, 2024

WebSockets Next: security integration #40312

Closed

manovotn closed this as completed in #40408 May 3, 2024

quarkus-bot bot added this to the 3.11 - main milestone May 3, 2024

gsmet modified the milestones: 3.11 - main, 3.10.1 May 10, 2024

gsmet modified the milestones: 3.10.1, 3.8.5 May 22, 2024

HerrDerb mentioned this issue May 23, 2024

Since 3.10.1 RequestContext handling is broken on request cancle #40810

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WebSocket + Role-based authentication stopped working with Quarkus 3.9.x: Security Identity is not available #40307

WebSocket + Role-based authentication stopped working with Quarkus 3.9.x: Security Identity is not available #40307

lyrst commented Apr 26, 2024 •

edited

quarkus-bot bot commented Apr 26, 2024

sberyozkin commented Apr 26, 2024

sberyozkin commented Apr 26, 2024

mkouba commented Apr 26, 2024

lyrst commented Apr 26, 2024 •

edited

mkouba commented Apr 26, 2024

sberyozkin commented Apr 26, 2024

michalvavrik commented Apr 26, 2024

michalvavrik commented Apr 26, 2024

mkouba commented Apr 29, 2024

michalvavrik commented Apr 29, 2024

michalvavrik commented Apr 29, 2024

michalvavrik commented Apr 29, 2024

manovotn commented Apr 29, 2024 •

edited

michalvavrik commented Apr 29, 2024 •

edited

manovotn commented Apr 29, 2024

geoand commented Apr 30, 2024

michalvavrik commented Apr 30, 2024

geoand commented Apr 30, 2024

manovotn commented Apr 30, 2024

geoand commented May 1, 2024

geoand commented May 1, 2024

geoand commented May 1, 2024

manovotn commented May 1, 2024

geoand commented May 1, 2024

manovotn commented May 1, 2024

geoand commented May 1, 2024

geoand commented May 1, 2024

manovotn commented May 2, 2024

geoand commented May 2, 2024

WebSocket + Role-based authentication stopped working with Quarkus 3.9.x: Security Identity is not available #40307

WebSocket + Role-based authentication stopped working with Quarkus 3.9.x: Security Identity is not available #40307

Comments

lyrst commented Apr 26, 2024 • edited

Describe the bug

Expected behavior

Actual behavior

How to Reproduce?

Output of uname -a or ver

Output of java -version

Quarkus version or git rev

Build tool (ie. output of mvnw --version or gradlew --version)

Additional information

quarkus-bot bot commented Apr 26, 2024

sberyozkin commented Apr 26, 2024

sberyozkin commented Apr 26, 2024

mkouba commented Apr 26, 2024

lyrst commented Apr 26, 2024 • edited

mkouba commented Apr 26, 2024

sberyozkin commented Apr 26, 2024

michalvavrik commented Apr 26, 2024

michalvavrik commented Apr 26, 2024

mkouba commented Apr 29, 2024

michalvavrik commented Apr 29, 2024

michalvavrik commented Apr 29, 2024

michalvavrik commented Apr 29, 2024

manovotn commented Apr 29, 2024 • edited

michalvavrik commented Apr 29, 2024 • edited

manovotn commented Apr 29, 2024

geoand commented Apr 30, 2024

michalvavrik commented Apr 30, 2024

geoand commented Apr 30, 2024

manovotn commented Apr 30, 2024

geoand commented May 1, 2024

geoand commented May 1, 2024

geoand commented May 1, 2024

manovotn commented May 1, 2024

geoand commented May 1, 2024

manovotn commented May 1, 2024

geoand commented May 1, 2024

geoand commented May 1, 2024

manovotn commented May 2, 2024

geoand commented May 2, 2024

lyrst commented Apr 26, 2024 •

edited

Output of `uname -a` or `ver`

Output of `java -version`

Build tool (ie. output of `mvnw --version` or `gradlew --version`)

lyrst commented Apr 26, 2024 •

edited

manovotn commented Apr 29, 2024 •

edited

michalvavrik commented Apr 29, 2024 •

edited