Threads created by Jetty no longer run as the Subject that Jetty was started as #6396

Closed
stoty opened this issue Jun 11, 2021 · 20 comments

@stoty

stoty commented Jun 11, 2021

Jetty version
9.4.37.v20210219 and later 9.4.x versions

Java version/vendor
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.292-b10, mixed mode)

OS type/version
Darwin 20.5.0 Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64

Description
The thread creation change in #5859 has caused a regression.
The threads created by Jetty no longer run as the Subject that Jetty was started as.
This was discovered when trying to build Avatica with Jetty 9.4.37.v20210219
See https://issues.apache.org/jira/browse/CALCITE-4646

How to reproduce

  • Check out calcite-avatica HEAD
  • Update jetty.version to 9.4.37.v20210219 or later in gradle.properties
  • Run ./gradlew clean test
@stoty stoty added the Bug For general bugs on Jetty side label Jun 11, 2021
stoty added a commit to stoty/jetty.project that referenced this issue Jun 11, 2021
@janbartel
Contributor

We will look into this, but in the meantime is there any other information you can supply? Is the thread a request-handling thread, or has another thread been spun up during request handling? Are you using Jetty's SPNEGO integration?

@janbartel
Contributor

janbartel commented Jun 11, 2021

Also, could you provide links to the failing tests? We're not familiar with your codebase, so some navigation help would be useful, as would a stack trace or two.

@janbartel
Contributor

janbartel commented Jun 11, 2021

Just followed the repro steps (I used jetty-9.4.42) and whilst there is a lot of test output, I can't see anything that is an obvious error and no output that mentions SPNEGO.

Edit: Ah! Sorry, I downloaded calcite-master instead of calcite-avatica-master!

@stoty
Author

stoty commented Jun 11, 2021

Yes, this is a request handling thread. Avatica server does not create threads on its own.

Avatica's SPNEGO implementation uses Jetty's SPNEGO integration, but some of the Jetty classes are subclassed and extended to work around some bugs/limitations in Jetty's SPNEGO handling.

This is where SPNEGO gets configured in Avatica:
https://github.com/apache/calcite-avatica/blob/e4f22c33c5a67257056ee4428486b900cc139038/server/src/main/java/org/apache/calcite/avatica/server/HttpServer.java#L334

The failing tests are:
https://github.com/apache/calcite-avatica/blob/e4f22c33c5a67257056ee4428486b900cc139038/server/src/test/java/org/apache/calcite/avatica/AvaticaSpnegoTest.java#L54
https://github.com/apache/calcite-avatica/blob/e4f22c33c5a67257056ee4428486b900cc139038/server/src/test/java/org/apache/calcite/avatica/server/HttpServerSpnegoWithoutJaasTest.java#L185

Unfortunately the stack traces don't tell us much; all you can see in them is a generic Kerberos failure:

2021-06-10 18:33:05,285 [Test worker] INFO - JDBC URL jdbc:avatica:remote:url=https://localhost:49984;authentication=SPNEGO;serialization=PROTOBUF;truststore=/Users/stoty/workspaces/apache-phoenix/calcite-avatica/server/build/avatica-test.jks;truststore_password=avaticasecret
2021-06-10 18:33:05,293 [pool-2-thread-1] INFO - The preauth data is empty.
2021-06-10 18:33:05,294 [pool-2-thread-1] INFO - KRB error occurred while processing request: Additional pre-authentication required
2021-06-10 18:33:05,297 [pool-2-thread-1] INFO - AS_REQ ISSUE: authtime 1623342785297,client@EXAMPLE.COM for krbtgt/EXAMPLE.COM@EXAMPLE.COM
2021-06-10 18:33:05,384 [pool-2-thread-1] INFO - TGS_REQ ISSUE: authtime 1623342785383,client@EXAMPLE.COM for HTTP/localhost@EXAMPLE.COM
2021-06-10 18:33:05,388 [qtp86626802-29] WARN - Caught GSSException trying to authenticate the client
GSSException: Failure unspecified at GSS-API level (Mechanism level: Invalid argument (400) - Cannot find key of appropriate type to decrypt AP REP - AES128 CTS mode with HMAC SHA1-96)
at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:858)
at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
at sun.security.jgss.spnego.SpNegoContext.GSS_acceptSecContext(SpNegoContext.java:906)
at sun.security.jgss.spnego.SpNegoContext.acceptSecContext(SpNegoContext.java:556)
at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
at org.apache.calcite.avatica.server.PropertyBasedSpnegoLoginService.login(PropertyBasedSpnegoLoginService.java:89)
at org.eclipse.jetty.security.authentication.LoginAuthenticator.login(LoginAuthenticator.java:67)
at org.eclipse.jetty.security.authentication.SpnegoAuthenticator.validateRequest(SpnegoAuthenticator.java:85)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:537)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:279)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:779)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:911)
at java.lang.Thread.run(Thread.java:748)
Caused by: KrbException: Invalid argument (400) - Cannot find key of appropriate type to decrypt AP REP - AES128 CTS mode with HMAC SHA1-96
at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:278)
at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:149)
at sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:140)
at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:831)
... 27 more

@janbartel
Contributor

OK, as you're already subclassing Jetty classes, can you subclass QueuedThreadPool and override the newThread(Runnable) method so that it does not use the PrivilegedThreadFactory, but instead creates the new thread inside a Subject.doAs(subject, new PrivilegedAction<Void>() {...}) call with your special subject? Be mindful of the classloader leak reported in #5859, and make sure you set the context classloader to something reasonable, for example thread.setContextClassLoader(getClass().getClassLoader()).

If that works, I'll look at adding the ability to set a ThreadFactory onto the QueuedThreadPool .
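
For readers who want to try the suggestion above, here is a minimal sketch of a thread factory that creates each thread inside Subject.doAs. It uses only JDK classes. The class name SubjectThreadFactory, the name prefix, and the non-daemon default are assumptions made for illustration; they are not taken from Jetty or Avatica. It also assumes a Java 8 to 17 runtime, where a new thread inherits the AccessControlContext of the code that constructs it.

```java
import java.security.PrivilegedAction;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.security.auth.Subject;

/** Hypothetical factory: threads are constructed inside Subject.doAs(). */
@SuppressWarnings({"deprecation", "removal"})
final class SubjectThreadFactory implements ThreadFactory {
    private final Subject subject;
    private final String namePrefix;
    private final AtomicInteger counter = new AtomicInteger();

    SubjectThreadFactory(Subject subject, String namePrefix) {
        this.subject = subject;
        this.namePrefix = namePrefix;
    }

    @Override
    public Thread newThread(Runnable job) {
        // Constructing the Thread inside doAs() means the context it inherits
        // carries the Subject (assumption: Java 8 to 17 inheritance rules).
        return Subject.doAs(subject, (PrivilegedAction<Thread>) () -> {
            Thread thread = new Thread(job, namePrefix + "-" + counter.incrementAndGet());
            thread.setDaemon(false);
            // Pin the context classloader explicitly instead of inheriting the
            // caller's, to avoid the kind of leak described in #5859.
            thread.setContextClassLoader(SubjectThreadFactory.class.getClassLoader());
            return thread;
        });
    }
}
```

Wiring such a factory into a thread pool is left out here, since that part depends on the Jetty version in use.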

@janbartel janbartel self-assigned this Jun 11, 2021
@stoty
Author

stoty commented Jun 11, 2021

Copying my reply here from the PR:

In my mind the previous behaviour, where the Jetty threads inherited the active Subject, is the intuitive one: I'd expect the web server threads to run with the same Subject that the server was started as.
Others may have other expectations and requirements.

I have tried overriding QueuedThreadPool, but it is not possible because of the private fields, so I ended up copying and renaming the whole class.
Defining a new QueuedThreadPool and using the modified ThreadFactory in it did solve the problem; the tests now run fine.

Being able to set the threadFactory on QueuedThreadPool is a good solution (probably for other purposes too), and it would be a clean and safe way to solve our problem.

@joakime
Contributor

joakime commented Jun 11, 2021

If that works, I'll look at adding the ability to set a ThreadFactory onto the QueuedThreadPool .

QueuedThreadPool already supports passing in a ThreadFactory in its constructor; this was added in issue #4121 (first available in Jetty 9.4.22.v20191022).

https://github.com/eclipse/jetty.project/blob/jetty-9.4.42.v20210604/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L160-L163

https://github.com/eclipse/jetty.project/blob/5cd5e6d2375eeab146813b0de9f19eda6ab6e6cb/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L160-L163

@jmcc0nn3ll
Contributor

Avatica's SPNEGO implementation uses Jetty's SPNEGO integration, but some of the Jetty classes are subclassed and extended to work around some bugs/limitations in Jetty's SPNEGO handling.

Gentle reminder that we are happy to take issues, and even better PRs, for things like this.

@stoty
Author

stoty commented Jun 11, 2021

@joakime Thank you. Using that constructor works, even if it was a bit awkward to figure out all the defaults.
Some of the thread parameters set on QueuedThreadPool won't apply this way, but it's not a real limitation, as we can set those directly in the ThreadFactory.

@stoty
Author

stoty commented Jun 11, 2021

If the Jetty team agrees that the new behaviour is the correct one, then I guess there's not much left to do with this issue.

@stoty
Author

stoty commented Jun 12, 2021

For future reference, this is the custom ThreadFactory solution for Avatica:
apache/calcite-avatica#147

Thanks again for the prompt and very constructive help from the team.

@stoty stoty closed this as completed Jun 12, 2021
@FlyingSheepOnSailfish

FlyingSheepOnSailfish commented Jul 30, 2021

We too have just suffered from this problem (though not related to Avatica).

We use Jetty as an Embedded WebServer in a Java Server using the Spark Java microframework.

The “GSSException: Failure unspecified at GSS-API level (Mechanism level: Invalid argument (400) - ...)” error can have many causes. Identifying the cause can be like looking for a needle in several haystacks!

It took us a long time and much testing and experimentation to realise that the root cause was not a Kerberos / SPNEGO issue, but the loss of the Subject security context in which Jetty had been launched. Because the security context is lost, the Subject cannot be found; thus the Subject’s credentials (in our case Kerberos keys) are not available, and SPNEGO tokens cannot be authenticated as a consequence.
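
To make that chain of reasoning concrete, the following sketch checks whether a Subject carries any Kerberos key material at all. The class and method names are hypothetical. The assumption is that acceptor keys live in the Subject's private credentials as KerberosKey or KeyTab objects, which is where the JDK's Kerberos login module normally places them.

```java
import java.util.Set;
import javax.security.auth.Subject;
import javax.security.auth.kerberos.KerberosKey;
import javax.security.auth.kerberos.KeyTab;

/** Hypothetical diagnostic helper, not part of Jetty, Spark or the JDK. */
final class KerberosCredentialCheck {
    private KerberosCredentialCheck() {}

    /** Returns true if the Subject holds long-term keys or a keytab. */
    static boolean hasAcceptorKeys(Subject subject) {
        if (subject == null) {
            // No Subject in scope: this is the "lost context" case, in which
            // the GSS layer has no keys to decrypt the client's token with.
            return false;
        }
        Set<KerberosKey> keys = subject.getPrivateCredentials(KerberosKey.class);
        Set<KeyTab> keyTabs = subject.getPrivateCredentials(KeyTab.class);
        return !keys.isEmpty() || !keyTabs.isEmpty();
    }
}
```

A null or empty Subject gives false here, which lines up with the "Cannot find key of appropriate type" message in the stack trace: the key is missing because the Subject is.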

We were able to replicate the issue in a throwaway project that launched Spark in the context of a simple non-Kerberos Subject.

Once we had discovered that, it took us a little more hunting to prove that an updated Jetty Version was the cause.

For others who might land on this thread: How did we narrow down the issue to a Jetty Version change?

As the problem occurred on a customer site, where we could not debug from the Eclipse IDE, we decided to patch the key JRE classes in the module java.security.jgss using the Java 9+ option --patch-module.

The patched classes did not change the behaviour: we just added log output and stack traces so that we could then use desk-checking to work out the route taken through the JRE security code, and thus why an older setup worked but a newer one did not.

The critical point in the JRE security code is:

https://github.com/openjdk/jdk11u-dev/blob/c1411113b396f468963a1deacc3b57ed366e735a/src/java.security.jgss/share/classes/sun/security/jgss/GSSUtil.java#L332
 
It took us a while for it to click that the JRE is retrieving the Subject of the current security context, and that clearly no Subject was found: i.e. the security context (and thus the Subject) in which we had launched Spark and Jetty had got lost somewhere along the way.

We now have a similar function in our own code which allows us to test the subject / security context at key points, in particular before we even initiate the GSSContext.

Having understood this, and proved that our code for Kerberos Authentication and to launch Spark / Jetty was identical, we then created a minimal viable Spark project, and used Maven to quickly upgrade and downgrade Jetty versions.

Having done that, a quick Google search for “jetty Subject doAs” led us to this thread.

Our workaround is to do the SPNEGO authentication explicitly in the Subject’s security context instead of starting Jetty in that context, i.e. the authentication runs directly inside a Subject.doAs() call.

Edit: As we use Jetty Indirectly via Spark, we use our own Kerberos authentication code rather than Jetty's. However, as the Kerberos problem is a follow-on problem, any Kerberos implementation will be affected likewise.
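
A rough sketch of the workaround described above, where only the authentication step runs inside Subject.doAs. The names SubjectScopedAuth, runAs and acceptToken are made up for illustration and are not the commenter's code. The sketch assumes a single-round token exchange and that the service Subject already holds the Kerberos keys.

```java
import java.security.PrivilegedAction;
import java.util.function.Supplier;
import javax.security.auth.Subject;
import org.ietf.jgss.GSSContext;
import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSException;
import org.ietf.jgss.GSSManager;

/** Hypothetical helper: scope the Subject to the authentication call only. */
@SuppressWarnings({"deprecation", "removal"})
final class SubjectScopedAuth {
    private SubjectScopedAuth() {}

    /** Runs the given work with the Subject attached to the calling thread. */
    static <T> T runAs(Subject subject, Supplier<T> work) {
        return Subject.doAs(subject, (PrivilegedAction<T>) work::get);
    }

    /** Accepts one SPNEGO/Kerberos token as the service Subject. */
    static byte[] acceptToken(Subject serviceSubject, byte[] token) {
        return runAs(serviceSubject, () -> {
            try {
                // A null credential asks GSS for the default acceptor
                // credentials, which it looks up from the current Subject.
                GSSContext context =
                    GSSManager.getInstance().createContext((GSSCredential) null);
                try {
                    return context.acceptSecContext(token, 0, token.length);
                } finally {
                    context.dispose();
                }
            } catch (GSSException e) {
                throw new SecurityException("SPNEGO token was not accepted", e);
            }
        });
    }
}
```

With this shape it no longer matters which Subject, if any, the request-handling thread inherited, because the Subject is supplied at the point where the GSS layer needs it.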

@joakime
Contributor

joakime commented Jul 30, 2021

@FlyingSheepOnSailfish thanks for the details. What did you do to fix your situation on Spark? Did you use a custom ThreadFactory as @stoty did?

@sbordet
Contributor

sbordet commented Jul 30, 2021

@FlyingSheepOnSailfish are you using the deprecated SpnegoAuthenticator?
Because the new ConfigurableSpnegoAuthenticator should do the right thing with respect to Subject.doAs() calls.

@FlyingSheepOnSailfish

FlyingSheepOnSailfish commented Jul 30, 2021

@joakime I only discovered this thread once we already had a viable workaround, and had even nailed down the exact Jetty version. So our workaround is different. I can now configure my application to do either of:
a) start Spark (and thus Jetty) in the Subject.doAs()
b) call our SPNEGO authentication code in its own Subject.doAs(). Actually this was the approach we started with many years ago, before at some point we decided to put everything that Spark and Jetty do (and everything in our code called by Spark) into a Subject.doAs(). Having found this thread, I can also investigate the solution @stoty took.

@sbordet No, we use our own custom SPNEGO code. We use Jetty "under-the-hood" of the Spark micro-framework, so we are a little limited as to which parts of Jetty we can easily use / configure. We do have some classes over-loading the Spark classes interacting with Jetty, but not for SPNEGO authentication. I will look into the ConfigurableSpnegoAuthenticator, thanks for bringing that to my attention.

As I have a viable workaround (plus some others above to investigate), and have identified the root cause I am happy: For me the issue is closed. I commented mainly for the benefit of others who may face the same issue.

@FlyingSheepOnSailfish

FlyingSheepOnSailfish commented Jul 30, 2021

Having established that the problem was not really a Kerberos problem, but caused by the Subject context getting lost, we called the function below at several points in our code to narrow down where. This function was inspired by one in the JRE class GSSUtil. Essentially it asks the question "in the context of which subject, if any, are we running at this point?".

  import java.security.AccessControlContext;
  import java.security.AccessController;
  import java.security.Principal;
  import java.security.PrivilegedActionException;
  import java.security.PrivilegedExceptionAction;
  import javax.security.auth.Subject;

  // Returns true only if the calling thread currently runs as the expected Subject.
  public static boolean testSubjectFound(Subject subject, String spn) {

    // Capture the caller's context before entering doPrivileged.
    final AccessControlContext acc = AccessController.getContext();
    try {
      Boolean found = AccessController.doPrivileged(new PrivilegedExceptionAction<Boolean>() {
        public Boolean run() throws Exception {
          Subject accSubj = Subject.getSubject(acc);

          if (accSubj != null) {
            if (accSubj.equals(subject)) {
              // we have found a subject, and it is our subject
              System.out.println("[testSubjectFound] Expected subject found with SPN '" + spn + "'.");
              return true;
            } else {
              // we have another subject
              for (Principal p : accSubj.getPrincipals()) {
                String accSPN = p.getName();
                System.out.println("[testSubjectFound] Unexpected subject found with SPN '" + accSPN + "'.");
              }
              return false;
            }
          } else {
            // no subject found!
            System.out.println("[testSubjectFound] No Subject found! We are no longer in the context of '" + spn + "'. Authentication of tokens will fail!");
            return false;
          }
        }
      });
      return found;
    } catch (PrivilegedActionException pae) {
      System.out.println("[testSubjectFound] Exception testing for subject!");
      return false;
    }
  }

@joakime
Contributor

joakime commented Jul 30, 2021

Oh no. AccessController and AccessController.doPrivileged use ...

Those are deprecated for removal as of Java 17 (JEP 411) and are expected to be removed in a later release.

@FlyingSheepOnSailfish

FlyingSheepOnSailfish commented Jul 30, 2021

@joakime tx, we have just moved from Java 8 to Java 11: that code snippet was inspired by Java 11 JRE code. I can look into what newer JREs do instead. I will have to read this 8-)
...and it looks like Subject.doAs() will be deprecated or replaced in the future too.

@gregw
Contributor

gregw commented Jul 30, 2021

@FlyingSheepOnSailfish do keep us informed with your findings after researching newer JREs. If there is something we should be doing better with our managed threads, then do ask!

@FlyingSheepOnSailfish

@gregw

Here is the summary of some quick research into future JDKs and the changes they plan for Security Manager et al.

https://openjdk.java.net/jeps/411

In Java 17, we will:
▪ Deprecate, for removal, most Security Manager related classes and methods.
▪ ….
▪ We will not deprecate the javax.security.auth.Subject::doAs method since it can be used to transport a Subject across API boundaries by attaching it to the thread's AccessControlContext, serving a purpose similar to a ThreadLocal. The credentials of the Subject can then be obtained by an underlying authentication mechanism (e.g., a Kerberos implementation of GSSAPI) by calling Subject::getSubject. These credentials can be used for authentication or authorization purposes and do not require the Security Manager to be enabled. However, Subject::doAs depends on APIs tightly related to the Security Manager, such as AccessControlContext and DomainCombiner. Thus, we plan to create a new API that does not depend on the Security Manager APIs; subsequently we will then deprecate the Subject::doAs API for removal.

https://bugs.openjdk.java.net/browse/JDK-8267108
Suggested fix:

    public static Subject current() {
        return getSubject(AccessController.getContext());
    }

Meanwhile, OpenJDK 16 GSSUtil.searchSubject(), which inspired my function testSubjectFound(), is still using AccessController.

https://github.com/openjdk/jdk16/blob/4de3a6be9e60b9676f2199cd18eadb54a9d6e3fe/src/java.security.jgss/share/classes/sun/security/jgss/GSSUtil.java#L308
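
One possible way to bridge old and new JDKs is sketched below: look up the Subject.current() method proposed in JDK-8267108 reflectively and fall back to the AccessController route when it is absent. The class name CurrentSubject is hypothetical. The sketch assumes the method keeps the name and static signature shown in the suggested fix, which is how it appears in Java 18 and later.

```java
import java.lang.reflect.Method;
import java.security.AccessController;
import javax.security.auth.Subject;

/** Hypothetical helper that answers "which Subject am I running as?". */
@SuppressWarnings({"deprecation", "removal"})
final class CurrentSubject {
    private CurrentSubject() {}

    /** Returns the Subject of the calling thread, or null if there is none. */
    static Subject get() {
        try {
            // Preferred path: Subject.current(), where the JDK provides it.
            Method current = Subject.class.getMethod("current");
            return (Subject) current.invoke(null);
        } catch (NoSuchMethodException e) {
            // Older JDKs: same lookup as GSSUtil.searchSubject() performs.
            return Subject.getSubject(AccessController.getContext());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not query the current Subject", e);
        }
    }
}
```

Called at the points where testSubjectFound() was used, this gives the same yes/no answer without compiling directly against either API generation.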
