Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JAXB Implementation not found in Java 11 when using Failsafe with Retries & Spring Boot #953

Closed
tobias-bahls opened this issue Jul 5, 2020 · 23 comments

Comments

@tobias-bahls
Copy link
Contributor

When using riptide-soap together with retries from riptide-failsafe in Java 11 on a packaged Spring Boot jar, the call fails with javax.xml.bind.JAXBException: Implementation of JAXB-API has not been found on module path or classpath. despite the correct JARs being in the classpath.

Description

In this specific combination:

  • riptide-soap
  • riptide-failsafe with retries enabled
  • Java 11
  • Packaged spring boot jar

The call to create a JAXBContext with the aforementioned message. If retries are disabled, everything works as expected.

The root cause of the issue seems like that Failsafe is using ForkJoinPool.commonPool() internally, resulting us in hitting spring-projects/spring-boot#15737.

Possible Fix

I can think of two options:

  • Explicitly passing the current class' class loader to JAXBContext.newInstance(). This requires us to switch to creating JAXBContext with contextPath instead of the class to be bound.
  • Explicitly supplying an Executor to Failsafe that creates threads with the proper context class loader.

I don't know which of these options is better, but I'd be happy to implement the change if there is some consensus.

Steps to Reproduce

I can provide a sample project if needed

Your Environment

  • Version used: 3.0.0-RC.14
@fatroom
Copy link
Member

fatroom commented Jul 5, 2020

First proposal seems a bit more appropriate from my point of view.

@whiskeysierra
Copy link
Collaborator

whiskeysierra commented Jul 5, 2020 via email

@fatroom
Copy link
Member

fatroom commented Jul 6, 2020

@whiskeysierra can you give a bit more highlight on why it's "poor choice"?

@whiskeysierra
Copy link
Collaborator

whiskeysierra commented Jul 6, 2020 via email

@tobias-bahls
Copy link
Contributor Author

I didn't look too close on this yet, but at a first glance it didn't look like IO is performed on the common pool. It looked like the task in the common pool was instantly completed because AsyncPlugin would create another task on the 'real' executor for this client. The original exception definitely occurred on a thread from the clients' pool.

@tobias-bahls
Copy link
Contributor Author

It turns out that

Explicitly passing the current class' class loader to JAXBContext.newInstance(). This requires us to switch to creating JAXBContext with contextPath instead of the class to be bound.

Would be a breaking change. Using contextPath requires jaxb.index or an ObjectFactory to be present. If one uses hand-rolled classes (like the tests do), the usage of contextPath breaks their implementation.

@fatroom
Copy link
Member

fatroom commented Jul 6, 2020

would you be able to create non breaking change utilising second approach?

@tobias-bahls
Copy link
Contributor Author

At this point, I'm not sure what even the correct fix is. We could:

  1. Create a ForkJoinPool with a custom ForkJoinWorkerThreadFactory.
  2. Create an additional Executor just for Failsafe.
  3. Re-use the Executor of AsyncPlugin for Failsafe.
  4. Don't use AsyncPlugin if FailsafePlugin is active and has configured Policies

(1) Would fix the issue I'm seeing, but it feels to me like there is something deeper going on here. For (2), we'd need to have the Executor configurable. Configuring two thread pools seems wasteful to me and would also confuse potential users. I did look briefly into (3), it seems like this requires a lot of refactoring for which I currently don't have the capacity. Same goes for (4).

(3) and (4) also rely on the assumption that there is no need for having two separate Thread Pools for Failsafe and the actual request execution. I don't know enough about Failsafe and Riptide though to judge if this assumption is correct.

@whiskeysierra
Copy link
Collaborator

Regarding (2), the Failsafe plugin currently takes an optional scheduler, right? But no executor.

@tobias-bahls
Copy link
Contributor Author

That's right, I got the classes confused. However it looks like Scheduler is able to just wrap Executor, so the points still stand.

I feel I should also document our workaround in case someone else has this problem: We're essentially running fix (1). We added this class:

public class JaxbForkJoinWorkerThreadFactory implements ForkJoinWorkerThreadFactory {
    @Override
    public ForkJoinWorkerThread newThread(ForkJoinPool pool) {
        ForkJoinWorkerThread thread = new JaxbForkJoinWorkerThread(pool);
        thread.setContextClassLoader(Thread.currentThread().getContextClassLoader());
        return thread;
    }

    static class JaxbForkJoinWorkerThread extends ForkJoinWorkerThread {
        protected JaxbForkJoinWorkerThread(ForkJoinPool pool) {
            super(pool);
        }
    }
}

And configured this ThreadFactory in main before booting the Spring Application. It needs to be in main (as opposed to passing the system property via command line) because the class is only loaded after the spring boot launcher has run.

System.setProperty("java.util.concurrent.ForkJoinPool.common.threadFactory",JaxbForkJoinWorkerThreadFactory::class.qualifiedName!!)

In our case we also needed to override the default parallelism of the ForkJoinPool to force Failsafe to use the common pool.

System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", Math.max(2, Runtime.getRuntime().availableProcessors()).toString())

It's all super hacky, but it works. But I will be able to sleep easier at night if we can remove this code and get this fixed in Riptide ;)

@whiskeysierra
Copy link
Collaborator

Could we have a default TaskDecorator that we apply to each execution?

/**
 * @see <a href="https://github.com/zalando/riptide/issues/953">JAXB + ForkJoinPool</a>
 */
@API(status = EXPERIMENTAL)
@AllArgsConstructor
private final class PreserveContextClassLoader implements TaskDecorator {

    private final ClassLoader loader = Thread.currentThread().getContextClassLoader();

    @Override
    public <T> ContextualSupplier<T> decorate(final ContextualSupplier<T> supplier) {
        final Span span = tracer.activeSpan();

        return context -> {
            Thread.currentThread().setContextClassLoader(loader);
            return supplier.get(context);
        };
    }

}

@buzzlighty
Copy link
Contributor

I can not reproduce it locally, however I experience this bug on production containerised environment. Could JRE version influence appearance of this bug?

@whiskeysierra
Copy link
Collaborator

whiskeysierra commented Jan 26, 2021 via email

@buzzlighty
Copy link
Contributor

I was able to reproduce it locally. I can confirm both circuit breaker and retries cause the bug.

@whiskeysierra
Copy link
Collaborator

Can you note down the steps to reproduce it here?

@buzzlighty
Copy link
Contributor

The application is built with Spring Boot and following libraries versions:
Spring Boot version: 2.3.7.RELEASE
Riptide version: 3.0.0-RC.15
Java version: openjdk version "11.0.5" 2019-10-15

I have the following Riptide configuration:

riptide:
  defaults:
    auth:
      enabled: true
    tracing:
      enabled: true
    metrics:
      enabled: false   
    transient-fault-detection:
      enabled: true
  clients:    
    test-ws-soap:
      base-url: "https://example.com"
      threads:
        enabled: true
        min-size: 2
        max-size: 10
        keep-alive: 1 minute
        queue-size: 10
      circuit-breaker:
        enabled: true
        failure-threshold: 5 out of 10
        success-threshold: 5
        delay: 5 seconds    
      soap:
        enabled: true
      tracing:
        tags:
          peer.service: customer
          span.kind: client
      # only for idempotent calls
      retry:
        enabled: false
        max-retries: 1

And the following call:

final var capture = Capture.<SoapResponse>empty();

return customerWsClient.post(PATH)
    .attribute(OPERATION_NAME, "op_name")
    .headers(customHeaders)
    .header("Authorization", "Bearer ...")
    .body(new SoapRequest())
    .dispatch(series(),
        on(SUCCESSFUL).call(SoapResponse.class, capture),
        on(SERVER_ERROR).dispatch(status(),
            on(INTERNAL_SERVER_ERROR).call(SOAPFault.class, fault -> {
              throw new ExternalServiceException(new SOAPFaultException(fault));
            }),
            on(SERVICE_UNAVAILABLE).call(httpResponse -> {
              throw new ExternalServiceUnavailableException(httpResponse.getStatusText());
            })))
    .thenApply(capture)
    .exceptionally(e -> {
      log.error("soap exception: ", e);
      return new SoapResponse;
    })
    .join()
    .isSuccess();

Enabling circuit breaker and retires or any of will cause an exception:

java.util.concurrent.CompletionException: com.google.common.util.concurrent.UncheckedExecutionException: javax.xml.bind.JAXBException: Implementation of JAXB-API has not been found on module path or classpath.
 - with linked exception:
[java.lang.ClassNotFoundException: com.sun.xml.bind.v2.ContextFactory]

@buzzlighty
Copy link
Contributor

buzzlighty commented Jan 27, 2021

@whiskeysierra how soon can we apply a TaskDecorator solution? Do you still see any drawbacks of that?

@whiskeysierra
Copy link
Collaborator

whiskeysierra commented Jan 27, 2021 via email

@buzzlighty
Copy link
Contributor

buzzlighty commented Jan 29, 2021

@whiskeysierra do you have any preference in:

  1. Reusing AsyncPlugin's executor for Failsafe.
  2. Separate executor for Failsafe.
  3. Task decorator

I don't have any strong opinion, but I could have invested some time into PR.
edit: I think 1 and 2 looks better as we don't mess with switching TCCL and refactorings to be done are not that huge.

@buzzlighty
Copy link
Contributor

@whiskeysierra I have created PR from your idea with task decorator. However I have no idea how to test it currently.

@buzzlighty
Copy link
Contributor

@whiskeysierra when it's going to be released?

@fatroom
Copy link
Member

fatroom commented Oct 7, 2021

@buzzlighty sorry, I missed this fix. I will take care and cut release till the end of this week

@buzzlighty
Copy link
Contributor

Unfortunately we experience the same issue again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants