Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shutdown hook causes classloader leak when library is used inside a servlet container #376

Closed
vimil opened this issue Oct 18, 2019 · 28 comments

Comments

@vimil
Copy link

vimil commented Oct 18, 2019

Scanresult class adds a shutdown hook. This causes a classloader leak when tomcat unloads a webapplication that uses classgraph library.

Is there a way to prevent the shutdownhook from being added?

@lukehutch
Copy link
Member

Hi @vimil, I just pushed some partial fixes for this -- I say "partial fixes" because all ScanResult instances still have to be closed for all references to the context classloader to be dropped. If you have an app that forgets to close a ScanResult, and the garbage collector doesn't collect the ScanResult before tomcat unloads the webapplication, there's still a chance that a classloader ref will be hanging around (but I think that's true of any classes loaded in the context classloader anyway, so it shouldn't hopefully be a problem).

Can you please test the git master version and let me know if it resolves your issue?

@lukehutch
Copy link
Member

@vimil Actually the more I think about it, the more I think that the change I checked in might completely fix the problem. Assuming that the changes actually work, the only references that ClassGraph (or its shutdown hook) will hold to objects that reference the context classloader should be weak references, so the context classloader should no longer be bound to the lifetime of the JVM.

@lukehutch
Copy link
Member

It's very difficult to drop all refs to context classloaders! But I think the latest checkin should do it.

@vimil
Copy link
Author

vimil commented Oct 18, 2019

Let me test it out and get back to you

@vimil
Copy link
Author

vimil commented Oct 19, 2019

The issue is not resolved because of the following reason.

The anonymous thread class is loaded by the webapp's classloader and put into a shutdownhook map that is part of the system class loader when the addshutdownhook method is called.

So the webappclassloader cannot be garbage collected because there is a strong reference from system classloader via the shutdownhook map to instance of anonymous thread class whose classloader is the webappclassloader.

Unless the shutdownhook is removed the webappclassloader won't get garbage collected.

I think library should just expose a teardown or release method and leave it the to clients using the library to either call the method using a shutdownhook or from a oncontextdestroyed servletcontext listener when running inside a servlet container.

here is link to some approaches to solve this issue

microsoft/ApplicationInsights-Java#513

@lukehutch
Copy link
Member

lukehutch commented Oct 19, 2019

@vimil OK, but where exactly is the classloader reference being held? Previously I already overrode the context classloader reference in the new Thread object by passing the system classloader (not the context classloader) from ScanResult#initShutdownHook() into AutoCloseableExecutorService, then into SimpleThreadFactory, which called thread.setContextClassLoader(threadContextClassLoader). And the thread that is added to the shutdown hook map also has its context classloader overridden in initShutdownHook().

I have a hunch that maybe the reference to the context classloader was being held in the ThreadGroup that the new Thread was part of, since that was inherited from the parent. So I just checked in a change that creates a new ThreadGroup rooted directly below the parent (system) ThreadGroup. Can you please check if this works? If not, can you please locate where exactly the reference to the parent ClassLoader is being held?

There is no reason why the shutdown hook needs to retain a reference to the context classloader, so I would much rather first fix this by dropping that reference. However once that is fixed, I will also provide an option for disabling the shutdown hook if you want to do that, since if you're in the business of loading and reloading apps dynamically, you probably don't want every load to permanently add a shutdown hook.

However I would still like to run the same shutdown logic when a web application is unloaded. Is there another hook I can use in tomcat to run a function when an application is unloaded?

@vimil
Copy link
Author

vimil commented Oct 19, 2019

@lukehutch all java objects including the hookthread instance in this case have a reference to their class object. (the getclass() method returns this reference). The class object has a reference back to the classloader that loaded the class. This is different from the threads contextclass loader and cannot be changed. The thread's contextclassloader is a mechanism used to load other classes from the the threads run method. It is not a way to change the classloader that was used load the anonymous thread class itself. The reference to the classloader that loaded an object cannot be changed after the object is created.

Tomcat provides servletcontext listeners that can be implemented to execute logic during unemployment of a webapp.

Have a look at this link
https://java.jiderhamn.se/2012/01/01/classloader-leaks-ii-find-and-work-around-unwanted-references/

@lukehutch
Copy link
Member

@vimil The master version now has ClassGraph#disableShutdownHook() (call this before the first call to the ClassGraph() constructor) and ScanResult#closeAll() (call this right before web app unload).

The latest checkin also defers the loading of the ScanResult class itself, and loads the class in a thread rooted in the root thread group and with the system classloader set as its context classloader, so that should be loaded within the system classloader. Can you please check if that gets around the issue you described in your last comment?

I will further move the setup of shutdown hooks to the first invocation of scan(), so that the classloader can be queried to see if the class is running in tomcat (and I will make this generic, so other runtime environments can also prevent the use of shutdown hooks).

Can you please give some example code that shows how to detect if the current class is running inside a tomcat container?

@vimil
Copy link
Author

vimil commented Oct 19, 2019

@lukehutch Thanks for looking into this.

If we call getClass().getClassLoader() we will get the classloader the current class is running under.

In the case of tomcat this will be

https://tomcat.apache.org/tomcat-7.0-doc/api/org/apache/catalina/loader/WebappClassLoader.html

However we would want code that is more generic and not specific to just tomcat.

Let me look to see what others have done.

Thanks
Vimil

@lukehutch
Copy link
Member

lukehutch commented Oct 19, 2019

OK, I assumed that calling this on a system class would return the classloader the system class was loaded under.

It is also possible to get the parent classloader of the current classloader. Would that solve the problem?

The Tomcat-specific code can be put here:

https://github.com/classgraph/classgraph/blob/master/src/main/java/nonapi/io/github/classgraph/classloaderhandler/TomcatWebappClassLoaderBaseHandler.java

I can add a method to ClassLoaderHandler that can be overridden to override the shutdown hook registration. Each classloader environment could then decide whether or not to use shutdown hooks.

@vimil
Copy link
Author

vimil commented Oct 19, 2019

From what I read it is not recomended to detect this.

https://stackoverflow.com/questions/2976884/detect-if-running-in-servlet-container-or-standalone

You dont have to use a system class. you can do ScanResult.class.getClassLoader(). That will return tomcats webappclassloader reference.

However as I found out writing this detection logic is not foolproof and cannot be made generic

@lukehutch
Copy link
Member

OK, please take a look at the latest changes. I get the root classloader, and set up a thread that runs with that classloader rather than the context loader, and within that thread, ScanResult is loaded for the first time (so its class should not be tied to the tomcat classloader).

@lukehutch
Copy link
Member

Hmm, actually I'm not sure if those changes are enough, since new Runnable() will also create an object that uses the context classloader. This should not be this hard! I'll have to look up how to run code in the bootstrap classloader. It might require using an InvocationHandler or something exotic like that.

@lukehutch
Copy link
Member

@vimil Actually I now see that it is impossible to do what I was trying to do. Any shutdown hook code must be loaded by the container classloader, otherwise the class wouldn't be able to be found in the general case. So the shutdown hook will always hold a reference to the context classloader. That really sucks :-/

However as I found out writing this detection logic is not foolproof and cannot be made generic

Actually ClassGraph already has to detect if the classloader is a WebappClassLoaderBase or one of its subclasses (e.g. WebappClassLoader). So we already detect if we're running in Tomcat, and that should be 100% reliable. I will work on replacing the shutdown hook with register/unregister logic.

@lukehutch
Copy link
Member

@vimil Please check the latest commit, it registers a ServletContextListener to call ScanResult.closeAll() on servlet shutdown.

I would like to do this without creating a runtime dependency upon Tomcat. In pom.xml I put <scope>provided</scope>, which should create only a compiletime dependency. However having references to classes that are not defined in the classpath for non-Tomcat users is sketchy, and may break something. Is there a way to set up a ServletContextListener dynamically, using reflection and an InvocationHandler?

I tried doing this, but then I noticed this comment: https://stackoverflow.com/a/32887891/3950982

However this can only be done from ServletContainerInitializer.onStartup which is run before any ServletContextListeners are called.

lukehutch added a commit that referenced this issue Oct 19, 2019
@lukehutch
Copy link
Member

lukehutch commented Oct 19, 2019

OK, the latest version should have a fairly clean registration/unregistration mechanism for Tomcat (and I added the same mechanism for Spring):

https://github.com/classgraph/classgraph/blob/master/src/main/java/nonapi/io/github/classgraph/classloaderhandler/lifecycle/ServletLifeCycleListener.java

https://github.com/classgraph/classgraph/blob/master/src/main/java/nonapi/io/github/classgraph/classloaderhandler/lifecycle/SpringLifeCycleListener.java

https://github.com/classgraph/classgraph/blob/master/pom.xml#L139

These should be discovered at runtime by the Tomcat and Spring annotation scanners.

Please take a look. You should see in the log (at level INFO), on container startup:

Tomcat container detected -- disabling ClassGraph shutdown hook

And at shutdown:

Closing any remaning open ClassGraph ScanResult instances

@lukehutch
Copy link
Member

lukehutch commented Oct 19, 2019

The same change should be made for other classloader environments that may frequently load and unload classes. Unfortunately I know very little about any of these. Can anyone suggest other classloader environments that may need proper lifecycle support in this way?:

  • JBoss?
  • OSGi?
  • Equinox?
  • Plexus?
  • WebSphere?
  • Weblogic?
  • Felix?

@vimil
Copy link
Author

vimil commented Oct 19, 2019

since all servlet containers implement servlet 3.0 spec you just need to have a provided dependency on servlet-api
https://mvnrepository.com/artifact/javax.servlet/javax.servlet-api/3.0.1

the weblistener annotation will work for all servlet containers like jboss, weblogic websphere

@lukehutch
Copy link
Member

lukehutch commented Oct 19, 2019

Changed, thanks. Are there any environments on the above list that are not servlet environments, and may need their own lifecycle management?

Also, please let me know if the master version with ServletLifeCycleListener solves the problem for you.

@vimil
Copy link
Author

vimil commented Oct 19, 2019

Sure, I will do that.

osgi equinox felix are osgi containers and implement the bundle listener interface https://osgi.org/javadoc/r4v42/org/osgi/framework/BundleListener.html

I think you could implement that interface for osgi containers

@lukehutch
Copy link
Member

Thanks. I see there is also BundleTracker. However I can't see how to configure BundleListener or BundleTracker with annotations. Is there a way?

Also, is there a possibility, for either servlets, Spring, or OSGi, that a container will be destroyed, but the previously-loaded classes will still be loaded in some parent classloader? The reason I ask is that I have made the assumption that the container lifecycle will be exactly the same as the classloader lifecycle -- so ScanResult.closeAll() is called when the container is destroyed. If there is a possibility that there will be two classloaders, running separate containers, that each share a parent classloader, which ScanResult was loaded into, then this could allow one container to close all the open ScanResult instances in the other container (and that would be bad).

@vimil
Copy link
Author

vimil commented Oct 20, 2019

in the case of servlet containers each webapp loads its own copy of classes from the classgraph library. The classes loaded by one webapp are not visible to another webapp. The only situation when two webapps can share the same library is when the library is placed in the servlet container's classpath. In the case of tomcat this is done by placing the jar in tomcat/lib folder.

However if the classgraph library is placed in tomcat/lib folder then the servletcontextlistener won't get registered because it is not part of any webapp's classpath. So ScanResult.closeall will only get called by the shutdownhook which is what is desired.

@vimil
Copy link
Author

vimil commented Oct 20, 2019

The changes look good to me. I am closing this issue

@vimil vimil closed this as completed Oct 20, 2019
@lukehutch
Copy link
Member

lukehutch commented Oct 20, 2019

OK, thanks for verifying. I should make this work for Spring and OSGi too before pushing out a release, so I'll reopen for now.

If you have any knowledge of those two environments, please comment on the listener registration mechanism (i.e. for Spring, whether the code already committed to master should work, and for OSGi, how to register a BundleListener or BundleTracker in the cleanest way possible, preferably using annotations so that only a provided runtime dependency needs to be added to ClassGraph).

Please also comment if you know if the ScanResult.closeAll() issue is likely to be a problem for these environments (i.e. if one container could end up closing ScanResult instances in another container).

Thanks!

@lukehutch
Copy link
Member

Hmm. Turns out this is not as easy to do in OSGi. For each bundle, a BundleListener will have to be created (using an InvocationHandler, to avoid adding a hard dependency upon an OSGi implementation). I'm going to assume that bundles are not unloaded / uninstalled often enough in OSGi to cause a classloader leak problem (and if it is a problem in the future, we'll cross that bridge when we come to it).

I will also assume that the SpringLifecycleListener is doing the right thing, since it actually wraps a ServletLifecycleListener, and that is doing the right thing.

So I will close this again without OSGi support, but with servlet and (untested) Spring support, and hope it doesn't break Spring for anyone!

@lukehutch
Copy link
Member

Released in version 4.8.51. Thanks @vimil for reporting.

@lukehutch
Copy link
Member

lukehutch commented Oct 20, 2019

@vimil Actually I took this one step further, and entirely removed the shutdown hook, because all resources that ClassGraph may create or keep open should be cleaned up on proper JVM shutdown, even without a shutdown hook, making the shutdown hook redundant:

  • Any temporary files that are created (to extract deflated inner jars from within jars) should be deleted by the system's own shutdown hook, since these files are marked for removal on JVM shutdown by File#deleteOnExit().
  • Any MappedByteBuffer instances should be unmapped by the JVM (or the OS) on exit.
  • Any open files should be closed by the JVM (or the OS) on exit.
  • Any open modules should be closed by the JVM on exit.
  • Memory allocated to ScanResult and its linked objects will be freed on JVM exit.

The shutdown hook has been a source of several problems in the past, so the new method of registering a lifecycle listener and calling ScanResult.closeAll() on container/classloader shutdown is a better system, and allows the JVM to last beyond the lifetime of the classloader that loaded the ScanResult class.

The only issue with this approach is that all runtime environments that do support containerized applications will need to add a lifecycle listener, otherwise ScanResult resources that are not closed may be kept open until the JVM exits. But actually that was the situation already, even with the shutdown hook in place.

Released in version 4.8.52.

@vimil
Copy link
Author

vimil commented Oct 20, 2019

Thanks for getting this issue fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants