Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java 17: resource.isReadable() with concurrency leaks large amounts of non-heap memory #30955

Closed
peterdnight opened this issue Jul 26, 2023 · 4 comments
Assignees
Labels
in: core Issues in core modules (aop, beans, core, context, expression) type: enhancement A general enhancement
Milestone

Comments

@peterdnight
Copy link

peterdnight commented Jul 26, 2023

Scenario:

  • converting application from java 8 to java 17 a large increase in non heap memory occurred.
  • determined critical region: during startup, application scans all classes looking for annotations. This is based on configuration settings, different threads initiate the scan for respective Annotations
  • Prior to scanning the class - resource.isReadable is run.
  • commenting out resource.isReadable check resolved issue.
  • putting the application method (findAnnotation) in a synchronized block also resolved the issue

To reproduce:

  • add the controller snippet below into any Spring RestController
  • verified on both SpringBoot 2.x and 3.x (but no dependencies on SpringBoot)
  • verified using threads = 20, iterations = 10 on a 50 core bare metal centos 9 host
  • ideally there are many largish jars (verified with 90+), but increasing iteration count should expose the issue.
  • verified on the latest java 17, with a smaller heap (-Xmx1G) to clearly identify non-heap memory
  • a single invokation of /javaNativeMemory?threads=20&iterations=10 showed peak of 35g allocation, that settles on ~20G that is never released

Stacktrace:

 [ 0] /usr/lib64/libz.so.1.2.7
  [ 1] [unknown]
  [ 2] java.util.zip.Inflater.inflateBytesBytes
  [ 3] java.util.zip.Inflater.inflate
  [ 4] java.util.zip.InflaterInputStream.read
  [ 5] java.io.InputStream.readNBytes
  [ 6] java.util.jar.JarFile.getBytes
  [ 7] java.util.jar.JarFile.checkForSpecialAttributes
  [ 8] java.util.jar.JarFile.isMultiRelease
  [ 9] java.util.jar.JarFile.getEntry
  [10] sun.net.www.protocol.jar.URLJarFile.getEntry
  [11] sun.net.www.protocol.jar.JarURLConnection.connect
  [12] sun.net.www.protocol.jar.JarURLConnection.getContentLengthLong
  [13] org.springframework.core.io.AbstractFileResolvingResource.checkReadable
  [14] org.springframework.core.io.AbstractFileResolvingResource.isReadable

Controller:

    String SEARCH_PACKAGE = ""; // com, org.sample, ...
    final String packageSearchPath = ResourcePatternResolver.CLASSPATH_ALL_URL_PREFIX + SEARCH_PACKAGE + "/**/*.class";
    ResourceLoader RESOURCE_LOADER = new DefaultResourceLoader(ApiRoot.class.getClassLoader());

    @GetMapping("/native-memory/run")
    public ObjectNode javaNativeMemory(
            int threads,
            int iterations) {

        logger.info("running using: {} threads  and invoking each: {} iterations", threads, iterations);

        ObjectNode report = jacksonMapper.createObjectNode();
        report.put("started", LocalDateTime.now().format(DateTimeFormatter.ofPattern("hh:mm:ss")));
        report.put("duration", -1);

        Timer.Sample theTimer = metricUtilities.startTimer();

        try {
            Resource[] resources = new PathMatchingResourcePatternResolver(RESOURCE_LOADER).getResources(packageSearchPath);
            int loaded = resources.length;

            // isReadable is expensive
            //  - triggering deep call stack to zip.inflate,
            //  - allocates libz.so native memory, and leaks on zing ( thread concurrency )
            ForkJoinPool customThreadPool = new ForkJoinPool(threads);
            List<Integer> listOfNumbers = IntStream.range(0, threads)
                    .boxed()
                    .collect(Collectors.toList());

            customThreadPool.submit(() ->
                    listOfNumbers.parallelStream().forEach(number -> {

                        IntStream.range(0, iterations).forEach(count -> {
                            logger.info("started: {} , name: {}", count, Thread.currentThread().getName());
                            for (final Resource resource : resources) {
                                if (resource.isReadable()) {
                                    // do nothing
                                }
                            }
                            logger.info("completed: {} , name: {}", count, Thread.currentThread().getName());
                        });
                    })
            );

            customThreadPool.shutdown();
            customThreadPool.awaitTermination(2, TimeUnit.MINUTES);
            report.put("resourcesLocated", loaded);
            report.put("totalChecked", loaded * iterations * threads);

        } catch (final Exception e) {
            logger.warn( "Failure while invoking resource.isReadable() {}", e) ;
        }
        long timeNanos = metricUtilities.stopTimer(theTimer, "csap.heap-tester.consume-heap");
        report.put("duration", CSAP.autoFormatNanos(timeNanos));

        doGc();
        long freeBytes = Runtime.getRuntime().freeMemory();

        report.put("heap-post-gc", CSAP.printBytesWithUnits(freeBytes));
        return report;
    }
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Jul 26, 2023
@azul-jf
Copy link

azul-jf commented Oct 27, 2023

We have some further analysis to offer from the JDK side. We would like the Springboot project team to evaluate the feasibility of the change suggested below to ResourceUtils::useCachesIfNecessary, and implement it if possible.

Issue Statement

Methods org.springframework.core.io.UrlResource.isReadable() and org.springframework.c.t.c.SimpleMetadataReaderFactory.getMetadataReader(Resource resource) end up calling org.springframework.u.ResourceUtils.useCachesIfNecessary(URLConnection con), which calls sun.net.w.p.j.JarURLConnection.setUseCaches(false) for resources in Jar files.

When isReadable and getMetadataReader then call JarURLConnection.connect(), there are no cached connections, and each JarURLConnection for a particular Jar file gets its own, additional, j.u.z.Inflater instance with a reference to an off-heap buffer. Modern Java reference handling delays collection of these buffers for two gc cycles after use, and even then the glibc per-thread malloc() cache retains the memory after it’s been free()’d.

These buffers are in addition to the per-jar buffer created in order to provision the MRJAR feature introduced in Java 9 (JEP 238).

In large Springboot applications which call either of these methods for the jar population, the extra RSS usage for these j.u.z.Inflater buffers can amount to tens of gigabytes. In a Linux environment where Xmx == Xms, swap is disabled, and the RSS budget is tight, this risks the process being reaped by the OOM killer.

We have shown, by patching JarURLConnection.setUseCaches to ignore a false setting, that not setting useCaches to false for a JarURLConnection is sufficient to avoid RSS growth. However, this change of the API's behavior is not an acceptable solution.

Two Workarounds and a Proposed Resolution

  1. Disable MRJAR (-Djdk.util.jar.enableMultiRelease=false) if possible
  2. Use malloc_trim (jcmd <pid> System.trim_native_heap, Prime: -XX:MallocTrimPeriodicTaskIntervalSecs=300 -XX:+UnlockExperimentalVMOptions)
  3. Modify org.springframework.util.ResourceUtils.useCachesIfNecessary() not to call setUseCaches(false) for jar files (which it already avoids for JNLP resources. Do these exist? We can't find any.)

Disclaimer

We are not aware of the origin of or justification for the JarURLConnection.setUseCaches(false) call in ResourceUtils.useCachesIfNecessary(), and can only speculate.

@sdeleuze sdeleuze self-assigned this Nov 20, 2023
@sdeleuze
Copy link
Contributor

ResourceUtils.useCachesIfNecessary() was introduced for web app hot reloading in a Servlet container (see also #13755 for JNLP resources special handling), but that's not a very common use case anymore, so using cache by default could be an option to consider if there is indeed an impact on RSS. I will check and provide an update.

@sdeleuze sdeleuze added in: core Issues in core modules (aop, beans, core, context, expression) type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Nov 20, 2023
@sdeleuze sdeleuze added this to the 6.1.1 milestone Nov 20, 2023
@sdeleuze
Copy link
Contributor

sdeleuze commented Nov 20, 2023

I can indeed reproduce with Spring Boot 3.2.0-RC2 and Spring Framework 6.1.0 and confirm this change make a huge difference in term of RSS consumption.

sdeleuze added a commit to sdeleuze/spring-framework that referenced this issue Nov 21, 2023
In order to prevent leaks of large amounts of non-heap
memory (and potential other efficiency and performance side
effects), this commit turns ResourceUtils#useCachesIfNecessary
into a no-op method and deprecates it. It is recommended to
not use it anymore.

This method has been originally introduced via spring-projectsgh-9316 and
spring-projectsgh-13755 to avoid I/O failure during webapp hot reloading in
Servlet containers. This is not a popular deployment mode anymore
and we have not been able to reproduce the original issue with
a Java 17 JVM and Tomcat 10.

Closes spring-projectsgh-30955
@sdeleuze
Copy link
Contributor

sdeleuze commented Nov 21, 2023

This is now fixed on main and will be available as of Spring Framework 6.1.1.

@peterdnight I have not been able to reproduce with Spring Framework 6.0, so for now no backport is planned. I suspect this could be related to the fact the issue happen only with multi release JARs (which is the case for Spring Framework 6.1 JARs but not 6.0 JARs). Maybe your repro was using multi release JARs in the application dependencies. Feel free to send a Spring Boot 3.1 + Framework 6.0 self contained repro as an attached archive or a Git repository if you want us to reconsider if we should backport or not. If possible, I think we would like to keep this change 6.1 only to avoid unwanted side effects on existing deployments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: core Issues in core modules (aop, beans, core, context, expression) type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

4 participants