Java 17: resource.isReadable() with concurrency leaks large amounts of non-heap memory #30955

peterdnight · 2023-07-26T18:43:51Z

Scenario:

converting application from java 8 to java 17 a large increase in non heap memory occurred.
determined critical region: during startup, application scans all classes looking for annotations. This is based on configuration settings, different threads initiate the scan for respective Annotations
Prior to scanning the class - resource.isReadable is run.
commenting out resource.isReadable check resolved issue.
putting the application method (findAnnotation) in a synchronized block also resolved the issue

To reproduce:

add the controller snippet below into any Spring RestController
verified on both SpringBoot 2.x and 3.x (but no dependencies on SpringBoot)
verified using threads = 20, iterations = 10 on a 50 core bare metal centos 9 host
ideally there are many largish jars (verified with 90+), but increasing iteration count should expose the issue.
verified on the latest java 17, with a smaller heap (-Xmx1G) to clearly identify non-heap memory
a single invokation of /javaNativeMemory?threads=20&iterations=10 showed peak of 35g allocation, that settles on ~20G that is never released

Stacktrace:

 [ 0] /usr/lib64/libz.so.1.2.7
  [ 1] [unknown]
  [ 2] java.util.zip.Inflater.inflateBytesBytes
  [ 3] java.util.zip.Inflater.inflate
  [ 4] java.util.zip.InflaterInputStream.read
  [ 5] java.io.InputStream.readNBytes
  [ 6] java.util.jar.JarFile.getBytes
  [ 7] java.util.jar.JarFile.checkForSpecialAttributes
  [ 8] java.util.jar.JarFile.isMultiRelease
  [ 9] java.util.jar.JarFile.getEntry
  [10] sun.net.www.protocol.jar.URLJarFile.getEntry
  [11] sun.net.www.protocol.jar.JarURLConnection.connect
  [12] sun.net.www.protocol.jar.JarURLConnection.getContentLengthLong
  [13] org.springframework.core.io.AbstractFileResolvingResource.checkReadable
  [14] org.springframework.core.io.AbstractFileResolvingResource.isReadable

Controller:

    String SEARCH_PACKAGE = ""; // com, org.sample, ...
    final String packageSearchPath = ResourcePatternResolver.CLASSPATH_ALL_URL_PREFIX + SEARCH_PACKAGE + "/**/*.class";
    ResourceLoader RESOURCE_LOADER = new DefaultResourceLoader(ApiRoot.class.getClassLoader());

    @GetMapping("/native-memory/run")
    public ObjectNode javaNativeMemory(
            int threads,
            int iterations) {

        logger.info("running using: {} threads  and invoking each: {} iterations", threads, iterations);

        ObjectNode report = jacksonMapper.createObjectNode();
        report.put("started", LocalDateTime.now().format(DateTimeFormatter.ofPattern("hh:mm:ss")));
        report.put("duration", -1);

        Timer.Sample theTimer = metricUtilities.startTimer();

        try {
            Resource[] resources = new PathMatchingResourcePatternResolver(RESOURCE_LOADER).getResources(packageSearchPath);
            int loaded = resources.length;

            // isReadable is expensive
            //  - triggering deep call stack to zip.inflate,
            //  - allocates libz.so native memory, and leaks on zing ( thread concurrency )
            ForkJoinPool customThreadPool = new ForkJoinPool(threads);
            List<Integer> listOfNumbers = IntStream.range(0, threads)
                    .boxed()
                    .collect(Collectors.toList());

            customThreadPool.submit(() ->
                    listOfNumbers.parallelStream().forEach(number -> {

                        IntStream.range(0, iterations).forEach(count -> {
                            logger.info("started: {} , name: {}", count, Thread.currentThread().getName());
                            for (final Resource resource : resources) {
                                if (resource.isReadable()) {
                                    // do nothing
                                }
                            }
                            logger.info("completed: {} , name: {}", count, Thread.currentThread().getName());
                        });
                    })
            );

            customThreadPool.shutdown();
            customThreadPool.awaitTermination(2, TimeUnit.MINUTES);
            report.put("resourcesLocated", loaded);
            report.put("totalChecked", loaded * iterations * threads);

        } catch (final Exception e) {
            logger.warn( "Failure while invoking resource.isReadable() {}", e) ;
        }
        long timeNanos = metricUtilities.stopTimer(theTimer, "csap.heap-tester.consume-heap");
        report.put("duration", CSAP.autoFormatNanos(timeNanos));

        doGc();
        long freeBytes = Runtime.getRuntime().freeMemory();

        report.put("heap-post-gc", CSAP.printBytesWithUnits(freeBytes));
        return report;
    }

azul-jf · 2023-10-27T02:28:22Z

We have some further analysis to offer from the JDK side. We would like the Springboot project team to evaluate the feasibility of the change suggested below to ResourceUtils::useCachesIfNecessary, and implement it if possible.

Issue Statement

Methods org.springframework.core.io.UrlResource.isReadable() and org.springframework.c.t.c.SimpleMetadataReaderFactory.getMetadataReader(Resource resource) end up calling org.springframework.u.ResourceUtils.useCachesIfNecessary(URLConnection con), which calls sun.net.w.p.j.JarURLConnection.setUseCaches(false) for resources in Jar files.

When isReadable and getMetadataReader then call JarURLConnection.connect(), there are no cached connections, and each JarURLConnection for a particular Jar file gets its own, additional, j.u.z.Inflater instance with a reference to an off-heap buffer. Modern Java reference handling delays collection of these buffers for two gc cycles after use, and even then the glibc per-thread malloc() cache retains the memory after it’s been free()’d.

These buffers are in addition to the per-jar buffer created in order to provision the MRJAR feature introduced in Java 9 (JEP 238).

In large Springboot applications which call either of these methods for the jar population, the extra RSS usage for these j.u.z.Inflater buffers can amount to tens of gigabytes. In a Linux environment where Xmx == Xms, swap is disabled, and the RSS budget is tight, this risks the process being reaped by the OOM killer.

We have shown, by patching JarURLConnection.setUseCaches to ignore a false setting, that not setting useCaches to false for a JarURLConnection is sufficient to avoid RSS growth. However, this change of the API's behavior is not an acceptable solution.

Two Workarounds and a Proposed Resolution

Disable MRJAR (-Djdk.util.jar.enableMultiRelease=false) if possible
Use malloc_trim (jcmd <pid> System.trim_native_heap, Prime: -XX:MallocTrimPeriodicTaskIntervalSecs=300 -XX:+UnlockExperimentalVMOptions)
Modify org.springframework.util.ResourceUtils.useCachesIfNecessary() not to call setUseCaches(false) for jar files (which it already avoids for JNLP resources. Do these exist? We can't find any.)

Disclaimer

We are not aware of the origin of or justification for the JarURLConnection.setUseCaches(false) call in ResourceUtils.useCachesIfNecessary(), and can only speculate.

sdeleuze · 2023-11-20T15:19:56Z

ResourceUtils.useCachesIfNecessary() was introduced for web app hot reloading in a Servlet container (see also #13755 for JNLP resources special handling), but that's not a very common use case anymore, so using cache by default could be an option to consider if there is indeed an impact on RSS. I will check and provide an update.

sdeleuze · 2023-11-20T17:01:07Z

I can indeed reproduce with Spring Boot 3.2.0-RC2 and Spring Framework 6.1.0 and confirm this change make a huge difference in term of RSS consumption.

In order to prevent leaks of large amounts of non-heap memory (and potential other efficiency and performance side effects), this commit turns ResourceUtils#useCachesIfNecessary into a no-op method and deprecates it. It is recommended to not use it anymore. This method has been originally introduced via spring-projectsgh-9316 and spring-projectsgh-13755 to avoid I/O failure during webapp hot reloading in Servlet containers. This is not a popular deployment mode anymore and we have not been able to reproduce the original issue with a Java 17 JVM and Tomcat 10. Closes spring-projectsgh-30955

sdeleuze · 2023-11-21T11:13:30Z

This is now fixed on main and will be available as of Spring Framework 6.1.1.

@peterdnight I have not been able to reproduce with Spring Framework 6.0, so for now no backport is planned. I suspect this could be related to the fact the issue happen only with multi release JARs (which is the case for Spring Framework 6.1 JARs but not 6.0 JARs). Maybe your repro was using multi release JARs in the application dependencies. Feel free to send a Spring Boot 3.1 + Framework 6.0 self contained repro as an attached archive or a Git repository if you want us to reconsider if we should backport or not. If possible, I think we would like to keep this change 6.1 only to avoid unwanted side effects on existing deployments.

spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Jul 26, 2023

sdeleuze self-assigned this Nov 20, 2023

sdeleuze added in: core Issues in core modules (aop, beans, core, context, expression) type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Nov 20, 2023

sdeleuze added this to the 6.1.1 milestone Nov 20, 2023

sdeleuze closed this as completed in bec385d Nov 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Java 17: resource.isReadable() with concurrency leaks large amounts of non-heap memory #30955

Java 17: resource.isReadable() with concurrency leaks large amounts of non-heap memory #30955

peterdnight commented Jul 26, 2023 •

edited

azul-jf commented Oct 27, 2023 •

edited

sdeleuze commented Nov 20, 2023

sdeleuze commented Nov 20, 2023 •

edited

sdeleuze commented Nov 21, 2023 •

edited

Java 17: resource.isReadable() with concurrency leaks large amounts of non-heap memory #30955

Java 17: resource.isReadable() with concurrency leaks large amounts of non-heap memory #30955

Comments

peterdnight commented Jul 26, 2023 • edited

azul-jf commented Oct 27, 2023 • edited

sdeleuze commented Nov 20, 2023

sdeleuze commented Nov 20, 2023 • edited

sdeleuze commented Nov 21, 2023 • edited

peterdnight commented Jul 26, 2023 •

edited

azul-jf commented Oct 27, 2023 •

edited

sdeleuze commented Nov 20, 2023 •

edited

sdeleuze commented Nov 21, 2023 •

edited