
Memory leak when request body is anything but JSON #2665

Closed
carlosbarragan opened this issue Jan 20, 2020 · 26 comments

@carlosbarragan

carlosbarragan commented Jan 20, 2020

Steps to reproduce

The bug cannot be easily reproduced locally. However, I have data that supports the problem at hand.

Some information about our application:

Our Micronaut application runs in production on a K8s cluster and receives around 5K requests per minute. We use Kotlin and Reactor in our application.

We have been observing that memory consumption increases over time, and after around 6 hours the Pods have to be restarted by K8s because they run out of memory.
If the load increases or peaks, the restarts happen sooner than 6 hours.

The application receives requests in XML, calls other services (using JSON), and makes some database (MongoDB) calls. Nothing extraordinary. In order to handle the XML, we inject the body into the controller as a String as follows:

fun createSession(@Body rawBody: String, httpHeader: HttpHeaders): Mono<out HttpResponse<String>> 

and afterwards we parse the request using the XmlMapper from Jackson.

Since the clients making requests to the application are programmed to always expect an HTTP 200, the application always returns 200 even if there are errors. That is another reason why we parse the payload inside the controller (the first reason being Micronaut's lack of support for parsing XML directly): it lets us handle exceptions that may occur while parsing the request.
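
For illustration only, here is a minimal sketch of that pattern. The route, controller name, and XML element names are assumptions for the sketch, not taken from the real application:

import com.fasterxml.jackson.dataformat.xml.XmlMapper
import io.micronaut.http.HttpHeaders
import io.micronaut.http.HttpResponse
import io.micronaut.http.MediaType
import io.micronaut.http.annotation.Body
import io.micronaut.http.annotation.Consumes
import io.micronaut.http.annotation.Controller
import io.micronaut.http.annotation.Post
import reactor.core.publisher.Mono

@Controller("/api/sessions") // route is assumed for the sketch
class SessionController {

    // Jackson's XML mapper, used manually inside the controller
    private val xmlMapper = XmlMapper()

    @Post
    @Consumes(MediaType.APPLICATION_XML)
    fun createSession(@Body rawBody: String, httpHeaders: HttpHeaders): Mono<out HttpResponse<String>> {
        return try {
            // Parse the raw XML ourselves so parsing failures can be handled here
            val xml = xmlMapper.readTree(rawBody)
            Mono.just(HttpResponse.ok("<result>ok ${xml.path("device-id").asText()}</result>"))
        } catch (e: Exception) {
            // The clients always expect HTTP 200, so errors are reported in the body instead
            Mono.just(HttpResponse.ok("<result>error ${e.message}</result>"))
        }
    }
}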

A link to a stripped-down version of the application can be found at the bottom of this issue. Please note that we removed a lot of code that was not necessary to describe our problem.

Analysis

We did a memory dump analysis (see screenshot) and observed that objects of type io.netty.buffer.PoolChunk were consuming a lot of memory and, somehow, were not being released. These objects were actually holding the request body, as seen in the screenshot (the actual request is not shown in full due to a privacy agreement with our customer). The screenshot was taken around 30 minutes after deployment. We observed that these objects accumulated over the lifespan of the application and were therefore the cause of the high memory consumption that led to the Pods being restarted by K8s every 6 hours.

Objects in heap [screenshot: ObjectsInHeap]

Content of the PoolChunk object [screenshot: PayloadMemLeak]

We also observed several log statements as follows:
LEAK: ByteBuf.release() was not called before it's garbage-collected.

We tried our best to reproduce the problem locally. To do that, we needed to create load similar to production; that was not the difficult part. The problem was that we did not want to stress-load the other systems called by the application for a long time, so it was not possible for us to reproduce the bug locally.

At this point in time our assumption was that most of the request bodies were not being collected by GC but we didn't know why.

However, while doing a debug session with a lot of breakpoints all over the place, the Netty leak detector was suddenly called and we could determine the exact location of the leak (see screenshot). We were testing with several requests and noticed that one request failed because one of the external services responded with an error. We then executed the exact same request again, and the leak detector was called. Could it be that the error from the external service caused Netty not to clean up the request?

Netty leak detector [screenshot: NettyLeakDetector]

The location of the reported leak was the ByteBufConverters. Apparently, the converter transforms a ByteBuffer into a String but never releases the ByteBuffer. After further investigation, we found out that when Micronaut processes a JSON payload, it does release the payload. In the class JsonContentProcessor, method onData(), you can find the following code:

protected void onData(ByteBufHolder message) {
    ByteBuf content = message.content();

    try {
        byte[] bytes = ByteBufUtil.getBytes(content);
        this.jacksonProcessor.onNext(bytes);
    } finally {
        ReferenceCountUtil.release(content);
    }
}

So it looks like Micronaut reads the payload from the ByteBuffer, parses it, and releases the ByteBuffer afterwards. That does not happen with any other kind of payload, or at least we could not find where it does.

That piece of code gave us the idea for our workaround.

Workaround

Instead of injecting the body as a String, we inject it as a CompositeByteBuf. By doing this, Micronaut does not do any kind of conversion, so we get the payload as-is and read its contents in a similar way to the JsonContentProcessor, meaning we read the ByteBuffer and release it ourselves afterwards:

fun createSession(@Body bodyBytes: CompositeByteBuf, httpHeader: HttpHeaders): Mono<out HttpResponse<String>> {
    val rawBody: String = bodyBytes.releaseAndGetString()
    // ... parse rawBody and handle the request as before
}

The releaseAndGetString() extension function is implemented as follows:

fun ByteBuf.releaseAndGetString(): String {
    return try {
        val byteArray = ByteBufUtil.getBytes(this)
        String(byteArray)
    } finally {
        ReferenceCountUtil.release(this)
    }
}

We applied this workaround and deployed it to production, and since then everything has been running smoothly. No Pod restarts in the last 5 days, and memory consumption remains stable at around 200 MB of heap. We think this is a strong indicator that the payload was not being properly released, either by Micronaut or by Netty.

Expected Behaviour

No memory leaks should happen when injecting a body that is not JSON. If the developer is expected to take care of releasing the ByteBuffer, it should be documented that way.

Environment Information

  • Operating System: Running in Docker based on image openjdk:11-jdk-slim-stretch
  • Micronaut Version: 1.2.6
  • JDK Version: 11

Example Application

https://github.com/carlosbarragan/micronaut-memleak-example

@jameskleeh
Contributor

Also I don't think this is an issue in 1.3 because of the changes there to binding to raw bodies

@carlosbarragan carlosbarragan changed the title from "Memory leak when request body is anyhting but JSON" to "Memory leak when request body is anything but JSON" Jan 21, 2020
@carlosbarragan
Author

> The body should be cleaned up here https://github.com/micronaut-projects/micronaut-core/blob/1.2.x/http-server-netty/src/main/java/io/micronaut/http/server/netty/RoutingInBoundHandler.java#L198

@jameskleeh I debugged our application by setting a breakpoint there but the code was never called.

Is there already a snapshot / milestone release for 1.3? I would like to test our application with 1.3.

@jameskleeh
Contributor

@carlosbarragan Yes there are 2 milestones and snapshots for 1.3

@jameskleeh
Contributor

Closing this due to a lack of feedback. If you can reproduce this issue with 1.3, please let me know and I'll reopen.

@carlosbarragan
Author

We'll try to test with 1.3 as soon as possible.

BTW, we noticed we cannot pass the body as @Body rawBody: CompositeByteBuf in Micronaut 1.2.7 and later. We get an UnsatisfiedRouteResolutionException. Apparently, the converters cannot handle that. It worked with 1.2.6

@carlosbarragan
Author

We tested with 1.3 and, sadly, the memory leak is still there.

Here are the stats:

With 1.2.6 and passing the body as CompositeByteBuf as described in the workaround and after running for 20 hours:

  • Heap usage around 180 MB
  • Total Memory requested by the pod around 600 MB

With 1.3.0 and passing the body as String (passing it as CompositeByteBuf does not work) after running for 90 minutes:

  • Heap usage around 500 MB
  • Total memory requested by the pod around 900 MB.

@jameskleeh please reopen as the problem still persists.

@graemerocher graemerocher reopened this Feb 12, 2020
@graemerocher graemerocher added the status: awaiting validation Waiting to be validated as a real issue label Feb 12, 2020
@graemerocher graemerocher added this to the 1.3.1 milestone Feb 12, 2020
@carlosbarragan
Author

I find it strange that JsonContentProcessor.onData releases the ByteBuf after passing the payload to the jacksonProcessor, but nothing similar happens for other kinds of payloads. Maybe it is done somewhere else, but I just wanted to mention it.

@graemerocher
Contributor

graemerocher commented Feb 12, 2020

So far I took the example application and enabled paranoid level resource leak detection:

@JvmStatic
fun main(args: Array<String>) {
    ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID)
    // ... then start the application as usual
}

Then created a postdata.txt file as follows:

<Session-Request><device-id>test1</device-id><device-context>test2</device-context></Session-Request>

Then ran ab and put the application under load:

ab -p postdata.txt -T application/xml -c 20 -n 10000 http://localhost:8080/api/sessions

I observed in the profiler that there were indeed some ByteBuf object records hanging around and memory consumption went up to 800 MB, but they were placed in "Objects unreachable from GC roots but not yet collected" in YourKit.

Running the app with a memory cap (java -Xmx64m -jar build/libs/mnleak-0.1-all.jar) kept the memory consumption at around 250 MB and triggered the GC to collect these objects, which were in fact collected and did not appear in YourKit.

@graemerocher
Contributor

Within a debugger the bytebuf that produced the string was released here:

I observed no leak messages from Netty's ResourceLeakDetector whatsoever even when the application was placed under load.

Other than GC pauses the application was never unavailable at any point.

@graemerocher
Contributor

In the example app, I monitored it via VisualVM, and after a GC cycle no instances of io.netty.buffer.* classes are retained.

@carlosbarragan
Author

We also tried to reproduce the bug locally using the real application and putting it under load, but we could not successfully reproduce it.

We observed on production that the application starts consuming memory under load after some minutes ( > 30 mins). We have not tried to run it locally that long, but maybe this could help.

We have also observed that memory increases rapidly when the application is under far too much load (10,000 req/min per Pod). Under normal load, it takes much longer for memory consumption to increase.

@carlosbarragan
Author

> Within a debugger the bytebuf that produced the string was released here:
>
> I observed no leak messages from Netty's ResourceLeakDetector whatsoever even when the application was placed under load.
>
> Other than GC pauses the application was never unavailable at any point.

@graemerocher I debugged a single request on our actual application and the release() method in NettyHttpRequest was never called.

@graemerocher
Contributor

@carlosbarragan Interesting. What we need is an example that reproduces that

@carlosbarragan
Author

@graemerocher how can I provide an example without posting a real request, given the privacy and NDA agreements with our customer? I am open to doing a video session with you or one of your team members if you believe this could help. Perhaps we are missing some configuration or something like that.

@graemerocher
Contributor

I'm investigating some other approaches that move the logic that calls release() elsewhere. Once I have something, I will ask you to try a snapshot.

graemerocher added a commit that referenced this issue Feb 12, 2020
@carlosbarragan
Author

Ok.

FYI, the application also has a JSON API. I debugged a request to this API and the release() method wasn't called either.

@graemerocher
Contributor

@carlosbarragan Can you try the 1.3.1.BUILD-SNAPSHOT version and report back? See https://docs.micronaut.io/latest/guide/index.html#usingsnapshots

@carlosbarragan
Author

@graemerocher it turns out the release() method was not being called because our REST client was keeping the connection alive. As soon as we closed the REST client, the channelInactive() method was called, as well as the release() method.

Nevertheless, we noticed that only the last request body we sent was being released. All other requests we sent before closing the client were not cleaned up.

I don't know the internal workings of Netty, but could it be that it reuses the same channel under load and therefore the channel never becomes inactive? This could lead to the situation where the requests are not properly released.
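
For what it's worth, a rough way to exercise this situation locally is to send many requests over a single persistent connection. Below is a minimal sketch using JDK 11's java.net.http.HttpClient, which reuses HTTP/1.1 keep-alive connections by default; the endpoint and payload are assumptions based on the example earlier in this thread:

import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

fun main() {
    // One HttpClient instance reuses its HTTP/1.1 connections (keep-alive),
    // so the server-side channel stays active for the whole run.
    val client = HttpClient.newHttpClient()
    val request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8080/api/sessions")) // assumed local endpoint
        .header("Content-Type", "application/xml")
        .POST(HttpRequest.BodyPublishers.ofString(
            "<Session-Request><device-id>test1</device-id></Session-Request>"))
        .build()

    repeat(10_000) {
        // Every request travels over the same persistent connection; if request
        // bodies are only released on channelInactive, they accumulate on the server.
        client.send(request, HttpResponse.BodyHandlers.ofString())
    }
}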

@graemerocher
Contributor

@carlosbarragan does the snapshot resolve the issue?

@carlosbarragan
Author

Haven’t tried yet. I’ll try it tomorrow morning

@graemerocher
Contributor

@carlosbarragan I have pushed another potential fix, since I am not 100% sure my first fix will solve the problem based on the information you supplied that it relates to persistent connections.

If you can try the snapshot tomorrow that would be great. Thanks for collaborating on this.

@carlosbarragan
Author

@graemerocher I'm almost certain this problem is related to keep-alive connections. I quickly verified and can confirm that the keep-alive header is present in our prod environment.

@carlosbarragan
Author

The fix looks promising. I tested it locally with a persistent connection and the request was correctly cleaned up. I will deploy it and give feedback as soon as possible.

@carlosbarragan
Author

@graemerocher we deployed the fix and let it run for about an hour, and it seems to work!

Some stats of the application after running for 60 minutes:

  • Heap consumption is about 140 MB! (Better than with the workaround)
  • Pod memory consumption about 500 MB.

Thank you very much for your quick support!

@graemerocher
Contributor

Great news, thanks for the feedback
