
Memory leak on Micronaut HTTP server #10677

Open · loicmathieu opened this issue Apr 3, 2024 · 24 comments

@loicmathieu commented Apr 3, 2024

Expected Behavior

No memory leak.

Actual Behaviour

Heap histograms show a potential memory leak.
The following part of the heap histogram is relevant:

   1:      57108507     1370604168  io.micronaut.core.execution.DelayedExecutionFlowImpl$Map
   2:      57108505     1370604120  io.micronaut.core.execution.DelayedExecutionFlowImpl$OnErrorResume
   3:      38072339      913736136  io.micronaut.core.execution.DelayedExecutionFlowImpl$FlatMap
   4:      19036169      456868056  io.micronaut.core.execution.DelayedExecutionFlowImpl$OnComplete

There are more than 50 million io.micronaut.core.execution.DelayedExecutionFlowImpl$Map instances in memory! And this heap histogram comes from an application with very few requests (so there cannot be 50 million files currently uploading).

I think it may be related to this endpoint, which binds one part with @Part Publisher<StreamingFileUpload> files and then uses a raw HttpRequest<?> inputs, as we have parts both as files and as String attributes.

@ExecuteOn(TaskExecutors.IO)
@Post(uri = "/{namespace}/{id}", consumes = MediaType.MULTIPART_FORM_DATA)
public Execution create(
    @Parameter(description = "The inputs") HttpRequest<?> inputs,
    @Parameter(description = "The inputs of type file") @Nullable @Part Publisher<StreamingFileUpload> files
) throws IOException {
    Map<String, Object> inputMap = (Map<String, Object>) inputs.getBody(Map.class).orElse(null);
    // do something with the files ...
}

The memory leak is new in Micronaut 4. In Micronaut 3, we bound the body multiple times: once as a part, as today, and once as an @Body Map<String, Object> inputMap, which is no longer possible in Micronaut 4.
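
For context, here is a rough sketch of the Micronaut 3 style signature described above (illustrative only, not the exact Kestra code), with the multipart body bound twice:

@ExecuteOn(TaskExecutors.IO)
@Post(uri = "/{namespace}/{id}", consumes = MediaType.MULTIPART_FORM_DATA)
public Execution create(
    @Body Map<String, Object> inputMap,
    @Nullable @Part Publisher<StreamingFileUpload> files
) throws IOException {
    // the String attributes arrive already bound as a Map, the files as streamed parts
    // do something with the inputs and the files ...
}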

Reference GitHub discussion: https://github.com/micronaut-projects/micronaut-core/discussions/10662

Steps To Reproduce

No response

Environment Information

Example Application

No response

Version

4.3.4

@loicmathieu changed the title from "Memory leak on multipart file upload" to "Memory leak on Micronaut HTTP server" on Apr 4, 2024
@loicmathieu (Author) commented Apr 4, 2024

I don't know if it is of any help, but I noticed in a heap dump that the DelayedExecutionFlowImpl has a head attribute which contains a next attribute, which contains a next attribute... recursively, without an apparent end. It looks like all the DelayedExecutionFlowImpl instances are next of a parent one.
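
To illustrate what I see in the dump (this is only my reading of it, not the actual Micronaut source), the retained structure behaves like a linked list that only ever grows:

// Rough model of the structure observed in the heap dump: each
// map()/flatMap()/onErrorResume() call appends a node to a chain hanging
// off `head`. If the flow never completes, every node stays strongly reachable.
class DelayedFlowChainSketch {
    static final class Node {
        Object step;   // e.g. a Map, FlatMap or OnErrorResume operation
        Node next;     // the recursive "next" attribute seen in the dump
    }

    private Node head; // the "head" attribute seen in the dump
    private Node tail;

    void append(Object step) {
        Node node = new Node();
        node.step = step;
        if (head == null) {
            head = node;
        } else {
            tail.next = node; // the chain keeps growing; nothing is ever released
        }
        tail = node;
    }
}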

@loicmathieu (Author)

cc @yawkat

@loicmathieu (Author) commented Apr 4, 2024

More information to help diagnose the issue:
A single StreamingByteBody holds 6 million DelayedExecutionFlowImpl$OnErrorResume objects through a RequestLifecycle lambda, retaining 1.6 GB.
[heap dump screenshot]

@sdelamo added the "type: bug" label on Apr 8, 2024
@tchiotludo (Contributor)

Just for information: our whole application is broken due to this memory leak and customers and users are complaining; we tried multiple workarounds with no success 😭
We also tried to make a PR, but the HTTP server part is really complex for newcomers.
If you have any workaround advice, it would be awesome.

@yawkat (Member) commented Apr 9, 2024

@tchiotludo please give us some way to reproduce this. The form/multipart code is very complex and I don't see a starting point for debugging here

@loicmathieu (Author)

@yawkat it's very problematic, as I didn't succeed in reproducing the problem.

That's why I added as much information as I could; users don't seem to use form/multipart that much, and the memory leak points to RequestLifecycle, so I'm not sure it is linked to form/multipart at all.

I can ask whether I can share the dump if you want, but as a memory dump can contain sensitive data, I need to check with the user first and share it privately.

I can ask our users to provide more information, but creating a reproducer seems to be very complex.

@yawkat (Member) commented Apr 9, 2024

you could try setting micronaut.server.netty.server-type: full_content

@loicmathieu (Author)

Thanks @yawkat, we will test it; meanwhile I'll try my best to make a reproducer.

@katoquro commented Apr 9, 2024

Hello.
We don't use multipart data at all. Recently I deployed a new service that only answers health checks, Prometheus metrics, and rare POSTs that store data in Mongo. It is a very simple microservice, so I gave it 0.5 GB of RAM, and I see an OOM there every day or two.
We use MN 4.2.0, Netty, NO GraalVM, and Project Reactor everywhere.
I'll try to investigate a bit deeper later.

@loicmathieu (Author)

@yawkat we cannot use micronaut.server.netty.server-type: full_content, it crashes for all requests with:

2024-04-09 11:33:36,466 WARN  default-nioEventLoopGroup-1-3 io.netty.channel.ChannelInitializer Failed to initialize a channel. Closing: [id: 0x646fd7cb, L:/[0:0:0:0:0:0:0:1]:8080 - R:/[0:0:0:0:0:0:0:1]:48850]
java.lang.IllegalArgumentException: maxContentLength : -2147483648 (expected: >= 0)
	at io.netty.util.internal.ObjectUtil.checkPositiveOrZero(ObjectUtil.java:144)
	at io.netty.handler.codec.MessageAggregator.validateMaxContentLength(MessageAggregator.java:88)
	at io.netty.handler.codec.MessageAggregator.<init>(MessageAggregator.java:77)
	at io.netty.handler.codec.http.HttpObjectAggregator.<init>(HttpObjectAggregator.java:128)
	at io.micronaut.http.server.netty.HttpPipelineBuilder$StreamPipeline.insertMicronautHandlers(HttpPipelineBuilder.java:608)
	at io.micronaut.http.server.netty.HttpPipelineBuilder$StreamPipeline.insertHttp1DownstreamHandlers(HttpPipelineBuilder.java:638)
	at io.micronaut.http.server.netty.HttpPipelineBuilder$ConnectionPipeline.configureForHttp1(HttpPipelineBuilder.java:380)
	at io.micronaut.http.server.netty.HttpPipelineBuilder$ConnectionPipeline.initChannel(HttpPipelineBuilder.java:299)
	at io.micronaut.http.server.netty.NettyHttpServer$Listener.initChannel(NettyHttpServer.java:892)
	at io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:129)
	at io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:112)
	at io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:1130)
	at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:609)
	at io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46)
	at io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1463)
	at io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1115)
	at io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:650)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:514)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
	at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:840)

@loicmathieu (Author)

@katoquro to check if it's the same issue, you can run the following command and see if the same objects are accumulating:

jmap -histo:live <pid> | grep io.micronaut.core.execution

@yawkat (Member) commented Apr 9, 2024

@loicmathieu it only works if you lower your max-request-size to something that fits in memory.

@loicmathieu (Author)

@yawkat one user confirms that using the following configuration fixes the issue (or works around it):

configuration:
  micronaut:
    server:
      max-request-size: 1GB
      netty:
        server-type: full_content

@loicmathieu (Author)

@yawkat with this configuration, files of more than 1GB lead to requests that seem to be "blocked forever", without an exception. So it's a workaround for some of our users but not a long-term solution.

Do you still need a reproducer? (I'm working on it but still haven't managed to reproduce the issue.)

@yawkat (Member) commented Apr 9, 2024

Yes, I still need a reproducer, either from you or from @katoquro.

full_content buffers the full request and bypasses most of the places that use DelayedExecutionFlow, but it's not recommended for permanent use.

@katoquro commented Apr 9, 2024

@loicmathieu
At first glance, it's not my case. The microservice has been running for 5 hours:

4245:             1             24  io.micronaut.core.execution.ImperativeExecutionFlowImpl

I will look for a leak in another place 🤔

@loicmathieu (Author)

@katoquro remove the grep and look at the most frequent objects in the histogram (jmap -histo:live <pid>). Take several histograms over time and check which objects grow in number; this can be an easy way to find a leak (see the small helper sketch below).

And if it's a different leak, better to open a new issue ;)
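
If it helps, here is a small throwaway helper to compare two jmap -histo:live snapshots (the file names before.txt and after.txt are just assumptions, nothing Micronaut-specific) and print the classes whose instance count grew the most:

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class HistoDiff {

    // Parses lines like "   1:   57108507   1370604168  io.micronaut...$Map"
    // into a map of class name -> instance count.
    static Map<String, Long> parse(Path file) throws Exception {
        Map<String, Long> counts = new HashMap<>();
        for (String line : Files.readAllLines(file)) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 4 && parts[0].endsWith(":")) {
                counts.put(parts[3], Long.parseLong(parts[1]));
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Long> before = parse(Path.of("before.txt"));
        Map<String, Long> after = parse(Path.of("after.txt"));
        // sort by growth in instance count and print the top 20 classes
        after.entrySet().stream()
            .map(e -> Map.entry(e.getKey(), e.getValue() - before.getOrDefault(e.getKey(), 0L)))
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(20)
            .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}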

@dstepanov (Contributor)

Can you analyze the memory dump and see what is being leaked? You can try https://eclipse.dev/mat/

@katoquro commented Apr 9, 2024

@dstepanov @loicmathieu
I think my case is really different. I have the following graph, where the green line is the total committed heap reported by Micrometer (sum of jvm_memory_committed_bytes) and the yellow line is the memory consumed by the Java process, taken from /proc/<pid>/stat.
[memory graph]

It's something outside of the heap, non-heap, etc... 🤔

@graemerocher (Contributor)

@loicmathieu any luck on a reproducer?

@loicmathieu (Author)

@graemerocher unfortunately, no; that's why I added as much information as I could.

@graemerocher (Contributor)

@loicmathieu is there a way to run Kestra locally to reproduce?

@loicmathieu (Author)

@graemerocher yes, you can run it either from its repository or from its Docker image.

But what really annoys me is that I cannot reproduce it myself: some users report the issue, but when I set up Kestra locally with the same configuration and used it with the same scenario, I didn't succeed in triggering the issue.

I'll try to take some time this week to reproduce the issues I opened lately.

@graemerocher (Contributor)

ok thanks
