
Figure out why LockFreePool appears to cause unintended object retention (~= memory leak) #1260

Closed
cowtowncoder opened this issue Apr 15, 2024 · 13 comments


@cowtowncoder (Member) commented Apr 15, 2024

Based on reported problems like:

it looks as if the new recycler pool default (see #1117) -- Lock-free Pool -- results, at least in some cases, in unbounded memory retention that grows over time. From the symptoms it would appear that actual releases to the pool happen more often than successful reuse from the pool; otherwise the two operations should generally balance out over time (resulting in a maximum pool size relative to maximum concurrent use).

We need to figure out what could cause such behavior, or, failing that, what the root cause of the observed problems is.

In the meantime we also will need to do #1256 to allow use of 2.17(.1) without this issue.

@cowtowncoder (Member Author)

/cc @mariofusco @pjfanning -- I hope to learn whether there is some specific usage scenario, and to create a multi-threaded reproduction. I suspect that reuse from the pool is somehow failing, but perhaps not on every call. jackson-benchmarks, for example, does not exhibit this problem: but it also does not use heavy concurrency by default, and its tests do not run long enough to necessarily accumulate humongous buffers.

Also: it is interesting that the one concern I did have wrt unbounded pools seems to have ended up happening.
So it feels like my insistence on ideally having a maximum size to avoid this problem (although doing so might also hide an issue) seems warranted. For whatever that is worth.

@mariofusco (Contributor)

Hi @cowtowncoder, I would like to investigate this memory issue, but at the moment I have no clue how to reproduce it on my side. I also asked here and I'm waiting for an answer. I'm reviewing the implementation again, and in all honesty it seems quite straightforward, so at the moment I don't know what to do about it. As for making the LockFree pool bounded, that is of course feasible, but I'm against it for a few reasons:

  1. In general I don't like blind fixes
  2. In this specific case, making the pool bounded won't help us investigate the problem; it will only sweep it under the carpet
  3. Making that pool bounded won't be free, however: it both adds further complexity and introduces a performance cost
  4. We already have a bounded pool implementation, with performance very similar to the lock-free one, so if we want a bounded pool as the default implementation, let's just use it

@mariofusco (Contributor)

On second thought, I now see an admittedly weird situation where the LockFree pool can grow indefinitely. That pool has a heuristic to limit the number of retries under heavy contention: when contention is detected, acquire gives up attempting to retrieve a resource from the pool and creates a new one instead. However, this heuristic is implemented symmetrically on both acquire and release, so on release it likewise gives up trying to put a resource back into the pool after a few failed attempts. In essence the pool will grow if there is a large asymmetry in its usage, i.e. when acquire is always under heavy contention but release isn't. I'm not sure how to deal with this situation, to be honest, but I would still like to see it happen in the wild before deciding on any action.
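
To make the asymmetry concrete, here is a minimal sketch of that kind of retry heuristic (hypothetical names and constants; the real LockFreePool in jackson-core differs in details). Every acquire that gives up allocates a fresh object; if the matching release then succeeds, the pool ends up holding one more object than before:

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.Supplier;

    // Hypothetical sketch of a lock-free stack pool with a bounded-retry heuristic.
    final class RetryLimitedPool<T> {
        private static final int MAX_RETRIES = 3;
        private final AtomicReference<Node<T>> head = new AtomicReference<>();
        private final Supplier<T> factory;

        RetryLimitedPool(Supplier<T> factory) { this.factory = factory; }

        T acquire() {
            for (int i = 0; i < MAX_RETRIES; i++) {
                Node<T> h = head.get();
                if (h == null) {
                    return factory.get();        // pool empty: allocate a new object
                }
                if (head.compareAndSet(h, h.next)) {
                    return h.value;              // reused from the pool
                }
                // lost the CAS race: contention, retry
            }
            return factory.get();                // gave up under contention: allocate instead of reusing
        }

        void release(T value) {
            Node<T> node = new Node<>(value);
            for (int i = 0; i < MAX_RETRIES; i++) {
                Node<T> h = head.get();
                node.next = h;
                if (head.compareAndSet(h, node)) {
                    return;                      // returned to the pool
                }
            }
            // gave up on release: the object is simply dropped
        }

        private static final class Node<T> {
            final T value;
            Node<T> next;
            Node(T value) { this.value = value; }
        }
    }

If acquire is contended (so it allocates) while the matching release succeeds, each such round trip leaves one extra object in the pool; only when release also gives up does the imbalance stop.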

@cowtowncoder (Member Author)

@mariofusco Your thoughts align remarkably closely with my thinking :)
(not that this guarantees we are right, but it is a good sanity check, I think)

As to reproduction, what I was thinking was indeed running a multi-threaded scenario with N threads, reading and writing, against a modified copy of the pool that allows accessing its size, printing the size periodically and checking whether it keeps growing.
One unfortunate aspect is that any "lost" acquire adds to the size; and although in some use cases the opposite could also occur (a release that never happens on some code path -- I assume that is possible too), there are probably cases where release works reliably, so even a small incremental imbalance will lead to increasing retention over time.
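
A rough sketch of that kind of harness (assumptions: the JsonRecyclerPools factory and the recyclerPool() builder method as in 2.16+, plus a pooledCount() accessor which, per the discussion below, only exists from 2.18 or a patched 2.17; with older versions a modified pool copy exposing its size would be needed):

    import com.fasterxml.jackson.core.JsonFactory;
    import com.fasterxml.jackson.core.util.BufferRecycler;
    import com.fasterxml.jackson.core.util.JsonRecyclerPools;
    import com.fasterxml.jackson.core.util.RecyclerPool;
    import com.fasterxml.jackson.databind.ObjectMapper;

    import java.util.Map;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical stress test: N threads read and write concurrently while the main
    // thread samples the pool size once per second to see whether it keeps growing.
    public class PoolGrowthRepro {
        public static void main(String[] args) throws Exception {
            final int threads = 100;
            // Configure the pool explicitly so we keep a reference we can inspect.
            RecyclerPool<BufferRecycler> pool = JsonRecyclerPools.newLockFreePool();
            ObjectMapper mapper = new ObjectMapper(
                    JsonFactory.builder().recyclerPool(pool).build());

            ExecutorService exec = Executors.newFixedThreadPool(threads);
            for (int i = 0; i < threads; i++) {
                exec.submit(() -> {
                    try {
                        while (!Thread.currentThread().isInterrupted()) {
                            String json = mapper.writeValueAsString(Map.of("key", "value", "n", 42));
                            mapper.readValue(json, Map.class);
                        }
                    } catch (Exception e) {
                        // ignore; this is only a sketch
                    }
                });
            }
            for (int i = 0; i < 60; i++) {
                Thread.sleep(1000L);
                // With an instrumented pool copy, print its size here instead of pooledCount().
                System.out.println("pooled count: " + pool.pooledCount());
            }
            exec.shutdownNow();
        }
    }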

As to adding a bounded size: I 100% agree that we need to understand the problem first and not add something that would likely just hide it.
Once we know what is going on we can reconsider.
(I have some ideas on how it could work, but no actual plan; the goal would be to avoid the need for a single synchronized counter.)

@franz1981 commented Apr 16, 2024

I am now reading through the past issues, and giving up on contention is probably not what we want.
Contention is an inherent problem of compareAndSet-like concurrent primitives, but allocating a new object while still being able to release it back to the pool can cause unbounded growth, which eventually consumes the CPU we tried so hard to save (spending it on GC marking and, eventually, moving -- given that once the pool grows it doesn't shrink).
Right now the easier choice, IMO, is to use a counter on acquire/release and keep trying, spinning with Thread::onSpinWait or yielding if there are too many attempts, but without ever giving up.

Another solution could be to spread the contention, or to use better primitives which "cannot fail", e.g. getAndSet.

But that would mean redesigning what the pool does; just serializing while too contended seems the easier choice to me.
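
A minimal sketch of the "never give up, just back off" idea (hypothetical, not jackson-core code): bounded spinning with Thread.onSpinWait(), falling back to Thread.yield() so that heavily contended callers effectively serialize instead of allocating:

    import java.util.concurrent.atomic.AtomicReference;

    // Hypothetical sketch: keep retrying under contention, backing off instead of giving up.
    final class SpinWaitPool<T> {
        private final AtomicReference<Node<T>> head = new AtomicReference<>();

        void release(T value) {
            Node<T> node = new Node<>(value);
            int attempts = 0;
            while (true) {
                Node<T> h = head.get();
                node.next = h;
                if (head.compareAndSet(h, node)) {
                    return;                       // success: the object is never dropped
                }
                backOff(++attempts);
            }
        }

        T acquireOrNull() {
            int attempts = 0;
            while (true) {
                Node<T> h = head.get();
                if (h == null) {
                    return null;                  // genuinely empty: caller allocates
                }
                if (head.compareAndSet(h, h.next)) {
                    return h.value;
                }
                backOff(++attempts);
            }
        }

        private static void backOff(int attempts) {
            if (attempts < 16) {
                Thread.onSpinWait();              // cheap busy-wait hint
            } else {
                Thread.yield();                   // serialize progress under heavy contention
            }
        }

        private static final class Node<T> {
            final T value;
            Node<T> next;
            Node(T value) { this.value = value; }
        }
    }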

@cowtowncoder (Member Author)

I may be wrong, but to me it seems like ConcurrentDequePool might be the safer but similar implementation to use -- it uses a Deque for storing pooled objects.
And compared to BoundedPool it has the benefit of lower overhead for cases where the pool itself is not used (there is no up-front allocation with a Deque, it being a linked list).
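
For illustration, the Deque-based approach has roughly this shape (a sketch, not the actual ConcurrentDequePool implementation): pollFirst()/offerFirst() never need a retry-and-give-up path, and the linked structure only allocates nodes when objects are actually pooled:

    import java.util.Deque;
    import java.util.concurrent.ConcurrentLinkedDeque;
    import java.util.function.Supplier;

    // Sketch of a Deque-backed object pool.
    final class DequeBackedPool<T> {
        private final Deque<T> pool = new ConcurrentLinkedDeque<>();
        private final Supplier<T> factory;

        DequeBackedPool(Supplier<T> factory) { this.factory = factory; }

        T acquire() {
            T pooled = pool.pollFirst();          // null if the pool is empty
            return (pooled != null) ? pooled : factory.get();
        }

        void release(T value) {
            pool.offerFirst(value);               // always succeeds; no give-up path
        }
    }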

cowtowncoder added a commit that referenced this issue Apr 17, 2024
@cowtowncoder (Member Author)

@mariofusco @franz1981: I created #1265, which sort of hints at a possible problem, but isn't necessarily a smoking gun. When running through the 3 main implementations, LockFreePool occasionally gets a REALLY high pooled count -- I have seen 18,000 once (!) -- but it seems to stabilize over time. The other implementations typically remain under the thread count (going over isn't unexpected, since the test runs read and write in loops, so getting to 2x is theoretically possible).

So no, this does not show monotonically increasing behavior, but it suggests that spikes are possible.
It is of course possible (likely?) that contention behavior is heavily platform-dependent, and maybe running in Eclipse on a Mac mini just doesn't exhibit extreme contention.
I'll see if increasing the thread count from 100 (or going down to, say, 20) can trigger other behavior.

But if anyone else has time to run the test, and/or read the code and suggest changes, that would be useful.

@kkolyan commented Apr 17, 2024

Yet another report with a "memory leak" reproduction: https://github.com/kkolyan/jackson_leak.
It uses an ObjectMapper from 1000 threads and, over the course of an hour, fills up an 8 GB heap (with an initial occupancy of 600 MB of test data), leading to a permanent freeze.

@cowtowncoder (Member Author) commented Apr 17, 2024

@kkolyan Thank you very much for creating this! I hope to try it out locally and see whether the other 2 implementations (and in particular the Deque-backed one) fare better.

EDIT: looks like it's not trivial to switch implementations for measurement, due to RecyclerPoolSizeCalculator (I used 2.18.0-SNAPSHOT, where RecyclerPool.pooledCount() was added).

cowtowncoder added a commit that referenced this issue Apr 18, 2024
cowtowncoder added a commit that referenced this issue Apr 18, 2024
@cowtowncoder (Member Author)

@kkolyan Trying to run the reproduction, but since I am not a Gradle user, I'm not sure what invocation is expected. Guessing from build.gradle, I tried

./gradlew test
./gradlew application

neither of which works (the build succeeds with ./gradlew build).
It would be good to add the invocation to the repo's README.md?

I also had to add the mavenLocal() repository to use a locally built jackson-core 2.17.1-SNAPSHOT (or I could have added the Sonatype OSS repository, which has those builds too).

@cowtowncoder (Member Author)

I'm able to run the test in the IDE just fine. Not yet seeing much of a trend, assuming I read the output correctly.

But one related thing is this: I backported the addition of

    default int pooledCount() {
        return -1;
    }

to RecyclerPool on the 2.17 branch, so a locally built jackson-core (or one from the Sonatype OSS repo) has it available. This is useful to:

  1. Remove the need for RecyclerPoolSizeCalculator, since RecyclerPool.pooledCount() can be used instead
  2. Allow use of all RecyclerPool implementations, to see whether they do (or don't) exhibit similar leakage (see the sketch below)
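
For illustration, with pooledCount() available, comparing implementations becomes roughly this (a sketch; the JsonRecyclerPools factory-method names are as I recall them from 2.17 and may differ slightly):

    import com.fasterxml.jackson.core.JsonFactory;
    import com.fasterxml.jackson.core.util.BufferRecycler;
    import com.fasterxml.jackson.core.util.JsonRecyclerPools;
    import com.fasterxml.jackson.core.util.RecyclerPool;

    // Hypothetical: pick the implementation to measure, keep a reference to it,
    // and read its size directly instead of needing RecyclerPoolSizeCalculator.
    public class PooledCountDemo {
        public static void main(String[] args) {
            RecyclerPool<BufferRecycler> pool = JsonRecyclerPools.newConcurrentDequePool();
            // ... or JsonRecyclerPools.newLockFreePool(), JsonRecyclerPools.newBoundedPool(100), etc.
            JsonFactory factory = JsonFactory.builder().recyclerPool(pool).build();
            // use `factory` (directly or via ObjectMapper) from test threads, then periodically:
            System.out.println("pooled: " + pool.pooledCount()); // -1 if the pool does not report its size
        }
    }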

@cowtowncoder (Member Author) commented Apr 20, 2024

Ok: created and resolved the follow-up work (#1256 for the 2.17.1 revert, #1266 for the 2.18 switch to ConcurrentDequePool).

One last thing: should we try to fix LockFreePool, or deprecate it (and remove it from 3.0)?

Created a placeholder ticket: #1271.

@cowtowncoder (Member Author)

Ok, I think we have a reasonable idea as to how the retention may happen: imbalanced failure rates between acquiring a reusable buffer from the pool (fails more often) and releasing a buffer back to the pool (fails less often).
Given this, closing the issue; will proceed with #1271.
