Describe the bug
We recently encountered a memory leak in the Pulsar Broker. I've tried to identify the issue with the logs below.
I've raised io.netty.leakDetectionLevel to paranoid and set io.netty.leakDetection.targetRecords=40, but no leak messages appeared.
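For reference, a minimal sketch of how I verified the detector is actually active (the class name LeakDetectionSmokeTest is just mine; the Netty API calls are from 4.1 — in the broker itself I set both values as -D JVM flags instead, since the property must be read before Netty's detector initializes):

import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;
import io.netty.util.ResourceLeakDetector;

public class LeakDetectionSmokeTest {
    public static void main(String[] args) throws InterruptedException {
        // Equivalent to -Dio.netty.leakDetection.targetRecords=40; must be set
        // before the leak detector class is initialized.
        System.setProperty("io.netty.leakDetection.targetRecords", "40");
        // Equivalent to -Dio.netty.leakDetectionLevel=paranoid: track every
        // buffer instead of the default 1% sample.
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);

        // Deliberately leak a buffer: allocate, drop the reference, never release().
        ByteBuf leaked = ByteBufAllocator.DEFAULT.directBuffer(256);
        leaked = null;

        // Leak reports are emitted lazily: the wrapper must be GC'd and a later
        // allocation triggers the check, so force a GC and allocate once more.
        System.gc();
        Thread.sleep(1000);
        ByteBufAllocator.DEFAULT.directBuffer(256).release();
        // Expect a "LEAK: ByteBuf.release() was not called ..." ERROR log here.
    }
}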
We have 4 Pulsar clusters, all on version 2.7.3 and all running in Kubernetes 1.20. Only one cluster has encountered this issue.
The only difference in the affected cluster is that I had enabled geo-replication on all namespaces. I've since disabled it, but the memory leak remains.
14:44:01.497 [pulsar-io-23-2] ERROR org.apache.pulsar.PulsarBrokerStarter - -- Shutting down - Received OOM exception: failed to allocate 16777216 byte(s) of direct memory (used: 1073741824, max: 1073741824)
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 1073741824, max: 1073741824)
at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:776) ~[io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:731) ~[io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:645) ~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:621) ~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:204) ~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:188) ~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:138) ~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:128) ~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:378) ~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187) ~[io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at org.apache.bookkeeper.common.allocator.impl.ByteBufAllocatorImpl.newDirectBuffer(ByteBufAllocatorImpl.java:164) [org.apache.bookkeeper-bookkeeper-common-allocator-4.12.0.jar:4.12.0]
at org.apache.bookkeeper.common.allocator.impl.ByteBufAllocatorImpl.newDirectBuffer(ByteBufAllocatorImpl.java:158) [org.apache.bookkeeper-bookkeeper-common-allocator-4.12.0.jar:4.12.0]
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187) [io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178) [io.netty-netty-buffer-4.1.60.Final.jar:4.1.60.Final]
at io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53) [io.netty-netty-transport-native-unix-common-4.1.60.Final-linux-x86_64.jar:4.1.60.Final]
at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) [io.netty-netty-transport-4.1.60.Final.jar:4.1.60.Final]
at io.netty.channel.epoll.EpollRecvByteAllocatorHandle.allocate(EpollRecvByteAllocatorHandle.java:75) [io.netty-netty-transport-native-epoll-4.1.60.Final-linux-x86_64.jar:4.1.60.Final]
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:780) [io.netty-netty-transport-native-epoll-4.1.60.Final-linux-x86_64.jar:4.1.60.Final]
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:425) [io.netty-netty-transport-native-epoll-4.1.60.Final-linux-x86_64.jar:4.1.60.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) [io.netty-netty-transport-native-epoll-4.1.60.Final-linux-x86_64.jar:4.1.60.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_302]
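Since the counters in the error (used: 1073741824, max: 1073741824) come from Netty's own direct-memory accounting, one probe I can run is to dump the pooled allocator's metrics over time and see whether the pool keeps growing. A rough sketch (the class name DirectMemoryProbe is mine); the caveat is that Pulsar routes allocations through BookKeeper's ByteBufAllocatorImpl, so the default allocator's numbers may not cover everything:

import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocatorMetric;
import io.netty.util.internal.PlatformDependent;

public class DirectMemoryProbe {
    public static void main(String[] args) {
        // Netty's global counter: the same "used" / "max" numbers that appear
        // in OutOfDirectMemoryError.
        System.out.printf("netty direct: used=%d max=%d%n",
                PlatformDependent.usedDirectMemory(), // -1 if the counter is disabled
                PlatformDependent.maxDirectMemory());

        // Per-allocator view: chunks held by the arenas. A usedDirectMemory()
        // that grows steadily under flat traffic points at unreleased buffers.
        PooledByteBufAllocatorMetric metric = PooledByteBufAllocator.DEFAULT.metric();
        System.out.printf("pooled direct=%d heap=%d arenas=%d%n",
                metric.usedDirectMemory(),
                metric.usedHeapMemory(),
                metric.numDirectArenas());
    }
}

Note the OOM fires from the epoll read path (epollInReady), which likely just means an inbound read allocation was the final straw, not that the read path itself is where buffers leak.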
Screenshots
This is only a testing cluster without much memory configured, but I believe the screenshots show the leak.
Additional context
My wild guess is that the memory leak may come from data loss on the bookie side. I've found some error logs there:
11:51:51.568 [BookieReadThreadPool-OrderedExecutor-3-0] INFO org.apache.bookkeeper.proto.ReadEntryProcessorV3 - No ledger found while reading entry: 0 from ledger: 242603
11:51:52.828 [bookie-io-1-1] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x5d119911, L:/10.119.139.211:3181 ! R:/10.119.131.152:34376]
11:52:10.560 [bookie-io-1-2] INFO org.apache.bookkeeper.proto.AuthHandler - Authentication success on server side
11:52:10.560 [bookie-io-1-2] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channel connected [id: 0xb66b4346, L:/10.119.139.211:3181 - R:/10.119.164.211:54744]
11:52:11.149 [BookieReadThreadPool-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.proto.ReadLacProcessorV3 - No ledger found while performing readLac from ledger: 242603
org.apache.bookkeeper.bookie.Bookie$NoLedgerException: Ledger 242603 not found
at org.apache.bookkeeper.bookie.LedgerDescriptor.createReadOnly(LedgerDescriptor.java:52) ~[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at org.apache.bookkeeper.bookie.HandleFactoryImpl.getReadOnlyHandle(HandleFactoryImpl.java:61) ~[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at org.apache.bookkeeper.bookie.Bookie.getExplicitLac(Bookie.java:1364) ~[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at org.apache.bookkeeper.proto.ReadLacProcessorV3.getReadLacResponse(ReadLacProcessorV3.java:71) [org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at org.apache.bookkeeper.proto.ReadLacProcessorV3.safeRun(ReadLacProcessorV3.java:118) [org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.12.0.jar:4.12.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_302]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_302]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_302]
11:52:11.150 [BookieReadThreadPool-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.proto.ReadLacProcessorV3 - No ledger found while trying to read last entry: 242603
org.apache.bookkeeper.bookie.Bookie$NoLedgerException: Ledger 242603 not found
at org.apache.bookkeeper.bookie.LedgerDescriptor.createReadOnly(LedgerDescriptor.java:52) ~[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at org.apache.bookkeeper.bookie.HandleFactoryImpl.getReadOnlyHandle(HandleFactoryImpl.java:61) ~[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at org.apache.bookkeeper.bookie.Bookie.readEntry(Bookie.java:1441) ~[org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at org.apache.bookkeeper.proto.ReadLacProcessorV3.getReadLacResponse(ReadLacProcessorV3.java:86) [org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at org.apache.bookkeeper.proto.ReadLacProcessorV3.safeRun(ReadLacProcessorV3.java:118) [org.apache.bookkeeper-bookkeeper-server-4.12.0.jar:4.12.0]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.12.0.jar:4.12.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_302]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_302]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_302]
11:52:11.163 [BookieReadThreadPool-OrderedExecutor-3-0] INFO org.apache.bookkeeper.proto.ReadEntryProcessorV3 - No ledger found while reading entry: 0 from ledger: 242603
This is because we hit a Kubernetes node issue a long time ago, and some of the ledgers were unintentionally deleted.
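To double-check whether such a ledger is really gone from the metadata store (and not just missing on one bookie), I can try something like this sketch against the BookKeeper client; the ZooKeeper address is a placeholder, and CRC32C with an empty password only matches our cluster's default managedLedgerDigestType setup:

import org.apache.bookkeeper.client.BKException;
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerHandle;

public class LedgerExistsCheck {
    public static void main(String[] args) throws Exception {
        long ledgerId = 242603L; // the ledger from the bookie logs above
        // "zk-host:2181" is a placeholder for the real metadata service address.
        try (BookKeeper bk = new BookKeeper("zk-host:2181")) {
            // No-recovery open: read-only, does not fence the ledger.
            LedgerHandle lh = bk.openLedgerNoRecovery(
                    ledgerId, BookKeeper.DigestType.CRC32C, new byte[0]);
            System.out.println("metadata exists, lastAddConfirmed="
                    + lh.getLastAddConfirmed());
            lh.close();
        } catch (BKException e) {
            // e.g. the ledger's metadata was deleted along with its data
            System.out.println("open failed: " + e.getMessage());
        }
    }
}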
And I can confirm that all the bookies are working fine, without any memory leak issue (aside from the exceptions above).
Please let me know if there's any clue, or anything I can do to diagnose the memory leak.
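One more JVM-level probe I can run, with a caveat: the java.nio BufferPool MXBean only counts cleaner-based direct buffers, and Netty's default no-cleaner allocations bypass it. If I understand the semantics right, running the broker with -Dio.netty.maxDirectMemory=0 makes Netty fall back to plain cleaner-based direct ByteBuffers, so this bean would see them (the class name BufferPoolProbe is mine):

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class BufferPoolProbe {
    public static void main(String[] args) {
        // The "direct" pool tracks ByteBuffer.allocateDirect; "mapped" tracks mmap.
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s: count=%d used=%d capacity=%d%n",
                    pool.getName(),
                    pool.getCount(),
                    pool.getMemoryUsed(),
                    pool.getTotalCapacity());
        }
    }
}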
Thanks.
Original Issue: apache#12716