
[Bug] weird packet len issues on broker <-> metadatastore #22373

Open
KannarFr opened this issue Mar 27, 2024 · 1 comment
Labels
type/bug The PR fixed a bug or issue reported a bug

KannarFr commented Mar 27, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

3.2.1

Minimal reproduce step

N/A

What did you expect to see?

N/A

What did you see instead?

Random "Packet len" exceptions for some days, after months of running without issues. Part of the logs: https://gist.githubusercontent.com/KannarFr/8aec4e4100c422563aacb3b3b404cd8c/raw/c5844217d386f2267b402099829a893c10f41d50/gistfile1.txt.

I'm pretty sure it can't be namespace policies/topic metadata, as we limit the number of topics per namespace to 3,000. Could it be a ledger listing from ZK to the broker exceeding the packet len?

Maybe it's related to loadBalancerReportUpdateThresholdPercentage=10, which we should reduce on "big" brokers that manage thousands of topics across different namespaces?

But it doesn't look like that's the case:

[zk: zookeeper-c6-n1:2181(CONNECTED) 12] ls -s /loadbalance/brokers/broker-n24:8080 
[]
cZxid = 0x280003ac27
ctime = Wed Mar 20 16:35:39 UTC 2024
mZxid = 0x2a0012b515
mtime = Wed Mar 27 23:02:37 UTC 2024
pZxid = 0x280003ac27
cversion = 0
dataVersion = 5252
aclVersion = 0
ephemeralOwner = 0x2001e92442303e0
dataLength = 851441
numChildren = 0

So the packet len was 10,684,825 bytes, but the existing load report is only 851,441 bytes. Or am I missing something?

Anything else?

I'm putting brokers to debug log level to get more information.

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@KannarFr KannarFr added the type/bug The PR fixed a bug or issue reported a bug label Mar 27, 2024
lhotari commented Mar 28, 2024

Thanks for reporting this.

As a temporary workaround, you could increase the jute.maxbuffer setting by adding a system property to the Pulsar JVM options. The default for Pulsar is -Djute.maxbuffer=10485760, defined here:

OPTS="$OPTS -Djute.maxbuffer=10485760 -Djava.net.preferIPv4Stack=true"

When changing jute.maxbuffer, it is recommended to apply the change to all ZK servers and clients (those running in brokers, BookKeeper, and autorecovery).
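One way to apply the workaround, sketched below under the assumption that your deployment uses conf/pulsar_env.sh and that its PULSAR_EXTRA_OPTS variable is picked up by the startup scripts (the exact file and variable may differ in your setup):

```shell
# Sketch: raise jute.maxbuffer to 20 MiB (20971520 bytes) for the Pulsar JVM.
# The same system property must also be set on the ZooKeeper servers themselves,
# since both client and server enforce the limit.
PULSAR_EXTRA_OPTS="${PULSAR_EXTRA_OPTS} -Djute.maxbuffer=20971520"
```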

The benefit of increasing the limit is that ZNodes that hit the previous limit will become accessible again.

To find out what is causing the problem, it might require improving logging. Contributions are welcome!

I'm pretty sure it can't be namespace policies/topics metadata as we limited the number of topics per namespace to 3000. Can it be ledgers listing over packet len from ZK to broker?

I guess so, but I'd assume that you could fit a massive amount of data into the default jute.maxbuffer of 10 MB.
However, it is more likely a ZNode listing. It shouldn't be possible to create a ZNode larger than jute.maxbuffer, and my understanding is that the main way to exceed the limit is by listing ZNodes.
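A back-of-the-envelope estimate of how many children a listing can return before tripping the limit. The average name length and per-entry overhead below are assumptions for illustration, not measurements:

```shell
# A getChildren response is roughly the sum of child-name lengths
# plus a few bytes of serialization framing per entry.
MAX_BUFFER=10485760        # default jute.maxbuffer (10 MiB)
AVG_NAME_BYTES=50          # assumed average ZNode name length
PER_ENTRY_OVERHEAD=4       # assumed per-entry framing overhead
CHILDREN=$(( MAX_BUFFER / (AVG_NAME_BYTES + PER_ENTRY_OVERHEAD) ))
echo "$CHILDREN"           # prints 194180
```

Under these assumptions it takes on the order of 200,000 children under a single ZNode (e.g. ledger entries under a managed-ledger path) to exceed the default limit, which is plausible for a long-running cluster.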

Pulsar does use batching for ZooKeeper operations, so perhaps batching could cause a request to exceed jute.maxbuffer? That is something to verify. The batching limits are low, though, so it shouldn't happen:

pulsar/conf/broker.conf

Lines 810 to 820 in 2803ba2

# Whether we should enable metadata operations batching
metadataStoreBatchingEnabled=true
# Maximum delay to impose on batching grouping
metadataStoreBatchingMaxDelayMillis=5
# Maximum number of operations to include in a singular batch
metadataStoreBatchingMaxOperations=1000
# Maximum size of a batch
metadataStoreBatchingMaxSizeKb=128
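A quick sanity check of the claim that the batching limits are too low to trip jute.maxbuffer, using the two config values above and the default 10 MiB limit:

```shell
# Worst-case batch payload is capped by metadataStoreBatchingMaxSizeKb.
BATCH_MAX_KB=128
JUTE_MAX=10485760                       # default jute.maxbuffer
BATCH_MAX=$(( BATCH_MAX_KB * 1024 ))    # 131072 bytes
echo $(( JUTE_MAX / BATCH_MAX ))        # prints 80
```

The 128 KB batch cap fits 80 times into the default jute.maxbuffer, so a single batch should be nowhere near the limit.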
