
[Bug] weird packet len issues on broker <-> metadatastore #22373

Open
KannarFr opened this issue Mar 27, 2024 · 1 comment
Labels
type/bug The PR fixed a bug or issue reported a bug

KannarFr commented Mar 27, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

3.2.1

Minimal reproduce step

N/A

What did you expect to see?

N/A

What did you see instead?

Random "Packet len" exceptions for some days, after months of running without issues. Part of the logs: https://gist.githubusercontent.com/KannarFr/8aec4e4100c422563aacb3b3b404cd8c/raw/c5844217d386f2267b402099829a893c10f41d50/gistfile1.txt.

I'm pretty sure it can't be namespace policies/topic metadata, as we limit the number of topics per namespace to 3,000. Could it be a ledger listing from ZK to the broker exceeding the packet len?

Maybe it's related to loadBalancerReportUpdateThresholdPercentage=10, which we should reduce on "big" brokers that manage thousands of topics across different namespaces?

But it doesn't look like that's the case:

[zk: zookeeper-c6-n1:2181(CONNECTED) 12] ls -s /loadbalance/brokers/broker-n24:8080 
[]
cZxid = 0x280003ac27
ctime = Wed Mar 20 16:35:39 UTC 2024
mZxid = 0x2a0012b515
mtime = Wed Mar 27 23:02:37 UTC 2024
pZxid = 0x280003ac27
cversion = 0
dataVersion = 5252
aclVersion = 0
ephemeralOwner = 0x2001e92442303e0
dataLength = 851441
numChildren = 0

So the packet len was 10,684,825 bytes, but the existing load report is only 851,441 bytes. Or am I missing something?

Anything else?

I'm putting brokers to debug log level to get more information.

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@KannarFr KannarFr added the type/bug The PR fixed a bug or issue reported a bug label Mar 27, 2024
lhotari commented Mar 28, 2024

Thanks for reporting this.

As a temporary workaround, you could increase the jute.maxbuffer setting by adding a system property to the Pulsar JVM options. The default for Pulsar is -Djute.maxbuffer=10485760, defined here:

OPTS="$OPTS -Djute.maxbuffer=10485760 -Djava.net.preferIPv4Stack=true"

When changing jute.maxbuffer, it is recommended to apply the change to all ZK servers and clients (those running in brokers, BookKeeper, and autorecovery).
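One way to apply the workaround, sketched below under the assumption that your deployment uses conf/pulsar_env.sh and that its PULSAR_EXTRA_OPTS variable is picked up by the startup scripts (the exact file and variable may differ in your setup):

```shell
# Sketch: raise jute.maxbuffer to 20 MiB (20971520 bytes) for the Pulsar JVM.
# The same system property must also be set on the ZooKeeper servers themselves,
# since both client and server enforce the limit.
PULSAR_EXTRA_OPTS="${PULSAR_EXTRA_OPTS} -Djute.maxbuffer=20971520"
```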

The benefit of increasing the limit is that ZNodes that hit the previous limit will become accessible again.

To find out what is causing the problem, it might require improving logging. Contributions are welcome!

I'm pretty sure it can't be namespace policies/topics metadata as we limited the number of topics per namespace to 3000. Can it be ledgers listing over packet len from ZK to broker?

I guess so, but I'd assume that you could fit a massive amount of data into the default jute.maxbuffer of 10 MB.
However, it is more likely a ZNode listing. It shouldn't be possible to create a ZNode larger than jute.maxbuffer, and my understanding is that the main way to exceed the limit is by listing ZNodes.
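A back-of-the-envelope estimate of how many children a listing can return before tripping the limit. The average name length and per-entry overhead below are assumptions for illustration, not measurements:

```shell
# A getChildren response is roughly the sum of child-name lengths
# plus a few bytes of serialization framing per entry.
MAX_BUFFER=10485760        # default jute.maxbuffer (10 MiB)
AVG_NAME_BYTES=50          # assumed average ZNode name length
PER_ENTRY_OVERHEAD=4       # assumed per-entry framing overhead
CHILDREN=$(( MAX_BUFFER / (AVG_NAME_BYTES + PER_ENTRY_OVERHEAD) ))
echo "$CHILDREN"           # prints 194180
```

Under these assumptions it takes on the order of 200,000 children under a single ZNode (e.g. ledger entries under a managed-ledger path) to exceed the default limit, which is plausible for a long-running cluster.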

Pulsar does use batching for ZooKeeper operations, so perhaps batching could cause a request to exceed jute.maxbuffer? That is something to verify. The batching limits are low, though, so it shouldn't happen:

pulsar/conf/broker.conf

Lines 810 to 820 in 2803ba2

# Whether we should enable metadata operations batching
metadataStoreBatchingEnabled=true
# Maximum delay to impose on batching grouping
metadataStoreBatchingMaxDelayMillis=5
# Maximum number of operations to include in a singular batch
metadataStoreBatchingMaxOperations=1000
# Maximum size of a batch
metadataStoreBatchingMaxSizeKb=128
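A quick sanity check of the claim that the batching limits are too low to trip jute.maxbuffer, using the two config values above and the default 10 MiB limit:

```shell
# Worst-case batch payload is capped by metadataStoreBatchingMaxSizeKb.
BATCH_MAX_KB=128
JUTE_MAX=10485760                       # default jute.maxbuffer
BATCH_MAX=$(( BATCH_MAX_KB * 1024 ))    # 131072 bytes
echo $(( JUTE_MAX / BATCH_MAX ))        # prints 80
```

The 128 KB batch cap fits 80 times into the default jute.maxbuffer, so a single batch should be nowhere near the limit.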
