Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PulsarClientException$BrokerMetadataException error reoccurs in release 2.8.1 #12455

Open
Raven888888 opened this issue Oct 21, 2021 · 11 comments
Labels
lifecycle/stale type/bug The PR fixed a bug or issue reported a bug

Comments

@Raven888888
Copy link

Describe the bug
Exactly same as #11774. Unable to start broker.
Also similar to #10967.

Error log

17:01:09.826 [main] ERROR org.apache.pulsar.PulsarBrokerStarter - Failed to start pulsar service.
org.apache.pulsar.broker.PulsarServerException: java.lang.RuntimeException: java.lang.RuntimeException: Failed to initialize Metadata Manager
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:787) ~[org.apache.pulsar-pulsar-broker-2.8.1.jar:2.8.1]
        at org.apache.pulsar.PulsarBrokerStarter$BrokerStarter.start(PulsarBrokerStarter.java:259) ~[org.apache.pulsar-pulsar-broker-2.8.1.jar:2.8.1]
        at org.apache.pulsar.PulsarBrokerStarter.main(PulsarBrokerStarter.java:331) [org.apache.pulsar-pulsar-broker-2.8.1.jar:2.8.1]
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Failed to initialize Metadata Manager
        at org.apache.pulsar.functions.worker.PulsarWorkerService.start(PulsarWorkerService.java:571) ~[org.apache.pulsar-pulsar-functions-worker-2.8.1.jar:2.8.1]
        at org.apache.pulsar.broker.PulsarService.startWorkerService(PulsarService.java:1569) ~[org.apache.pulsar-pulsar-broker-2.8.1.jar:2.8.1]
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:759) ~[org.apache.pulsar-pulsar-broker-2.8.1.jar:2.8.1]
        ... 2 more
Caused by: java.lang.RuntimeException: Failed to initialize Metadata Manager
        at org.apache.pulsar.functions.worker.FunctionMetaDataManager.initialize(FunctionMetaDataManager.java:115) ~[org.apache.pulsar-pulsar-functions-worker-2.8.1.jar:2.8.1]
        at org.apache.pulsar.functions.worker.PulsarWorkerService.start(PulsarWorkerService.java:496) ~[org.apache.pulsar-pulsar-functions-worker-2.8.1.jar:2.8.1]
        at org.apache.pulsar.broker.PulsarService.startWorkerService(PulsarService.java:1569) ~[org.apache.pulsar-pulsar-broker-2.8.1.jar:2.8.1]
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:759) ~[org.apache.pulsar-pulsar-broker-2.8.1.jar:2.8.1]
        ... 2 more
Caused by: org.apache.pulsar.client.api.PulsarClientException$BrokerMetadataException: The subscription c-pulsar-cluster-1-fw-192.168.3.3-8080-function-metadata-tailer-reader-7223e8a3e9 of the topic persistent://public/functions/metadata gets the last message id was failed
Failed to get batch size for entry java.lang.IllegalArgumentException: Invalid unknonwn tag type: 6
        at org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:987) ~[org.apache.pulsar-pulsar-client-api-2.8.1.jar:2.8.1]
        at org.apache.pulsar.client.impl.ConsumerImpl.hasMessageAvailable(ConsumerImpl.java:1869) ~[org.apache.pulsar-pulsar-client-original-2.8.1.jar:2.8.1]
        at org.apache.pulsar.client.impl.ReaderImpl.hasMessageAvailable(ReaderImpl.java:168) ~[org.apache.pulsar-pulsar-client-original-2.8.1.jar:2.8.1]
        at org.apache.pulsar.functions.worker.FunctionMetaDataManager.initialize(FunctionMetaDataManager.java:109) ~[org.apache.pulsar-pulsar-functions-worker-2.8.1.jar:2.8.1]
        at org.apache.pulsar.functions.worker.PulsarWorkerService.start(PulsarWorkerService.java:496) ~[org.apache.pulsar-pulsar-functions-worker-2.8.1.jar:2.8.1]
        at org.apache.pulsar.broker.PulsarService.startWorkerService(PulsarService.java:1569) ~[org.apache.pulsar-pulsar-broker-2.8.1.jar:2.8.1]
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:759) ~[org.apache.pulsar-pulsar-broker-2.8.1.jar:2.8.1]
        ... 2 more

Expected behavior
Able to restart and run broker without any error.
Claimed to be fixed by #10968, but I still face the same issue.

Additional context
Pulsar version: 2.8.1
Pulsar mode: cluster, 3 nodes
KoP: enabled (#10950)

@BewareMyPower
Copy link
Contributor

Could you provide your KoP version? And can you reproduce it?

@Raven888888
Copy link
Author

KoP: 2.8.1.4
I faced this issue after upgrading from pulsar 2.7.0 to 2.8.1. Will try to reproduce when I have the capacity.

@BewareMyPower
Copy link
Contributor

upgrading from pulsar 2.7.0 to 2.8.1

Is it the similar issue like streamnative/kop#765?

@Raven888888
Copy link
Author

Nope @BewareMyPower . That issue only happens in pulsar + kop 2.7.0, and it is due to how kop handles offset calculation as you mentioned. Once upgraded to 2.8.1, I have not seen the offset gap thus far.

This current issue somehow causes broker unable to start at all. Consequently, kop is also not functional.

@BewareMyPower
Copy link
Contributor

BewareMyPower commented Nov 2, 2021

Yeah, I also noticed it.

Caused by: org.apache.pulsar.client.api.PulsarClientException$BrokerMetadataException: The subscription c-pulsar-cluster-1-fw-192.168.3.3-8080-function-metadata-tailer-reader-7223e8a3e9 of the topic persistent://public/functions/metadata gets the last message id was failed
Failed to get batch size for entry java.lang.IllegalArgumentException: Invalid unknonwn tag type: 6
        at org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:987) ~[org.apache.pulsar-pulsar-client-api-2.8.1.jar:2.8.1]
        at org.apache.pulsar.client.impl.ConsumerImpl.hasMessageAvailable(ConsumerImpl.java:1869) ~[org.apache.pulsar-pulsar-client-original-2.8.1.jar:2.8.1]
        at org.apache.pulsar.client.impl.ReaderImpl.hasMessageAvailable(ReaderImpl.java:168) ~[org.apache.pulsar-pulsar-client-original-2.8.1.jar:2.8.1]
        at org.apache.pulsar.functions.worker.FunctionMetaDataManager.initialize(FunctionMetaDataManager.java:109) ~[org.apache.pulsar-pulsar-functions-worker-2.8.1.jar:2.8.1]
        at org.apache.pulsar.functions.worker.PulsarWorkerService.start(PulsarWorkerService.java:496) ~[org.apache.pulsar-pulsar-functions-worker-2.8.1.jar:2.8.1]
        at org.apache.pulsar.broker.PulsarService.startWorkerService(PulsarService.java:1569) ~[org.apache.pulsar-pulsar-broker-2.8.1.jar:2.8.1]
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:759) ~[org.apache.pulsar-pulsar-broker-2.8.1.jar:2.8.1]

It looks like there's something wrong with topic persistent://public/functions/metadata. If you enabled Pulsar Functions before enabling KoP, this topic might not contain the BrokerEntryMetadata part. After enabling KoP, the reader of this topic might fail by ConsumerImpl.hasMessageAvailable.

I'm not familiar with Pulsar Functions, it might take some time to look into the reason.

@Raven888888
Copy link
Author

@BewareMyPower yes you are right. Pulsar function was enabled before enabling KoP.
However, just a side note, back in pulsar 2.7.0, such setup wasn't an issue.

Correct me if I am wrong, pulsar functions are stateless, so I could just drop persistent://public/functions/metadata and re-install the pulsar function again?

@BewareMyPower
Copy link
Contributor

It looks like I only replied via email so it didn't appear in this issue.

IMO, it's correct and you can delete this topic manually. Have you tried it?

@Raven888888
Copy link
Author

No I haven't @BewareMyPower , I will try it when I have the capacity, thanks.

@github-actions
Copy link

The issue had no activity for 30 days, mark with Stale label.

@rkaw92
Copy link

rkaw92 commented Jul 28, 2022

Hey, for what it's worth, I'm seeing this in Pulsar 2.10.1 with KoP:

2022-07-28T12:54:19,013+0000 [BookKeeperClientWorker-OrderedExecutor-7-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [adtonos-platform-prod/platform/persistent/AvailablePlayoutRecognized-partition-12] Opened new cursor: NonDurableCursorImpl{ledger=adtonos-platform-prod/platform/persistent/AvailablePlayoutRecognized-partition-12, ackPos=42:-1, readPos=42:0}
2022-07-28T12:54:19,016+0000 [BookKeeperClientWorker-OrderedExecutor-9-0] ERROR io.streamnative.pulsar.handlers.kop.MessageFetchContext - [persistent://adtonos-platform-prod/platform/AvailablePlayoutRecognized-14] Failed to peekOffsetFromEntry from position 35:4: No BrokerEntryMetadata found
2022-07-28T12:54:19,016+0000 [BookKeeperClientWorker-OrderedExecutor-7-0] ERROR io.streamnative.pulsar.handlers.kop.MessageFetchContext - [persistent://adtonos-platform-prod/platform/AvailablePlayoutRecognized-12] Failed to peekOffsetFromEntry from position 42:4: No BrokerEntryMetadata found
2022-07-28T12:54:19,016+0000 [BookKeeperClientWorker-OrderedExecutor-9-0] ERROR io.streamnative.pulsar.handlers.kop.MessageFetchContext - Read entry error on (offset=774095, logStartOffset=-1, maxBytes=1048576)
io.streamnative.pulsar.handlers.kop.exceptions.MetadataCorruptedException$NoBrokerEntryMetadata: No BrokerEntryMetadata found
        at io.streamnative.pulsar.handlers.kop.utils.MessageMetadataUtils.peekOffset(MessageMetadataUtils.java:127) ~[?:?]
        at io.streamnative.pulsar.handlers.kop.utils.MessageMetadataUtils.peekOffsetFromEntry(MessageMetadataUtils.java:118) ~[?:?]
        at io.streamnative.pulsar.handlers.kop.MessageFetchContext$2.readEntriesComplete(MessageFetchContext.java:555) ~[?:?]
        at org.apache.bookkeeper.mledger.impl.OpReadEntry.lambda$checkReadCompletion$2(OpReadEntry.java:153) ~[org.apache.pulsar-managed-ledger-2.10.1.jar:2.10.1]
        at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.10.1.jar:2.10.1]
        at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.14.5.jar:4.14.5]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
        at java.lang.Thread.run(Thread.java:829) [?:?]
2022-07-28T12:54:19,016+0000 [BookKeeperClientWorker-OrderedExecutor-7-0] ERROR io.streamnative.pulsar.handlers.kop.MessageFetchContext - Read entry error on (offset=897452, logStartOffset=-1, maxBytes=1048576)
io.streamnative.pulsar.handlers.kop.exceptions.MetadataCorruptedException$NoBrokerEntryMetadata: No BrokerEntryMetadata found
        at io.streamnative.pulsar.handlers.kop.utils.MessageMetadataUtils.peekOffset(MessageMetadataUtils.java:127) ~[?:?]
        at io.streamnative.pulsar.handlers.kop.utils.MessageMetadataUtils.peekOffsetFromEntry(MessageMetadataUtils.java:118) ~[?:?]
        at io.streamnative.pulsar.handlers.kop.MessageFetchContext$2.readEntriesComplete(MessageFetchContext.java:555) ~[?:?]
        at org.apache.bookkeeper.mledger.impl.OpReadEntry.lambda$checkReadCompletion$2(OpReadEntry.java:153) ~[org.apache.pulsar-managed-ledger-2.10.1.jar:2.10.1]
        at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.10.1.jar:2.10.1]
        at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.14.5.jar:4.14.5]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.77.Final.jar:4.1.77.Final]
        at java.lang.Thread.run(Thread.java:829) [?:?]
2022-07-28T12:54:19,019+0000 [BookKeeperClientWorker-OrderedExecutor-10-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [adtonos-platform-prod/platform/persistent/AvailablePlayoutRecognized-partition-0] Unable to find position for predicate FindEntryByOffset{ 778995}. Use the first position 27:0 instead.

But I'm able to start the broker - this is at run-time.

It kind of works, but I'm getting a constant spam of this message in the logs.

@fishisnow
Copy link

the same problem i meet, the broker cluster is unavailable because of function worker. the error is "The subscription c-pulsar-fw-pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local.-8080-function-metadata-tailer-reader-28f8e65105 of the topic persistent://public/functions/metadata gets the last message id was failed"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/stale type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

No branches or pull requests

4 participants