Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker][WIP] Don't check replication clusters for ownership of system topics #22597

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

lhotari
Copy link
Member

@lhotari lhotari commented Apr 26, 2024

WIP, not ready for review. PR opened in draft mode for initial discussion and feedback.

Motivation

When investigating another issue and reproducing the problem using these instructions, #21948 (comment), I noticed this WARN log message:

2024-04-26T09:34:01,535 - WARN  - [pulsar-io-35-3:ServerCnx] - Failed to get Partitioned Metadata [/127.0.0.1:56820] persistent://pulsar/global/removeClusterTest/__change_events: Namespace missing local cluster name in clusters list: local_cluster=r1 ns=pulsar/global/removeClusterTest clusters=[r2, r3]
org.apache.pulsar.broker.web.RestException: Namespace missing local cluster name in clusters list: local_cluster=r1 ns=pulsar/global/removeClusterTest clusters=[r2, r3]
        at org.apache.pulsar.broker.web.PulsarWebResource.lambda$checkLocalOrGetPeerReplicationCluster$27(PulsarWebResource.java:922) ~[classes/:?]
        at java.base/java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:757) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:735) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2182) ~[?:?]
        at org.apache.pulsar.broker.web.PulsarWebResource.lambda$checkLocalOrGetPeerReplicationCluster$29(PulsarWebResource.java:912) ~[classes/:?]
        at java.base/java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:757) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:735) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2182) ~[?:?]
        at org.apache.pulsar.broker.web.PulsarWebResource.checkLocalOrGetPeerReplicationCluster(PulsarWebResource.java:896) ~[classes/:?]
        at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.unsafeGetPartitionedTopicMetadataAsync(PersistentTopicsBase.java:4304) ~[classes/:?]
        at org.apache.pulsar.broker.service.ServerCnx.lambda$handlePartitionMetadataRequest$7(ServerCnx.java:610) ~[classes/:?]
        at java.base/java.util.concurrent.CompletableFuture.uniApplyNow(CompletableFuture.java:684) [?:?]
        at java.base/java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:662) [?:?]
        at java.base/java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:2168) [?:?]
        at org.apache.pulsar.broker.service.ServerCnx.handlePartitionMetadataRequest(ServerCnx.java:607) [classes/:?]
        at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:134) [pulsar-common-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]

Modifications

  • Don't limit system topics with the checkLocalOrGetPeerReplicationCluster checks.

Additional Context

This is related to #20304 changes since that made some relaxation on the checks.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@lhotari lhotari marked this pull request as draft April 26, 2024 06:39
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Apr 26, 2024
@lhotari lhotari force-pushed the lh-dont-check-replication-clusters-for-system-topics branch from 1b91da8 to 648b5f0 Compare April 26, 2024 06:41
@lhotari
Copy link
Member Author

lhotari commented Apr 26, 2024

@poorbarcode @heesung-sn Do you think that relaxing the checkLocalOrGetPeerReplicationCluster checks for system topics would be useful?

@heesung-sn
Copy link
Contributor

I realized that Pulsar has some loopholes that unnecessarily try to replicate, migrate or do other operations on system topics.

We need to check all system topics and define their behaviors for each feature.

System namespace and system topics

  • load report
  • blue-green migration
  • geo-replication
  • cross-cluster ownership check
  • and others

I am checking the system topics from the new load balancer.

@heesung-sn
Copy link
Contributor

To answer your questions, Im not sure. I assume all system topics should live in local cluster only,but I could be wrong.

@lhotari
Copy link
Member Author

lhotari commented Apr 29, 2024

Same exception with current master branch version 93afd89

2024-04-29T08:47:36,525 - WARN  - [pulsar-io-35-4:ServerCnx] - Failed to get Partitioned Metadata [/127.0.0.1:61388] persistent://pulsar/global/removeClusterTest/__change_events: Namespace missing local cluster name in clusters list: local_cluster=r1 ns=pulsar/global/removeClusterTest clusters=[r2, r3]
org.apache.pulsar.broker.web.RestException: Namespace missing local cluster name in clusters list: local_cluster=r1 ns=pulsar/global/removeClusterTest clusters=[r2, r3]
        at org.apache.pulsar.broker.web.PulsarWebResource.lambda$checkLocalOrGetPeerReplicationCluster$27(PulsarWebResource.java:922) ~[classes/:?]
        at java.base/java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:757) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:735) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2182) ~[?:?]
        at org.apache.pulsar.broker.web.PulsarWebResource.lambda$checkLocalOrGetPeerReplicationCluster$29(PulsarWebResource.java:912) ~[classes/:?]
        at java.base/java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:757) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:735) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2182) ~[?:?]
        at org.apache.pulsar.broker.web.PulsarWebResource.checkLocalOrGetPeerReplicationCluster(PulsarWebResource.java:896) ~[classes/:?]
        at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.unsafeGetPartitionedTopicMetadataAsync(PersistentTopicsBase.java:4307) ~[classes/:?]
        at org.apache.pulsar.broker.service.ServerCnx.lambda$handlePartitionMetadataRequest$7(ServerCnx.java:610) ~[classes/:?]
        at java.base/java.util.concurrent.CompletableFuture.uniApplyNow(CompletableFuture.java:684) [?:?]
        at java.base/java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:662) [?:?]
        at java.base/java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:2168) [?:?]
        at org.apache.pulsar.broker.service.ServerCnx.handlePartitionMetadataRequest(ServerCnx.java:607) [classes/:?]
        at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:134) [pulsar-common-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.108.Final.jar:4.1.108.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4
.1.108.Final.jar:4.1.108.Final]

@lhotari lhotari force-pushed the lh-dont-check-replication-clusters-for-system-topics branch from 648b5f0 to 23d8b53 Compare April 29, 2024 06:28
@lhotari
Copy link
Member Author

lhotari commented Apr 29, 2024

It seems that __change_events has special handling in replication:

public CompletableFuture<Void> checkReplication() {
if (SystemTopicNames.isTopicPoliciesSystemTopic(topic)) {
return super.checkReplication();
}
return CompletableFuture.completedFuture(null);
}

public static boolean isTopicPoliciesSystemTopic(String topic) {
if (topic == null) {
return false;
}
return TopicName.getPartitionedTopicName(topic).getLocalName().equals(NAMESPACE_EVENTS_LOCAL_NAME);
}

Therefore I'm not sure if the solution in this PR is correct or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants