Fix CPU 100% when deleting namespace (#10337) #10454

Merged
merlimat merged 1 commit into apache:branch-2.7 on May 3, 2021

@315157973 315157973 commented Apr 30, 2021

Cherry-pick of #10337 to branch-2.7.

Motivation

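Deleting a namespace marks the namespace Policies as deleted.
This triggers each topic's onPoliciesUpdate.
onPoliciesUpdate then reads the Policies node from ZooKeeper, for example in checkReplicationAndRetryOnFailure.
Because the namespace is being deleted, that ZooKeeper node may no longer exist by the time it is read.
The read fails, the failure is treated as retriable, and a retry is rescheduled every POLICY_UPDATE_FAILURE_RETRY_TIME_SECONDS seconds with no upper bound:
https://github.com/apache/pulsar/blob/e970c2947aff9231202ab72bdbad047d85c55633/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L1175-L1193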

    private CompletableFuture<Void> checkReplicationAndRetryOnFailure() {
        CompletableFuture<Void> result = new CompletableFuture<Void>();
        checkReplication().thenAccept(res -> {
            log.info("[{}] Policies updated successfully", topic);
            result.complete(null);
        }).exceptionally(th -> {
            log.error("[{}] Policies update failed {}, scheduled retry in {} seconds", topic, th.getMessage(),
                    POLICY_UPDATE_FAILURE_RETRY_TIME_SECONDS, th);
            if (!(th.getCause() instanceof TopicFencedException)) {
                // retriable exception
                brokerService.executor().schedule(this::checkReplicationAndRetryOnFailure,
                        POLICY_UPDATE_FAILURE_RETRY_TIME_SECONDS, TimeUnit.SECONDS);
            }
            result.completeExceptionally(th);
            return null;
        });
        return result;
    }

If the namespace has many topics, these retries add up to a short-term CPU spike.

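![image](https://user-images.githubusercontent.com/9758905/115834541-ebc32480-a447-11eb-887a-95c4a3d1adf1.png)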
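For readers who want to see the general shape of the problem, here is a minimal, self-contained sketch of a retry helper that treats "the policies node is gone" as terminal instead of retriable. It is an illustration under assumptions and not necessarily the change made in this PR: `RetryGuardSketch`, `PoliciesGoneException`, `unwrap`, `isRetriable` and `runWithRetry` are hypothetical names, not Pulsar APIs, and the attempt counter exists only to make the behavior observable.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class RetryGuardSketch {

    /** Hypothetical marker for "the policies node no longer exists"; not a Pulsar class. */
    public static class PoliciesGoneException extends RuntimeException {
        public PoliciesGoneException(String message) {
            super(message);
        }
    }

    /** Strips CompletionException wrappers so the check sees the underlying failure. */
    public static Throwable unwrap(Throwable th) {
        while (th instanceof CompletionException && th.getCause() != null) {
            th = th.getCause();
        }
        return th;
    }

    /** A missing policies node will not come back, so retrying it is pointless. */
    public static boolean isRetriable(Throwable th) {
        return !(unwrap(th) instanceof PoliciesGoneException);
    }

    /** Runs the action; on failure, schedules another attempt only if the failure is retriable. */
    public static CompletableFuture<Void> runWithRetry(Supplier<CompletableFuture<Void>> action,
            ScheduledExecutorService executor, long delayMillis, AtomicInteger attempts) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        attempts.incrementAndGet();
        action.get().whenComplete((res, th) -> {
            if (th == null) {
                result.complete(null);
                return;
            }
            if (isRetriable(th)) {
                executor.schedule(() -> {
                    runWithRetry(action, executor, delayMillis, attempts);
                }, delayMillis, TimeUnit.MILLISECONDS);
            }
            // The caller of this attempt still sees the failure, as in the original method.
            result.completeExceptionally(unwrap(th));
        });
        return result;
    }

    public static void main(String[] args) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger attempts = new AtomicInteger();
        // Simulates a read that always fails because the node was deleted.
        CompletableFuture<Void> alwaysGone = new CompletableFuture<>();
        alwaysGone.completeExceptionally(new PoliciesGoneException("policies node deleted"));
        CompletableFuture<Void> result = runWithRetry(() -> alwaysGone, executor, 10, attempts);
        System.out.println("failed=" + result.isCompletedExceptionally()
                + " attempts=" + attempts.get());
        executor.shutdown();
    }
}
```

The design choice in this sketch is to classify the failure before scheduling anything: transient errors still get retried, while a failure that cannot recover ends the loop after a single attempt.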

Conflicts:
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/persistent/PersistentTopicTest.java

@merlimat merlimat added the type/bug The PR fixed a bug or issue reported a bug label May 3, 2021
@merlimat merlimat added this to the 2.8.0 milestone May 3, 2021
@merlimat merlimat merged commit 7bf14b5 into apache:branch-2.7 May 3, 2021
@315157973 315157973 deleted the cpu-100 branch May 11, 2021 09:02