Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CPU 100% when deleting namespace #10337

Merged
merged 2 commits into from Apr 24, 2021
Merged

Conversation

315157973
Copy link
Contributor

Motivation

When deleting the namespace, the namespace Policies will be marked as deleted.
This will trigger topic's onPoliciesUpdate
However, in onPoliciesUpdate, the data of the Policies node on zk will be read, such as: checkReplicationAndRetryOnFailure
Due to the deletion of the namespace, the zk node may no longer exist at this time.
Failure to read data will trigger infinite retries.

private CompletableFuture<Void> checkReplicationAndRetryOnFailure() {
CompletableFuture<Void> result = new CompletableFuture<Void>();
checkReplication().thenAccept(res -> {
log.info("[{}] Policies updated successfully", topic);
result.complete(null);
}).exceptionally(th -> {
log.error("[{}] Policies update failed {}, scheduled retry in {} seconds", topic, th.getMessage(),
POLICY_UPDATE_FAILURE_RETRY_TIME_SECONDS, th);
if (!(th.getCause() instanceof TopicFencedException)) {
// retriable exception
brokerService.executor().schedule(this::checkReplicationAndRetryOnFailure,
POLICY_UPDATE_FAILURE_RETRY_TIME_SECONDS, TimeUnit.SECONDS);
}
result.completeExceptionally(th);
return null;
});
return result;
}

If there are many topics, there will be a short-term CPU spike

image

Modifications

Verifying this change

@315157973 315157973 changed the title Fix CPU 100% in some cases Fix CPU 100% when deleting namespace Apr 23, 2021
Policies policies = new Policies();
policies.deleted = true;
persistentTopic.onPoliciesUpdate(policies);
verify(persistentTopic, times(0)).checkReplicationAndRetryOnFailure();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CN we add a positive test, that verifies that this method is called in case of non deleted Policies?

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect
Thanks

@eolivelli
Copy link
Contributor

@315157973
this patch does not apply to branch-2.7
can you please create a new PR for branch-2.7 ?

315157973 added a commit to 315157973/pulsar that referenced this pull request Apr 30, 2021
### Motivation
When deleting the namespace, the namespace Policies will be marked as deleted.
This will trigger topic's `onPoliciesUpdate`
However, in onPoliciesUpdate, the data of the Policies node on zk will be read, such as: `checkReplicationAndRetryOnFailure`
Due to the deletion of the namespace, the zk node may no longer exist at this time.
Failure to read data will trigger infinite retries.
https://github.com/apache/pulsar/blob/e970c2947aff9231202ab72bdbad047d85c55633/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L1175-L1193

If there are many topics, there will be a short-term CPU spike

![image](https://user-images.githubusercontent.com/9758905/115834541-ebc32480-a447-11eb-887a-95c4a3d1adf1.png)

# Conflicts:
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/service/persistent/PersistentTopicTest.java
315157973 added a commit to 315157973/pulsar that referenced this pull request Apr 30, 2021
When deleting the namespace, the namespace Policies will be marked as deleted.
This will trigger topic's `onPoliciesUpdate`
However, in onPoliciesUpdate, the data of the Policies node on zk will be read, such as: `checkReplicationAndRetryOnFailure`
Due to the deletion of the namespace, the zk node may no longer exist at this time.
Failure to read data will trigger infinite retries.
https://github.com/apache/pulsar/blob/e970c2947aff9231202ab72bdbad047d85c55633/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L1175-L1193

If there are many topics, there will be a short-term CPU spike

![image](https://user-images.githubusercontent.com/9758905/115834541-ebc32480-a447-11eb-887a-95c4a3d1adf1.png)
315157973 added a commit to 315157973/pulsar that referenced this pull request Apr 30, 2021
When deleting the namespace, the namespace Policies will be marked as deleted.
This will trigger topic's `onPoliciesUpdate`
However, in onPoliciesUpdate, the data of the Policies node on zk will be read, such as: `checkReplicationAndRetryOnFailure`
Due to the deletion of the namespace, the zk node may no longer exist at this time.
Failure to read data will trigger infinite retries.
https://github.com/apache/pulsar/blob/e970c2947aff9231202ab72bdbad047d85c55633/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L1175-L1193

If there are many topics, there will be a short-term CPU spike

![image](https://user-images.githubusercontent.com/9758905/115834541-ebc32480-a447-11eb-887a-95c4a3d1adf1.png)
merlimat pushed a commit that referenced this pull request May 3, 2021
When deleting the namespace, the namespace Policies will be marked as deleted.
This will trigger topic's `onPoliciesUpdate`
However, in onPoliciesUpdate, the data of the Policies node on zk will be read, such as: `checkReplicationAndRetryOnFailure`
Due to the deletion of the namespace, the zk node may no longer exist at this time.
Failure to read data will trigger infinite retries.
https://github.com/apache/pulsar/blob/e970c2947aff9231202ab72bdbad047d85c55633/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L1175-L1193

If there are many topics, there will be a short-term CPU spike

![image](https://user-images.githubusercontent.com/9758905/115834541-ebc32480-a447-11eb-887a-95c4a3d1adf1.png)
@eolivelli eolivelli added the cherry-picked/branch-2.7 Archived: 2.7 is end of life label May 4, 2021
@eolivelli
Copy link
Contributor

@315157973 thank you for porting this patch to branch-2.7.
I have included it in 2.7.2rc1

@315157973 315157973 deleted the del branch July 19, 2021 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/branch-2.7 Archived: 2.7 is end of life release/2.7.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants