Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][meta] Fix invalid use of drain API and race condition in closing metadata store #22585

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lhotari
Copy link
Member

@lhotari lhotari commented Apr 25, 2024

Motivation

There's currently some memory leaks in tests and while investigating the issue, I found out that there's a large number of uncompleted CompletableFutures in the heap dump.

Currently the metadata store doesn't complete all pending operations when it is closed.
There are multiple problems:

  • MpscUnboundedArrayQueue.drain will limit the number of entries to 4096 at a time
  • MpscUnboundedArrayQueue.drain uses relaxedPoll which is eventually consistent
  • There might be batches in flight, which need to be terminated

Modifications

  • replace drain with a while loop in close
  • replace drain with a for loop in batch flushing
  • add isClosed checks

Additional Context

CompletableFutures in the heap dump:

image

The instances are related to org.apache.pulsar.broker.resources.NamespaceResources$PartitionedTopicResources$$Lambda$1819+0x00007f08a8b65ee8:
image

org.apache.pulsar.broker.resources.NamespaceResources$PartitionedTopicResources$$Lambda$1819+0x00007f08a8b65ee8 seems to be this code block:

markPartitionedTopicDeletedAsync(topic).whenCompleteAsync((markResult, markExc) -> {
final boolean mdFound;
if (markExc != null) {
if (markExc.getCause() instanceof MetadataStoreException.NotFoundException) {
mdFound = false;
} else {
log.error("Failed to mark the topic {} as deleted", topic, markExc);
future.completeExceptionally(markExc);
return;
}
} else {
mdFound = true;
}
supplier.get().whenComplete((deleteResult, deleteExc) -> {
if (deleteExc != null && mdFound) {
unmarkPartitionedTopicDeletedAsync(topic)
.thenRun(() -> future.completeExceptionally(deleteExc))
.exceptionally(ex -> {
log.warn("Failed to unmark the topic {} as deleted", topic, ex);
future.completeExceptionally(deleteExc);
return null;
});
} else if (deleteExc != null) {
future.completeExceptionally(deleteExc);
} else {
future.complete(deleteResult);
}
});
});

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@lhotari lhotari added this to the 3.3.0 milestone Apr 25, 2024
@lhotari lhotari self-assigned this Apr 25, 2024
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Apr 25, 2024
@lhotari
Copy link
Member Author

lhotari commented Apr 25, 2024

It seems that the OOME is another issue. #22586

}
while ((op = writeOps.poll()) != null) {
op.getFuture().completeExceptionally(ex);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -99,7 +103,13 @@ public void close() throws Exception {
private void flush() {
while (!readOps.isEmpty()) {
List<MetadataOp> ops = new ArrayList<>();
readOps.drain(ops::add, maxOperations);
for (int i = 0; i < maxOperations; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should be ok, since it's already done in a loop: while (!readOps.isEmpty()) {...}

@coderzc coderzc modified the milestones: 3.3.0, 3.4.0 May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs ready-to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants