Event time based compaction #22517

Draft
wants to merge 2 commits into
base: master


marekczajkowski

Draft implementing event-time-based topic compaction.


@marekczajkowski Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

Member

@lhotari lhotari left a comment


Great work, @marekczajkowski. I added an initial review.

```java
public static List<ImmutableTriple<MessageId, String, Integer>> extractIdsAndKeysAndSize(
        RawMessage msg)
        throws IOException {
    return extractMessageCompactionData(msg).stream()
```
Member


Java Streams are intentionally avoided in performance-critical Pulsar code.
I think it's better to remove this method completely, since `RawBatchConverter` isn't part of the public API.
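
For illustration, a minimal sketch of what a stream-free version of this kind of mapping could look like, should such a helper be kept. The class and field names (`LoopExtraction`, `CompactionData`, `IdKeySize`) are hypothetical stand-ins, not the types used in the PR; the point is only the pre-sized list and plain `for` loop in place of `stream().map(...).collect(...)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LoopExtraction {
    // Hypothetical stand-in for the per-message compaction data in the PR.
    static final class CompactionData {
        final String messageId;
        final String key;
        final int payloadSize;
        final long eventTime;

        CompactionData(String messageId, String key, int payloadSize, long eventTime) {
            this.messageId = messageId;
            this.key = key;
            this.payloadSize = payloadSize;
            this.eventTime = eventTime;
        }
    }

    // Hypothetical stand-in for the (id, key, size) triple.
    static final class IdKeySize {
        final String messageId;
        final String key;
        final int payloadSize;

        IdKeySize(String messageId, String key, int payloadSize) {
            this.messageId = messageId;
            this.key = key;
            this.payloadSize = payloadSize;
        }
    }

    // Plain loop with a pre-sized list: no Stream pipeline or collector
    // objects are allocated on the hot path.
    static List<IdKeySize> toIdKeySize(List<CompactionData> data) {
        List<IdKeySize> result = new ArrayList<>(data.size());
        for (CompactionData d : data) {
            result.add(new IdKeySize(d.messageId, d.key, d.payloadSize));
        }
        return result;
    }

    public static void main(String[] args) {
        List<CompactionData> data = Arrays.asList(
                new CompactionData("1:0", "a", 10, 100L),
                new CompactionData("1:1", "b", 20, 200L));
        for (IdKeySize t : toIdKeySize(data)) {
            System.out.println(t.messageId + " " + t.key + " " + t.payloadSize);
        }
    }
}
```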

Comment on lines +1296 to +1297

```
compactionServiceFactoryClassName=org.apache.pulsar.compaction.EventTimeCompactionServiceFactory
```
Member


I guess this isn't intended to be part of the change?

Member


I'd call this `PublishingOrderCompactor` and the other one `EventTimeOrderCompactor`. I don't think `TwoPhase` is needed in the class name, since that's an implementation detail.
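
To make the distinction behind the two names concrete, here is a rough sketch of the two retention rules, assuming that publishing-order compaction keeps the last message published per key and event-time-order compaction keeps the message with the highest event time per key. Everything here (`CompactionOrderSketch`, `Msg`, the tie-breaking rule that favors the later-published message) is hypothetical and is not taken from the PR or from Pulsar's compactor classes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CompactionOrderSketch {
    // Hypothetical message shape: key, value, and producer-assigned event time.
    static final class Msg {
        final String key;
        final String value;
        final long eventTime;

        Msg(String key, String value, long eventTime) {
            this.key = key;
            this.value = value;
            this.eventTime = eventTime;
        }
    }

    // Publishing order: the last message published for a key wins.
    static Map<String, Msg> compactByPublishingOrder(List<Msg> log) {
        Map<String, Msg> latest = new LinkedHashMap<>();
        for (Msg m : log) {
            latest.put(m.key, m);
        }
        return latest;
    }

    // Event-time order: the message with the highest event time wins.
    // Assumption: on equal event times, the later-published message wins.
    static Map<String, Msg> compactByEventTime(List<Msg> log) {
        Map<String, Msg> latest = new LinkedHashMap<>();
        for (Msg m : log) {
            Msg current = latest.get(m.key);
            if (current == null || m.eventTime >= current.eventTime) {
                latest.put(m.key, m);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        // "v2" is published after "v1" but carries an older event time.
        List<Msg> log = Arrays.asList(
                new Msg("k", "v1", 200L),
                new Msg("k", "v2", 100L));
        System.out.println("publishing order keeps: " + compactByPublishingOrder(log).get("k").value);
        System.out.println("event-time order keeps: " + compactByEventTime(log).get("k").value);
    }
}
```

The two rules only differ when messages for a key arrive out of event-time order, which is the case the late-arriving `v2` above is meant to show.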

Member


I think EventTimeOrderCompactor could be a better name?


2 participants