Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4.x: Additional CQv2 message store optimisations #11112

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

lhoguin
Copy link
Contributor

@lhoguin lhoguin commented Apr 29, 2024

Planning to include these for 4.0.

@essen essen force-pushed the loic-faster-cq-shared-store-gc branch from b2f11a2 to 66ad60e Compare April 29, 2024 08:48
@michaelklishin michaelklishin changed the title CQ: Additional message store GC optimisations DO NOT MERGE 4.x: DO NOT MERGE Additional CQv2 message store GC optimisations Apr 29, 2024
@lhoguin
Copy link
Contributor Author

lhoguin commented Apr 30, 2024

After testing the commit CQ: Don't scan shared store files before deleting them does help avoid a backlog of compaction/delete operations in the store GC process. With 24 millions message, 4 queues (2 normal, 2 fan-out), I get about a minute and a half of backlog that is still being done after the queues have consumed all messages without the patch, and no backlog with the patch.

I will see if the other commit helps at all next.

@essen essen force-pushed the loic-faster-cq-shared-store-gc branch from 66ad60e to 7427cfe Compare May 2, 2024 09:36
@lhoguin

This comment was marked as outdated.

@essen essen force-pushed the loic-faster-cq-shared-store-gc branch from 4351976 to 84695ff Compare May 6, 2024 15:49
@lhoguin
Copy link
Contributor Author

lhoguin commented May 6, 2024

The second commit greatly improves dirty recovery times. On my machine, with data that's 24 million messages spread over many files and two queues, node recovery goes from 4min30 to less than 2min. The problem was that the old code was gathering messages from queues one by one (meaning 3 or 4 Erlang messages per AMQP message!!). Now it does so per segment file.

There are still parts that could be improved for making dirty recovery blazingly fast but they require storing additional state on disk and so will not be investigated fully for now.

@essen essen force-pushed the loic-faster-cq-shared-store-gc branch 2 times, most recently from 07be784 to fbf11f5 Compare May 7, 2024 10:46
@lhoguin lhoguin changed the title 4.x: DO NOT MERGE Additional CQv2 message store GC optimisations 4.x: Additional CQv2 message store optimisations May 7, 2024
@lhoguin lhoguin marked this pull request as ready for review May 7, 2024 12:09
@lhoguin
Copy link
Contributor Author

lhoguin commented May 7, 2024

I will work on merging my 4.x PRs next week.

@michaelklishin michaelklishin added this to the 4.0.0 milestone May 7, 2024
@lhoguin
Copy link
Contributor Author

lhoguin commented May 14, 2024

Making one more addition to this PR so it is not ready to be merged yet.

@essen essen force-pushed the loic-faster-cq-shared-store-gc branch 2 times, most recently from fd3a118 to 817c59e Compare May 14, 2024 14:57
@essen essen force-pushed the loic-faster-cq-shared-store-gc branch 2 times, most recently from 5a7610a to d8a5536 Compare May 15, 2024 11:30
This only applies to v2 because modifying this part of the v1
code is seen as too risky considering v1 will soon get removed.
@essen essen force-pushed the loic-faster-cq-shared-store-gc branch from d8a5536 to 0575002 Compare May 16, 2024 09:20
@lhoguin
Copy link
Contributor Author

lhoguin commented May 16, 2024

I have dropped the first commit. It turns out that the function doing the scanning also removes entries from the ets index, and we need to keep that. So unfortunately we cannot avoid scanning for the time being. To better handle that we have little choice other than reworking the file format on disk.

@lhoguin
Copy link
Contributor Author

lhoguin commented May 24, 2024

I will do some refactoring that should help detect problems like https://github.com/rabbitmq/rabbitmq-server/pull/11288/files better as well as to clearly separate parts of the code that relate with each other.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants