4.x: Additional CQv2 message store optimisations #11112
base: main
b2f11a2 to 66ad60e
After testing, the commit "CQ: Don't scan shared store files before deleting them" does help avoid a backlog of compaction/delete operations in the store GC process. With 24 million messages and 4 queues (2 normal, 2 fan-out), without the patch I get about a minute and a half of backlog that is still being worked through after the queues have consumed all messages; with the patch there is no backlog. Next I will see whether the other commit helps at all.
66ad60e to 7427cfe
This comment was marked as outdated.
4351976 to 84695ff
The second commit greatly improves dirty recovery times. On my machine, with 24 million messages spread over many files and two queues, node recovery goes from 4min30 to less than 2min. The problem was that the old code gathered messages from queues one by one (meaning 3 or 4 Erlang messages per AMQP message!). Now it does so per segment file. Some parts could still be improved to make dirty recovery blazingly fast, but they require storing additional state on disk, so they will not be investigated fully for now.
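To illustrate the difference in shape only, here is a minimal Python sketch rather than the actual Erlang code. All names (`recover_per_message`, `recover_per_segment`, `collect_ids`, the `queues` layout) are hypothetical, and the sketch simplifies the old path to one notification per message instead of the 3 or 4 mentioned above.

```python
# Hypothetical model: each queue is a list of segments, and each segment
# is a list of message ids. `send` stands in for sending one Erlang message.

def recover_per_message(queues, send):
    """Old shape (simplified): one notification per AMQP message."""
    for queue_name, segments in queues.items():
        for segment in segments:
            for msg_id in segment:
                send((queue_name, [msg_id]))


def recover_per_segment(queues, send):
    """New shape: one notification per segment file."""
    for queue_name, segments in queues.items():
        for segment in segments:
            if segment:  # skip empty segments entirely
                send((queue_name, list(segment)))


def collect_ids(sent):
    """Flatten the notifications back into (queue, msg_id) pairs."""
    return sorted((q, m) for q, ids in sent for m in ids)


queues = {
    "q1": [["a", "b"], ["c"]],
    "q2": [["a"], ["d", "e", "f"]],
}

old_sent, new_sent = [], []
recover_per_message(queues, old_sent.append)
recover_per_segment(queues, new_sent.append)

# Same information reaches the receiver, with far fewer notifications.
print(len(old_sent), len(new_sent))
```

Both paths deliver the same set of message ids; the saving comes purely from the number of notifications, which scales with segment files instead of with messages.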
07be784 to fbf11f5
I will work on merging my 4.x PRs next week.
Making one more addition to this PR so it is not ready to be merged yet.
fd3a118 to 817c59e
5a7610a to d8a5536
This only applies to v2 because modifying this part of the v1 code is seen as too risky considering v1 will soon get removed.
d8a5536 to 0575002
I have dropped the first commit. It turns out that the function doing the scanning also removes entries from the ets index, and we need to keep that behaviour. So unfortunately we cannot avoid scanning for the time being. To handle this better we have little choice other than reworking the file format on disk.
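A rough Python sketch of why skipping the scan is unsafe, under the assumption that the index maps message ids to the file holding them. The names (`scan_file`, `delete_file_with_scan`, `delete_file_without_scan`) and the data layout are hypothetical and do not mirror the real Erlang functions.

```python
# Hypothetical model: `files` maps a file name to the message ids stored
# in it, and `index` maps a message id to the file that holds it.

def scan_file(files, file_name):
    """Return the message ids found in a file (stand-in for a disk scan)."""
    return list(files[file_name])


def delete_file_with_scan(files, index, file_name):
    """Scan first, drop matching index entries, then delete the file."""
    for msg_id in scan_file(files, file_name):
        if index.get(msg_id) == file_name:
            del index[msg_id]
    del files[file_name]


def delete_file_without_scan(files, index, file_name):
    """Delete the file only; index entries pointing at it are left behind."""
    del files[file_name]


def make_state():
    files = {"file_1": ["a", "b"], "file_2": ["c"]}
    index = {"a": "file_1", "b": "file_1", "c": "file_2"}
    return files, index


files_scan, index_scan = make_state()
delete_file_with_scan(files_scan, index_scan, "file_1")

files_noscan, index_noscan = make_state()
delete_file_without_scan(files_noscan, index_noscan, "file_1")

# Entries that still point at a file that no longer exists.
stale = {m for m, f in index_noscan.items() if f not in files_noscan}
print(sorted(index_scan), sorted(stale))
```

In this model the scan is the only way to learn which index entries belong to the file, which is why avoiding it would need that information stored some other way, for example in the on-disk format.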
I will do some refactoring that should help detect problems like https://github.com/rabbitmq/rabbitmq-server/pull/11288/files more reliably, and that clearly separates the parts of the code that relate to each other.
Planning to include these for 4.0.