rabbit: Stop Khepri store and Ra systems in stop/1 #10372

Draft: wants to merge 3 commits into base: main

@dumbbell (Member) commented Jan 19, 2024

Why

The Khepri store and the coordinators are Ra servers and thus run under the Ra supervision tree. Therefore, we need to stop them explicitly; otherwise, they will continue to run after rabbit has stopped.

Likewise, the Ra systems are under the Ra supervision tree. We also need to stop them.

How

We need to stop the Ra servers and Ra systems that we started but that live under
the Ra supervision tree. The order is important:

  1. The quorum_queues Ra system because it may host Ra servers that depend on Khepri.
  2. The stream coordinator because it depends on Khepri.
  3. The Khepri store; it could be stopped automatically with the termination of the underlying Ra system, but Khepri needs to do some cleanup too.
  4. The remaining Ra systems.
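The ordering rule above boils down to "stop a component before anything it depends on". The Python sketch below is a hypothetical model of that rule, not the actual Erlang code in `rabbit`: the component names and the `DEPENDS_ON` graph are assumptions drawn from the list, and `stop_all` only records the order instead of calling real Ra or Khepri stop functions.

```python
# Hypothetical model: each component maps to the components it depends on.
# A component must be stopped before anything it depends on.
DEPENDS_ON = {
    "quorum_queues_ra_system": {"khepri_store"},  # may host Ra servers using Khepri
    "stream_coordinator": {"khepri_store"},
    "khepri_store": {"remaining_ra_systems"},     # Khepri runs inside a Ra system
    "remaining_ra_systems": set(),
}

STOP_ORDER = [
    "quorum_queues_ra_system",
    "stream_coordinator",
    "khepri_store",
    "remaining_ra_systems",
]


def stop_all(order, depends_on):
    """Stop components in `order`, refusing to stop one that is still needed."""
    running = set(depends_on)
    stopped = []
    for name in order:
        # Anything still running that depends on `name` would break if we stopped it now.
        still_needed_by = sorted(
            other for other in running if other != name and name in depends_on[other]
        )
        if still_needed_by:
            raise RuntimeError(f"cannot stop {name}: still needed by {still_needed_by}")
        running.discard(name)
        stopped.append(name)  # a real implementation would call the stop function here
    return stopped
```

With this model, stopping the Khepri store first raises an error because the `quorum_queues` Ra system and the stream coordinator still depend on it.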

@dumbbell added this to the 3.13.0 milestone Jan 19, 2024
@dumbbell self-assigned this Jan 19, 2024
@dumbbell changed the title rabbit: Stop Khepri store and Ra systems in stop/1 to rabbit: Stop Khepri store and Ra systems in stop/1 Jan 19, 2024
@dumbbell force-pushed the stop-khepri-store-and-ra-systems branch 5 times, most recently from 4d8fa17 to e030196 January 22, 2024 11:17
@dumbbell removed this from the 3.13.0 milestone Jan 22, 2024
[Why]
This function blindly relied on the fact that the Ra `coordination`
system was not stopped by `rabbit` when it stopped. That Ra system is
finally stopped correctly in the parent commit. This breaks the
function, which must be adapted to work with the Ra system stopped.

This is even safer because the function deletes the Ra system directory
as its last step.

[How]
The Ra system data directory is queried from RabbitMQ configuration
instead of the running Ra system.

We add a couple of assertions to ensure the conditions are the ones we
expect.
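The commit message does not name the function or its assertions, so the following Python sketch is only an illustration of the described approach under stated assumptions: the data directory is derived from configuration rather than from a running Ra system, two hypothetical checks guard the operation, and deleting the directory is the last step. The names `ra_system_data_dir`, `wipe_ra_system_dir`, and the `{"data_dir": ...}` config layout are all hypothetical.

```python
import os
import shutil
import tempfile


def ra_system_data_dir(config, system_name):
    # Hypothetical layout: the Ra system's directory is a subdirectory of the
    # configured data directory, named after the system.
    return os.path.join(config["data_dir"], system_name)


def wipe_ra_system_dir(config, system_name, running_systems):
    """Delete a Ra system's data directory, computed from configuration only."""
    # Check 1 (hypothetical): the Ra system must already be stopped, so we never
    # ask a running Ra system for its directory or delete files it still uses.
    if system_name in running_systems:
        raise RuntimeError(f"Ra system {system_name!r} is still running")
    base_dir = os.path.abspath(config["data_dir"])
    data_dir = os.path.abspath(ra_system_data_dir(config, system_name))
    # Check 2 (hypothetical): the directory must live strictly under the data dir.
    if os.path.commonpath([base_dir, data_dir]) != base_dir or data_dir == base_dir:
        raise RuntimeError(f"unexpected Ra system directory: {data_dir}")
    # Deleting the directory is the last step.
    shutil.rmtree(data_dir, ignore_errors=True)
    return data_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        os.makedirs(os.path.join(base, "coordination"))
        removed = wipe_ra_system_dir({"data_dir": base}, "coordination", running_systems=set())
        print(os.path.exists(removed))  # False
```

Because the path comes from configuration, the function keeps working once the Ra system is stopped, which is exactly the situation the parent commit creates.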
@dumbbell force-pushed the stop-khepri-store-and-ra-systems branch from e030196 to ab671b7 January 22, 2024 16:44
@dumbbell (Member Author) commented:

Stopping Ra systems and servers along with the rabbit application has some unfortunate consequences. We discovered that several areas of RabbitMQ, intentionally or not, rely on these processes still running after rabbit has stopped.

One of them is the forget_cluster_node CLI command: changing the Ra cluster membership requires the Ra servers to run.
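The constraint can be illustrated with a toy model: in a Raft-style system such as Ra, a membership change is a command that a running server has to process, so once the local Ra servers are stopped there is nothing left to apply it. The Python classes below are hypothetical stand-ins, not Ra's API or the CLI's implementation.

```python
class ToyRaServer:
    """Toy stand-in for a Ra server holding a cluster membership list."""

    def __init__(self, members):
        self.members = set(members)
        self.running = True

    def stop(self):
        self.running = False

    def remove_member(self, node):
        # Membership changes are commands that a running Ra server must process.
        if not self.running:
            raise RuntimeError("Ra server is not running; cannot change membership")
        self.members.discard(node)
        return set(self.members)


def forget_cluster_node(local_server, node):
    """Hypothetical helper mirroring what `forget_cluster_node` needs from Ra."""
    return local_server.remove_member(node)
```

If `stop/1` stops the server before the command runs, as a `stop_app` would with this pull request, `forget_cluster_node` fails instead of updating the membership.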

Ideally, we want to redesign the operations that today only work when wrapped in a stop_app/start_app pair of commands. We would like to replace that with the maintenance mode and otherwise let rabbit run during the operation. However, to achieve this, we need to stop using Mnesia.

Therefore, this pull request is on hold until we come back to it once Khepri is the only database we use.
