add some doc for shim reap orphan process #10002

Open
wants to merge 1 commit into
base: main
6 changes: 6 additions & 0 deletions core/runtime/v2/README.md
@@ -537,3 +537,9 @@ It works with standard protobufs and GRPC services as well as generating clients
The only difference between grpc and ttrpc is the wire protocol.
ttrpc removes the http stack in order to save memory and binary size to keep shims small.
It is recommended to use ttrpc in your shim but grpc support is currently an experimental feature.

#### runc-shim reaping orphan processes created by exec-init

@fuweid (Member), Apr 19, 2024

Suggestion:

containerd-shim-runc-v2 as sub-reaper

The shim process acts as a sub-reaper and is responsible for cleaning up exited containers and setns(2) processes.

When a container runs in a new PID namespace, the container should clean up its orphaned processes before it exits.
If a container shares the PID namespace of the shim process, its orphaned descendant processes are reparented to the shim process, which reaps them when they exit.

However, the kernel patch "[PATCH] exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction" prevents any cross-namespace reparenting. Assume that a container runs in X-namespace and that a process P in the root namespace calls setns(2) to enter X-namespace. P forks a child C, and C forks a grandchild G and then exits. G is reparented to the init process of X-namespace instead of to P's reaper.

If the container's PID namespace differs from that of the shim process, the container init process should clean up any orphaned processes that are reparented to it after being created by a setns process (an exec operation).

cc @AkihiroSuda @dmcgowan

After the kernel commit https://github.com/torvalds/linux/commit/c6c70f4455d1eda91065e93cc4f7eddf4499b105 (merged in kernel 4.11),
a reaper can no longer cross PID namespaces, so runc-shim cannot reap orphan processes created by exec-init even though the shim is set up to reap. Process 1 in the container becomes their reaper instead.
When such an orphan process exits, it becomes a zombie process unless process 1 in the container calls wait4 to reap it.
To reap these zombie processes, run tini (https://github.com/krallin/tini) as process 1 in the container, or, on Kubernetes, set "shareProcessNamespace: true" so that the pause process reaps them.

fuweid marked this conversation as resolved.