Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable setgroups #157

Merged
merged 1 commit into from
Feb 13, 2023
Merged

Enable setgroups #157

merged 1 commit into from
Feb 13, 2023

Conversation

akihikodaki
Copy link
Contributor

@akihikodaki akihikodaki commented Nov 4, 2022

Not tested.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>

Copy link
Contributor

@YanVugenfirer YanVugenfirer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akihikodaki Do you see CodeQL alert?

@akihikodaki
Copy link
Contributor Author

@YanVugenfirer Yes, it will be fixed with #156.

@kostyanf14
Copy link
Contributor

Failed to test on Ubuntu 20.04 LTS and Fedora 34.

unshare: unrecognized option '--map-user=root'

Ubuntu 20.04

$ unshare --version
unshare from util-linux 2.34
$ unshare --help

Usage:
 unshare [options] [<program> [<argument>...]]

Run a program with some namespaces unshared from the parent.

Options:
 -m, --mount[=<file>]      unshare mounts namespace
 -u, --uts[=<file>]        unshare UTS namespace (hostname etc)
 -i, --ipc[=<file>]        unshare System V IPC namespace
 -n, --net[=<file>]        unshare network namespace
 -p, --pid[=<file>]        unshare pid namespace
 -U, --user[=<file>]       unshare user namespace
 -C, --cgroup[=<file>]     unshare cgroup namespace

 -f, --fork                fork before launching <program>
 -r, --map-root-user       map current user to root (implies --user)

 --kill-child[=<signame>]  when dying, kill the forked child (implies --fork)
                             defaults to SIGKILL
 --mount-proc[=<dir>]      mount proc filesystem first (implies --mount)
 --propagation slave|shared|private|unchanged
                           modify mount propagation in mount namespace
 --setgroups allow|deny    control the setgroups syscall in user namespaces

 -R, --root=<dir>           run the command with root directory set to <dir>
 -w, --wd=<dir>     change working directory to <dir>
 -S, --setuid <uid>         set uid in entered namespace
 -G, --setgid <gid>         set gid in entered namespace

 -h, --help                display this help
 -V, --version             display version

For more details see unshare(1).

Fedora 34:

# unshare --version
unshare from util-linux 2.36.2
# unshare --help

Usage:
 unshare [options] [<program> [<argument>...]]

Run a program with some namespaces unshared from the parent.

Options:
 -m, --mount[=<file>]      unshare mounts namespace
 -u, --uts[=<file>]        unshare UTS namespace (hostname etc)
 -i, --ipc[=<file>]        unshare System V IPC namespace
 -n, --net[=<file>]        unshare network namespace
 -p, --pid[=<file>]        unshare pid namespace
 -U, --user[=<file>]       unshare user namespace
 -C, --cgroup[=<file>]     unshare cgroup namespace
 -T, --time[=<file>]       unshare time namespace

 -f, --fork                fork before launching <program>
 --map-user=<uid>|<name>   map current user to uid (implies --user)
 --map-group=<gid>|<name>  map current group to gid (implies --user)
 -r, --map-root-user       map current user to root (implies --user)
 -c, --map-current-user    map current user to itself (implies --user)

 --kill-child[=<signame>]  when dying, kill the forked child (implies --fork)
                             defaults to SIGKILL
 --mount-proc[=<dir>]      mount proc filesystem first (implies --mount)
 --propagation slave|shared|private|unchanged
                           modify mount propagation in mount namespace
 --setgroups allow|deny    control the setgroups syscall in user namespaces
 --keep-caps               retain capabilities granted in user namespaces

 -R, --root=<dir>          run the command with root directory set to <dir>
 -w, --wd=<dir>            change working directory to <dir>
 -S, --setuid <uid>        set uid in entered namespace
 -G, --setgid <gid>        set gid in entered namespace
 --monotonic <offset>      set clock monotonic offset (seconds) in time namespaces
 --boottime <offset>       set clock boottime offset (seconds) in time namespaces

 -h, --help                display this help
 -V, --version             display version

For more details see unshare(1).

@akihikodaki akihikodaki force-pushed the akihikodaki/enable-setgroups branch 2 times, most recently from 33c01e8 to 7c868ad Compare November 8, 2022 06:31
@akihikodaki
Copy link
Contributor Author

@kostyanf14 Can you test commit 7c868ad? It got uglier but (hopefully) should work.

@kostyanf14
Copy link
Contributor

Ubuntu 20.04

/bin/ns:54:in `system': No such file or directory - newgidmap (Errno::ENOENT)

Need to install sudo apt install uidmap

This patch works.

Fedora 34

bundler: failed to load command: ./bin/ns (./bin/ns)
.../bin/ns:29:in `readline': end of file reached (EOFError)
# cat /etc/subgid
test:100000:65536

This patch does not work.

@akihikodaki
Copy link
Contributor Author

@kostyanf14 Looks like we need to require uidmap. newgidmap is a setuid helper and not something we can implement by ourselves.
In the Fedora 34 case, I think you use the root user while /etc/subgid only includes test user. You need to call the script from a user listed in /etc/subgid or add the current user to /etc/subgid. In this particular case, you may use su or sudo to switch the current user to test user (running this giant program on root is something better to avoid anyway).

@akihikodaki
Copy link
Contributor Author

Commit ae5b30a adds a friendly error message in case the current user is not listed in /etc/subgid.

@kostyanf14
Copy link
Contributor

running this giant program on root is something better to avoid anyway

Let's add some questions about this to the installer. For now, the temporary HCK-CI server installs automatically without additional users so this mechanism will be broken.

@akihikodaki
Copy link
Contributor Author

I think that won't be a problem for the most new installation because the instruction does no longer say it needs to be executed as root and a normal user is listed in /etc/subgid.
A user-friendly error message may be better-suited instead of a question asked for everyone to handle this kind of exceptional case.

@akihikodaki
Copy link
Contributor Author

virtiofsd breakage with the namespace change

Problem Description

virtiofsd required two privileges not provided in the namespace created by bin/ns.

setgroups()

setgroups() is usually prohibited for a namespace created by a normal user because the following security problem:
Consider the case that a file has fewer permission bits for “group” than for “other”. In this case, belonging to the group means less privilege. If a user belonging to the group use setgroups() to drop the group membership, it will be privilege escalation.

Access to the run directory

virtiofsd uses the run directory to remember what path is currently occupied by Unix domain sockets it created. It points to the path not writable for a normal user like /var/run with one exceptional case: when virtiofsd of QEMU version prior to 7.1 is executed from the build directory.

In older versions of QEMU, the default run directory shared the prefix with the binary directory. Concretely, the binary directory was at /usr/local/bin, and the run directory was at /usr/local/var/run. This made virtiofsd to relocate the run directory by generating the relative path to the run directory, and resolve the relative path with the actual binary location. For example, if virtiofsd is at /home/person/qemu/build/tools, the following procedure will be executed:

  1. It first generates the relative path to run directory: ../var/run.
  2. Prepends /home/person/qemu/build/tools/: /home/person/qemu/build/tools/../var/run.

As the consequence, the path to the run directory points to a directory writable for a normal user. In the other cases (running newer virtiofsd or running virtiofsd installed to the system), the run directory usually points to a directory not writable for a user.

Current Attempts to Solve the Problems

Solving setgroups()

This Pull Request

This pull request employed newgidmap setuid helper to allow setgroups() in the namespace. newgidmap reads /etc/subgid to determine the group ID range the current user can freely use (subgroup delegation). It is allowed to drop subgroups with setgroups() as long as the user is in the group ID range, and newgidmap configures the namespace accordingly.

unshare provides a convenient option to use newgidmap named --map-groups, but unfortunately the option is not available in Ubuntu and old Fedora. See the comments in bin/ns for details.

newgidmap is not available on Ubuntu by default. A user needs to install uidmap package.

Access to the run Directory

This is the difficult one.

Modify virtiofsd to use XDG_RUNTIME_DIR

Sent patches to the upstream, but it will take long until the change becomes available for everybody:

https://patchew.org/QEMU/20221110062329.13363-1-akihiko.odaki@daynix.com/

Mount the run directory

In theory, it is possible to mount tmpfs or something to the run directory as done for /etc/resolv.conf. However, I found significant challenges with this:

  • The run directory path to be used by virtiofsd is hard to guess.
  • The run directory may not contain virtiofsd directory, which is convenient to use as the mount point.
  • Mounting the entire run directory can break the system. For example, on my system, /etc/resolv.conf is a symlink to a file in /var/run. As /etc/resolv.conf mounting happens to the file pointed by the symlink, mounting the entire run directory effectively breaks /etc/resolv.conf.
  • overlayfs can preserve the content of the current tree while overlaying something on it, but it is just too complex.

Solving the run Directory Problem Now

The virtiofsd change is the long-term solution. There are two options for now:

Tell people to configure virtiofsd so that it will not try to write to a directory not writable

This requires people to compile virtiofsd by themselves. Not ideal.

Just use sudo

Currently, bin/ns drops the root privilege so using sudo does not make any difference. However, it is possible to preserve the root privilege by dropping CLONE_NEWUSER flag for unshare when the user is root. It will make the current user namespace maintained, and let the root user behave as root.

In fact, the namespace change was to reproduce the network environment, and removing the root requirement was just a nice side-effect. Perhaps we may just require to use sudo when running virtiofsd without the proposed XDG_RUNTIME_DIR change.

@akihikodaki
Copy link
Contributor Author

@kostyanf14 Pushed change to skip user namespace creation for root.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
@kostyanf14
Copy link
Contributor

Rebased PR to current master

@YanVugenfirer YanVugenfirer merged commit 66b52ca into master Feb 13, 2023
@YanVugenfirer YanVugenfirer deleted the akihikodaki/enable-setgroups branch February 13, 2023 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants