Profiling Implementation Notes - fapolicyd and profiling target sandboxing via systemd #523

tparchambault · 2022-06-30T17:07:36Z

tparchambault
Jun 30, 2022
Collaborator

Investigating systemd as the control mechanism underlying the AAC's file access profiler, to potentially increase profiling session isolation. I'll be focusing on the following areas, and on whether systemd units and sandboxing provide any distinct advantages over the current Python 3.6 subprocess.Popen() based implementation. The current implementation handles much of the initialization, runtime option, and runtime process management configuration with explicit code; however, full runtime isolation is not available out of the box.

Specifically, I'm interested in the deltas between on-line, production fapolicyd instances and off-line, dry-run fapolicyd instances running a set of profiling target executables.

1. Distinct fapolicyd configuration and rules - A primary goal of the profiling tool is to provide an environment to verify the functionality of proposed rule changes. Additionally, the AAC's analysis backend requires a debug log with a specific format, which is specified in `/etc/fapolicyd/fapolicyd.conf`. This formatting may differ from that used by the on-line production instance.
2. Filesystem isolation - The AAC pre- and post- profiling session state of the filesystem should ideally be consistent, other than session analysis artifacts. In other words, it would be nice if the filesystem is not modified by the execution of the profiling target(s), although data generated by the profiling instance of fapolicyd for downstream analysis are acceptable. I intend to investigate if an overlayfs mount is supported by systemd.
3. Redirection of stdout and stderr - These are both supported within systemd unit files. The challenge is to share a consistent naming convention across independent unit files.
4. Two independent systemd units sharing a single namespace - According to the documentation this is a supported feature.
5. Profiling target and fapolicyd profiling instance controlled from within a single systemd unit or two independent units - Investigate dependency directives that can control unit ordering, and other exec directives that allow multiple executables to be invoked within a single unit, e.g. ExecStartPre, ExecStartPost.
6. systemd unit types - The unit type dictates when systemd considers a service active. Since the fapolicyd daemon has a relatively long initialization period, it is important that the fapolicyd instance is fully available prior to invoking the profiling target executable.
7. Drop-in configuration files - These allow unit file directives to be dynamically overridden. Can they be used to specify all directives without a pre-existing unit file? This could allow JIT dynamic directive generation.
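
To make the unit types item above concrete, here is a minimal sketch of how the profiler might wait for the fapolicyd profiling unit before launching the target. It polls `systemctl is-active`; the function names, timeout values, and the injectable `probe` parameter are assumptions for illustration, not part of the current implementation. Note that "active" only means what the unit type says it means, so a long fapolicyd initialization may still be in progress after this returns.

```python
import subprocess
import time


def unit_is_active(unit):
    """Return True if `systemctl is-active` reports the unit as active."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until_active(unit, timeout=60.0, interval=0.5, probe=unit_is_active):
    """Poll until the probe reports the unit active or the timeout expires.

    `probe` is injectable so the loop can be exercised without systemd.
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe(unit):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would do something like `wait_until_active("fapd_profiler.service")` and only start the profiling target when it returns True.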

tparchambault · 2022-07-06T18:41:37Z

tparchambault
Jul 6, 2022
Collaborator Author

7. Drop-in configuration files -

Dynamically writing /usr/systemd/system/fapd_profiler.service.d/redirection.conf with the following text from fapd_manager.py works fine:

fpRedirection.write("""
[Service]
StandardOutput=file:/tmp/fapd_profiling_override.stdout
StandardError=file:/tmp/fapd_profiling_override.stderr
""")

Provided the unit and drop-in file directives are reloaded and processed again with:
os.system("/usr/bin/systemctl daemon-reload")

Can they be used to specify all directives without a pre-existing unit file?

Re: Are pre-existing unit files required? The answer appears to be YES, they are required.
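
A minimal sketch of the drop-in generation described above, split into a pure rendering step and a write-and-reload step. The function names and the `reload` flag are assumptions for illustration; the directives and the `daemon-reload` call are the ones from this entry. It uses `subprocess.run(..., check=True)` instead of `os.system()` so that a failed reload raises rather than passing silently.

```python
import os
import subprocess


def render_redirection_dropin(stdout_path, stderr_path):
    """Return drop-in text that redirects a service's stdout and stderr."""
    return (
        "[Service]\n"
        "StandardOutput=file:{}\n"
        "StandardError=file:{}\n"
    ).format(stdout_path, stderr_path)


def write_dropin(dropin_dir, text, name="redirection.conf", reload=True):
    """Write the drop-in file and optionally ask systemd to reload its units."""
    os.makedirs(dropin_dir, exist_ok=True)
    path = os.path.join(dropin_dir, name)
    with open(path, "w") as fp:
        fp.write(text)
    if reload:
        # Raises CalledProcessError if the reload fails, unlike os.system().
        subprocess.run(["/usr/bin/systemctl", "daemon-reload"], check=True)
    return path
```

The base unit file still has to exist; this only overrides directives on top of it.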

0 replies

tparchambault · 2022-07-06T21:34:18Z

tparchambault
Jul 6, 2022
Collaborator Author

3. Redirection of stdout and stderr -

Passing the fapd stdout/stderr filenames, with embedded timestamps, to the drop-in file generator function. Good to go.
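
For illustration, one way the timestamped filenames could be produced so that independent units share a naming convention. The helper name, the `fapd_profiling` prefix, and the timestamp format are assumptions; the actual convention used by fapd_manager.py is not shown in this thread.

```python
from datetime import datetime


def session_log_paths(base_dir="/tmp", prefix="fapd_profiling", now=None):
    """Return (stdout_path, stderr_path) sharing one session timestamp."""
    now = now or datetime.now()
    # One stamp for both files keeps a session's artifacts easy to pair up.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    stem = "{}/{}_{}".format(base_dir.rstrip("/"), prefix, stamp)
    return stem + ".stdout", stem + ".stderr"
```

The two returned paths can then be handed to whatever generates the drop-in file.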

0 replies

tparchambault · 2022-07-18T14:51:32Z

tparchambault
Jul 18, 2022
Collaborator Author

4. Two independent systemd units sharing a single namespace. - According to the documentation this is a supported feature.

One issue that immediately became obvious is that bind mount unit directives result in a new namespace, separate from the host's, and that the profiling instance of fapolicyd was then no longer communicating with the kernel to accept/deny file access events.
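
For reference, a sketch of what the documented namespace-sharing mechanism looks like when a second unit is generated from Python. `JoinsNamespaceOf=` is a real [Unit] directive, but as far as I understand it only applies to the namespaces created by settings such as `PrivateTmp=` and `PrivateNetwork=`, and only when both units enable them, so it is not a fix for the fanotify problem above. The function name and the unit description are hypothetical.

```python
def render_target_unit(exec_start, daemon_unit="fapd_profiler.service"):
    """Return unit text for a target service ordered after the daemon unit."""
    lines = [
        "[Unit]",
        "Description=Profiling target (illustrative)",
        # Pull in the daemon unit and start only after it is considered active.
        "Requires={}".format(daemon_unit),
        "After={}".format(daemon_unit),
        # Only effective if both units enable PrivateTmp= / PrivateNetwork=.
        "JoinsNamespaceOf={}".format(daemon_unit),
        "",
        "[Service]",
        "Type=oneshot",
        "PrivateTmp=yes",
        "ExecStart={}".format(exec_start),
        "",
    ]
    return "\n".join(lines)
```

The daemon unit would need the matching `PrivateTmp=yes` for the join to have any effect.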

0 replies

tparchambault · 2022-07-18T15:08:14Z

tparchambault
Jul 18, 2022
Collaborator Author

2. Filesystem isolation - The AAC pre- and post- profiling session state of the filesystem should ideally be consistent, other than session analysis artifacts. In other words, it would be nice if the filesystem is not modified by the execution of the profiling target(s), although data generated by the profiling instance of fapolicyd for downstream analysis are acceptable. I intend to investigate if an overlayfs mount is supported by systemd.

As of writing this entry, getting full host filesystem isolation using an overlay/union mount of the host's filesystem was successful using

$ sudo systemd-nspawn -x -D / --overlay=/:/tmp/nspawn/upperfs/:/

with /tmp/nspawn/upperfs/ being a pre-existing and (almost) arbitrary location. I say almost, because I'm not sure if there are any constraints on where the upper overlay layer can be located. The above example location /tmp/nspawn/upperfs did work for me, capturing all filesystem writes.

I'll investigate if a systemd service can support this same type of rootfs mounting strategy.

The bad news: fapolicyd is not starting (or staying up) in the systemd-nspawn container.

0 replies

tparchambault · 2022-07-20T15:00:25Z

tparchambault
Jul 20, 2022
Collaborator Author

This relates to using the host's rootfs as the nspawn container's rootfs, and mounting that same filesystem as an overlay fs. The fapolicyd-analyzer's profiling sessions are only meaningful on a specific system's filesystem, paired with the debug log generated by the fapolicyd daemon during the time window of file access profiling, i.e. while running the executable(s) of interest as they interact with the rootfs and perform CRUD operations on filesystem entities. In this context, it would be desirable for all CRUD operations to be temporary, i.e. not persist beyond the profiling session and subsequent analysis, leaving the filesystem's state unchanged.

Using the host's rootfs as the rootfs of the nspawn container provides the appropriate runtime filesystem context for profiling arbitrary executables.
Mounting that filesystem as an overlay captures all CRUD operations in the upper layer, leaving the "real" host filesystem unmodified by the profiling process once the upper layer is deleted.

There were two issues in the systemd ecosystem in the 2019 timeframe that addressed the two items above separately: one addressing the use of the host's rootfs as the container's rootfs, and a second using overlays as the container's rootfs, allowing writes to be captured w/o directly modifying the host's rootfs.

Some of the commenters were primarily interested in the ability to use overlays when mounting the nspawn container's rootfs, so that they could have multiple containers using a common base image that would be untouched by container instance filesystem writes.

Others were interested in mounting the host's rootfs but having the container's filesystem writes persist w/o modifying the host's filesystem, possibly on another device.

Our PoC used a directory tree rooted under /tmp/ as our upperfs where all container filesystem writes would be captured (and possibly persist, if desired).

nspawn: overlayfs cannot be the root file system right now systemd/systemd#3847
Use host root as --template/reproduce firejail --overlay functionality using systemd-nspawn systemd/systemd#9044 # which is considered a dup of 3847, but it does present the usecase we desire.

And the PR performing the merge:

nspawn: Enable specifying root as the mount target directory. systemd/systemd#14269

1 reply

tparchambault Jul 20, 2022
Collaborator Author

And the PR performing the merge:

nspawn: Enable specifying root as the mount target directory. systemd/systemd#14269

The bad news: This functionality does not appear to be in systemd-nspawn as shipped in RHEL8.6.

The semi-good news: This functionality is in systemd-nspawn as shipped in Fedora 34.

tparchambault · 2022-07-21T17:13:02Z

tparchambault
Jul 21, 2022
Collaborator Author

With SELinux in permissive mode, bind mounting the user's home directory and allowing two fanotify related system calls allows fapolicyd to start and monitor file access operations within the nspawn container. Assuming that /tmp/nspawn/upperfs has been created, the following will create a container using the host's rootfs as the container's rootfs:

$ sudo systemd-nspawn -bxD / --overlay=/:/tmp/nspawn/upperfs:/ --bind=/home/toma  --capability=CAP_SYS_ADMIN --system-call-filter="fanotify_init fanotify_mark"

The capability option may or may not be needed. I believe it's superfluous, but the above line was tested extensively and performed correctly, so until proven otherwise I'll leave it in as working code.
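
If the profiler ends up launching the container from Python, the command above could be assembled as an argument list rather than a shell string, which keeps the space-separated system call filter as a single argument. This is only a sketch: the function names and parameters are hypothetical, and the options are exactly those from the tested command line (including the possibly superfluous capability).

```python
import shlex


def nspawn_argv(upper_dir, bind_dirs=(), syscalls=("fanotify_init", "fanotify_mark")):
    """Build the systemd-nspawn argument list used in the tested command."""
    argv = [
        "systemd-nspawn",
        "-b",  # boot the container's init
        "-x",  # ephemeral mode
        "-D", "/",  # use the host's rootfs as the container's rootfs
        "--overlay=/:{}:/".format(upper_dir),
    ]
    argv += ["--bind={}".format(d) for d in bind_dirs]
    argv.append("--capability=CAP_SYS_ADMIN")
    argv.append("--system-call-filter={}".format(" ".join(syscalls)))
    return argv


def as_shell(argv):
    """Render the argument list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(a) for a in argv)
```

The list form can be passed directly to `subprocess.Popen()` (run with the required privileges); `as_shell()` is only for logging.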

The machinectl command can be used to execute arbitrary commands or have an interactive shell session within the nspawn container.

[toma@fedora ~]$ machinectl list
MACHINE                 CLASS     SERVICE        OS     VERSION ADDRESSES
fedora-b6b755bb4fec7199 container systemd-nspawn fedora 34      -        

1 machines listed.
[toma@fedora ~]$ machinectl shell toma@fedora-b6b755bb4fec7199
==== AUTHENTICATING FOR org.freedesktop.machine1.shell ====
Authentication is required to acquire a shell in a local container.
Authenticating as: Thomas Archambault (toma)
Password: 
==== AUTHENTICATION COMPLETE ====
Connected to machine fedora-b6b755bb4fec7199. Press ^] three times within 1s to exit session.
[toma@fedora-b6b755bb4fec7199 ~]$ !journ
journalctl -f -u fapolicyd.service 
Jul 21 12:41:29 fedora-b6b755bb4fec7199 fapolicyd[294]: Importing data from rpmdb backend
...

I also did not investigate virtual NICs and connectivity but I'm assuming based on the documentation, that standard container networking paradigms are supported, e.g. port translation, host/guest comms, etc.

In summary, systemd-nspawn with appropriate arguments, will allow file access profiling sessions using the host's rootfs, while allowing the pre-profiling filesystem state to be easily restored by the deletion of the upperfs layer after the profiling session and analysis is completed.
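
The create-then-delete lifecycle of the upper layer could be wrapped in a context manager so the cleanup happens even if the session fails. This is a sketch under assumptions: the name `upper_layer` is hypothetical, any analysis artifacts would need to be copied out before the block exits, and on a real system the directory contents are written by the container and will likely need elevated privileges to remove.

```python
import contextlib
import os
import shutil


@contextlib.contextmanager
def upper_layer(path):
    """Create the overlay upper directory and remove it when the session ends."""
    os.makedirs(path, exist_ok=True)
    try:
        yield path
    finally:
        # Deleting the upper layer discards every write made by the session.
        shutil.rmtree(path, ignore_errors=True)
```

Usage would look like `with upper_layer("/tmp/nspawn/upperfs") as upper: ...`, with the container started and stopped inside the block.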

1 reply

tparchambault Jul 22, 2022
Collaborator Author

The above work was on Fedora 34

tparchambault · 2022-07-22T14:59:56Z

tparchambault
Jul 22, 2022
Collaborator Author

On RHEL90, systemd-nspawn is not available out of the box; it requires an explicit package installation. Also, the above command blocks, duplicating the RHEL8 behavior, so RHEL8's behavior may not be due to the version of systemd. Researching...

On FC36, systemd-nspawn is available out of the box, but fapolicyd does not complete its initialization. This symptom is similar to those experienced while debugging the FC34 platform issues above. Also researching currently...

0 replies

tparchambault · 2022-08-02T14:24:45Z

tparchambault
Aug 2, 2022
Collaborator Author

Update on FC36: using the additional command-line argument --volatile, the FC36 container is instantiated, initialized, and running. Bind mounts and associated options differ from those of the nspawn shipped w/FC34.

On RHEL9.0, using logging at the debug level, and strace, the container is not instantiated, and the system calls indicate that it is repeatedly calling rt_sigtimedwait(). I don't want to put too much weight on this; it could be the monitoring thread looking at user stdin for the double ']' keystroke combination that sends a KILL signal to terminate the container. I'll run strace again but trying to monitor all threads (if they exist).

3 replies

tparchambault Aug 4, 2022
Collaborator Author

Communicating w/the developer's mailing list, it appears the issue is the file system type.

From: Lennart Poettering lennart@poettering.net
...
"-x" is ephemeral mode. This means nspawn will make a copy of the OS
tree before booting into it, and remove it afterwards.
"-x" on btrfs is very fast and space efficient, because btrfs supports
both snapshots and reflinks. nspawn will make a subvol snapshot if the
root you specify is a subvol. It will make reflink-based file copies
otherwise.
Other file systems have a more 1990's feature set, i.e. no reflinks
nor snapshots. (modern xfs on very new kernels can support reflinks if
this is opt-in'ed to.) In that case we have to copy the data files
with their contents, and that's slow.

My out-of-the-box RHEL9 install is using xfs, iirc. I will briefly research Lennart's statement that "modern xfs on very new kernels can support reflinks if this is opt-in'ed to", and the process of enabling reflinks.

If we are talking about a boot-time kernel option, or possibly setting a flag in the kernel/userspace interface tmpfs, then that approach may be reasonable to inflict on client systems as a prerequisite. If we are requiring a custom config setting in the kernel, then this approach is off the table.

tparchambault Aug 4, 2022
Collaborator Author

According to the mkfs.xfs documentation reflink support is enabled by default.

Update:
Our local RHEL90 rootfs also appears to have it enabled.

meta-data=/dev/mapper/rhel-root  isize=512    agcount=4, agsize=4185600 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1
data     =                       bsize=4096   blocks=16742400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=8175, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[toma@localhost ~]$
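
If this check becomes a prerequisite test on client systems, the reflink flag can be pulled out of output like the above with a small parser. A minimal sketch, assuming the output text has already been captured; the function name is hypothetical.

```python
import re


def xfs_reflink_enabled(xfs_info_text):
    """Return True/False for the reflink flag, or None if it is not reported."""
    match = re.search(r"\breflink=(\d)", xfs_info_text)
    if match is None:
        return None
    return match.group(1) == "1"
```

Returning None when the field is missing keeps "not enabled" distinct from "this tool version does not report it".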

tparchambault Aug 10, 2022
Collaborator Author

Continuing to work with the developers via the nspawn devel mailing list. I believe that the issue is that nspawn (as currently shipped with RHEL9.0) does not support xfs reflink copies. I floated the idea of an enhancement request, because nspawn is running correctly as coded/designed; it's just not useful over XFS given the inherent start-up latency due to full filesystem copies.

They may have already addressed the reflink issue and the shipped version of nspawn may just be dated. I expect to know the answer to that by tomorrow AM due to timezone differences. Developers are in Berlin.

Background: RHEL90's default fs type is xfs which has reflink support according to the RHEL docs: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html-single/managing_file_systems/index#the-xfs-file-system_assembly_overview-of-available-file-systems.

tparchambault · 2022-08-10T19:32:05Z

tparchambault
Aug 10, 2022
Collaborator Author

More background wrt btrfs and RHEL: https://access.redhat.com/discussions/3138231

We need to leverage our RH insider to determine if btrfs is in-play.

0 replies

tparchambault · 2022-08-18T14:20:20Z

tparchambault
Aug 18, 2022
Collaborator Author

Closing a research/testing draft PR that was used to generate rpms. It uses the Rust Handle and systemd/dbus to start/stop/query the fapd profiling instance.

See: #515

And the last comment from the above PR for posterity:

#515 (comment)

This is a working draft PR so that I could generate rpms via CI and stay in-sync w/master and unit-tests. I should close it now that containerization is in the discussion but I'd like to keep the branch around until we have a hard decision on the implementation tack. I'll sync w/master first so I get a final rpm built...

The problem with starting the fapd profiling instance as a service and using any sandboxing/isolation feature declarations is that these rely on namespaces, which can differ from those used by the profiling target executable. Supposedly (according to the docs) there are ways to get two services to be in the same namespaces, but that would require more investigation/research.

It was during that investigation that I ran across the systemd-nspawn containerization approach. So: service units and/or direct containerization w/nspawn, or maybe podman, if it supports using the host's rootfs in ephemeral mode. (Ephemeral being the nspawn option/term; I have no idea if that is a standard OCI/containers term.)

0 replies
