Profiling Implementation Notes - fapolicyd and profiling target sandboxing via systemd #523
-
Dynamically writing /usr/systemd/system/fapd_profiler.service.d/redirection.conf with the following text from fapd_manager.py works fine:
Provided the unit and drop-in file directives are reloaded and processed again with:
Re: Are pre-existing unit files required? The answer appears to be YES, they are required.
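A minimal sketch of generating such a drop-in from Python, assuming the redirection is done with systemd's `StandardOutput=`/`StandardError=` directives and `file:` targets. The function names and the example paths are hypothetical and are not taken from fapd_manager.py:

```python
from pathlib import Path


def render_redirection_dropin(stdout_path, stderr_path):
    """Return drop-in text redirecting a service's stdout/stderr to files.

    Assumption: the redirection uses the file: target of systemd.exec.
    """
    return (
        "[Service]\n"
        f"StandardOutput=file:{stdout_path}\n"
        f"StandardError=file:{stderr_path}\n"
    )


def write_dropin(dropin_dir, text, name="redirection.conf"):
    """Write the drop-in into a <unit>.service.d/ directory; return its path."""
    target = Path(dropin_dir)
    target.mkdir(parents=True, exist_ok=True)
    conf = target / name
    conf.write_text(text)
    return conf


def reload_command():
    """Command that makes systemd re-read unit files and drop-ins."""
    return ["systemctl", "daemon-reload"]
```

The reload step is returned as an argument list rather than executed, so the caller decides how to run it (for example with `subprocess.run(reload_command(), check=True)`) and with what privileges.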
-
Passing the fapd stdout/stderr filenames, with embedded timestamps, to the drop-in file generator function. Good to go.
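One way the timestamped filenames could be built before handing them to the generator is sketched below. The function name, the filename pattern, and the timestamp format are assumptions for illustration only:

```python
from datetime import datetime
from pathlib import Path


def timestamped_log_paths(log_dir, when=None):
    """Return (stdout_path, stderr_path) sharing one embedded timestamp."""
    # A single timestamp is used for both names so the pair is easy to match up.
    stamp = (when or datetime.now()).strftime("%Y%m%d_%H%M%S")
    base = Path(log_dir)
    return (
        base / f"fapd_stdout_{stamp}.log",
        base / f"fapd_stderr_{stamp}.log",
    )
```

Passing `when` explicitly keeps the function deterministic for tests; in normal use it defaults to the current time.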
-
One issue that immediately became obvious is that the bind mount unit directives place the service in a new mount namespace, separate from the host's, and the profiling instance of fapolicyd was then no longer communicating with the kernel to accept/deny file access events.
-
As of writing this entry, getting full host filesystem isolation using an overlay/union mount of the host's filesystem was successful using
with /tmp/nspawn/upperfs/ being a pre-existing and (almost) arbitrary location. I say almost because I'm not sure whether there are any constraints on where the upper overlay layer can be located. With the example location above working, I'll investigate whether a systemd service can support this same type of rootfs mounting strategy. The bad news: fapolicyd is not starting (or staying up) in the systemd-nspawn container.
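As a rough illustration of this kind of invocation, the sketch below assembles a systemd-nspawn argument list that uses the host's rootfs as the container root and stacks an upper layer on top of it with `--overlay=`. This is not the exact command that was tested; the helper names are hypothetical, and whether an overlay mounted on `/` is accepted depends on the systemd version:

```python
def overlay_option(lower, upper, dest):
    """Build an --overlay= option: lower (read-only), upper (writes), mount point."""
    return f"--overlay={lower}:{upper}:{dest}"


def nspawn_overlay_args(upper_dir, rootfs="/"):
    """Argument list for a container rooted at `rootfs` with writes sent to `upper_dir`."""
    return [
        "systemd-nspawn",
        f"--directory={rootfs}",
        # Assumption: the host rootfs is the lower layer and the overlay
        # is mounted at / inside the container.
        overlay_option(rootfs, upper_dir, "/"),
    ]
```

Keeping the overlay spec in its own helper makes it easy to experiment with where the upper layer lives, which is the open question noted above.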
-
Related to using the host's rootfs as the nspawn container's rootfs, and mounting that same filesystem as an overlay fs: the fapolicyd-analyzer's profiling sessions are only meaningful on a specific system's filesystem, together with the debug log generated by the fapolicyd daemon during the profiling window, i.e. while the executable(s) of interest run and interact with the rootfs by creating, reading, updating, and deleting (CRUD) filesystem entities. In this context, it would be desirable for all CRUD operations to be temporary, i.e. not persist beyond the profiling session and subsequent analysis, leaving the filesystem's state unchanged.
There were two issues in the systemd ecosystem in the 2019 timeframe that addressed the two items above separately: one covered using the host's rootfs as the container's rootfs, and the second covered using overlays as the container's rootfs so that writes are captured without directly modifying the host's rootfs. Some of the commenters were primarily interested in the ability to use overlays when mounting the nspawn container's rootfs, so that multiple containers could share a common base image that stays untouched by container instance filesystem writes. Others were interested in mounting the host's rootfs while having the container's filesystem writes persist, possibly on another device, without modifying the host's filesystem. Our PoC used a directory tree rooted under /tmp/ as our upperfs, where all container filesystem writes are captured (and can persist, if desired).
And the PR performing the merge:
-
With SELinux in permissive mode, bind mounting the user's home directory and allowing two fanotify-related system calls lets fapolicyd start and monitor file access operations within the nspawn container. Assuming that
The capability option may or may not be needed. I believe it's superfluous, but the above line was tested extensively and performed correctly, so until proven otherwise I'll leave it in as working code. The
I also did not investigate virtual NICs and connectivity, but I'm assuming, based on the documentation, that standard container networking paradigms are supported, e.g. port translation, host/guest comms, etc. In summary, systemd-nspawn, with appropriate arguments, will allow file access profiling sessions using the host's rootfs, while allowing the pre-profiling filesystem state to be easily restored by deleting the upperfs layer after the profiling session and analysis are completed.
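The extra container options and the restore-by-deletion step could be sketched as follows. This assumes the two fanotify system calls are `fanotify_init` and `fanotify_mark`, passed through systemd-nspawn's `--system-call-filter=` option; the helper names are hypothetical, and the capability option is left out because its value is not recorded here:

```python
import shutil
from contextlib import contextmanager
from pathlib import Path

# Assumption: these are the two fanotify system calls that must be allowed.
FANOTIFY_SYSCALLS = ("fanotify_init", "fanotify_mark")


def fapolicyd_container_options(home_dir):
    """Extra systemd-nspawn options: bind the home directory, allow fanotify."""
    return [
        f"--bind={home_dir}",
        "--system-call-filter=" + " ".join(FANOTIFY_SYSCALLS),
    ]


@contextmanager
def ephemeral_upperfs(path):
    """Create the upper layer for a session and delete it afterwards."""
    upper = Path(path)
    upper.mkdir(parents=True, exist_ok=True)
    try:
        yield upper
    finally:
        # Removing the upper layer discards every write made in the container.
        shutil.rmtree(upper, ignore_errors=True)
```

A profiling session would run inside `with ephemeral_upperfs("/tmp/nspawn/upperfs") as upper:`, and the analysis has to finish (or copy what it needs) before the block exits, since the captured writes are gone afterwards.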
-
On RHEL 9.0, systemd-nspawn is not available out of the box; it requires an explicit package installation. Also, the above command blocks, duplicating the RHEL8 behavior, so RHEL8's behavior may not be due to its version of systemd. Researching... On FC36, systemd-nspawn is available out of the box, but fapolicyd does not complete its initialization. This symptom is similar to those experienced while debugging the FC34 platform issues above. Also researching currently...
-
Update on FC36: with the additional command-line argument --volatile, the FC36 container is instantiated, initialized, and running. Bind mounts and associated options differ from those of the nspawn shipped w/ FC34. On RHEL9.0, with logging at the debug level and strace attached, the container is not instantiated, and the trace shows it repeatedly calling rt_sigtimedwait(). I don't want to put too much weight on this; it could be the monitoring thread watching user stdin for the '^]' escape sequence (pressed three times within one second) that terminates the container. I'll run strace again, this time trying to monitor all threads (if they exist).
-
More background wrt btrfs and RHEL: https://access.redhat.com/discussions/3138231 We need to leverage our RH insider to determine if btrfs is in play.
-
Closing a research/testing draft PR that was used to generate rpms. It uses the Rust Handle and systemd/dbus to start/stop/query the fapd profiling instance. See: #515 And the last comment from the above PR for posterity:
-
Investigating systemd as the control mechanism underlying the AAC's file access profiler, to potentially increase profiling session isolation. I'll be focusing on the following areas, and on whether systemd units and sandboxing provide any distinct advantages over the current Python 3.6 subprocess.Popen() based implementation. The current implementation does address many initialization, runtime option, and runtime process management concerns with explicit code; however, full runtime isolation is not available out of the box.
Specifically the deltas between on-line, production fapolicyd instances, and off-line dry-run fapolicyd instances with a set of profiling target executables.
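For comparison with the systemd approach, the kind of explicit handling a Popen-based launcher has to do can be sketched as below. This is not the project's actual code; the function name is hypothetical, and `argv` stands for whatever fapolicyd command line the manager builds:

```python
import subprocess


def start_with_redirection(argv, stdout_path, stderr_path):
    """Start a process with stdout/stderr redirected to files.

    The child keeps its own copies of the file descriptors, so the parent
    can close its handles as soon as Popen returns.
    """
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        return subprocess.Popen(argv, stdout=out, stderr=err)
```

Redirection, lifetime tracking, and cleanup all live in the caller's code here, whereas a unit file with drop-ins expresses the redirection declaratively and leaves process management to systemd. Neither form isolates the filesystem by itself.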