document usage and design of blockfile snapshotter
Signed-off-by: Avi Deitcher <avi@deitcher.net>
Showing 2 changed files with 151 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,150 @@ | ||
# Blockfile Snapshotter | ||
|
||
The blockfile snapshotter uses raw block files for each snapshot. Block files are | ||
copied from a parent or base empty block file. Mounting requires a virtual machine | ||
or support for loopback mounts. | ||
|
||
## Use Case | ||
|
||
A snapshotter extracts an image from the OCI image store and creates a snapshot | ||
that containers can use. It sets up the underlying infrastructure, such as | ||
preparing a directory or other filesystem, applies the layers to create a single | ||
mountable directory that serves as the container base, and mounts it into the | ||
container on start. | ||
|
||
The most commonly used snapshotter is the overlayfs snapshotter, which is the default | ||
in containerd. The overlayfs snapshotter provides a directory on the host filesystem, | ||
which then is bind-mounted into the container. | ||
|
||
The blockfile snapshotter targets a use case where the container runs inside a | ||
VM. Specifically, the OCI image is the filesystem for the container, as with | ||
a normal container, but the container itself runs inside a VM. | ||
Since the VM cannot bind-mount directories from the host, the blockfile snapshotter | ||
creates a block device for the snapshot, which can be attached to the VM as a block | ||
device to facilitate getting the contents into the guest. | ||
|
||
## Alternatives | ||
|
||
There are alternatives to the blockfile snapshotter for mounting directories into a | ||
VM. If your VMM supports it, you can use a [virtiofs](https://virtio-fs.gitlab.io) | ||
driver or [9p](https://www.kernel.org/doc/Documentation/filesystems/9p.txt) to mount | ||
a local directory into the VM. | ||
|
||
Additionally, the [devicemapper snapshotter](./devmapper.md) can be used to create | ||
snapshots on filesystem images in a devicemapper thin-pool. | ||
|
||
## Usage | ||
|
||
### Checking if the blockfile snapshotter is available | ||
|
||
To check if the blockfile snapshotter is available, run the following command: | ||
|
||
```bash | ||
$ ctr plugins ls | grep blockfile | ||
``` | ||
|
||
### Configuration | ||
|
||
To configure the snapshotter, set the following options in your containerd | ||
`config.toml`. Restart containerd after changing the configuration. | ||
|
||
```toml | ||
[plugins.'io.containerd.snapshotter.v1.blockfile'] | ||
scratch_file = "/opt/containerd/blockfile" | ||
root_path = "/somewhere/on/disk" | ||
fs_type = 'ext4' | ||
mount_options = [] | ||
recreate_scratch = true | ||
``` | ||
|
||
- `root_path`: The directory where the block files are stored. This directory must be writable by the containerd process. | ||
- `scratch_file`: The path to the empty file that will be used as the base for the block files. This file should exist before first using the snapshotter. | ||
- `fs_type`: The filesystem type to use for the block files. Currently supported are `ext4` and `xfs`. | ||
- `mount_options`: Additional mount options to use when mounting the block files. | ||
- `recreate_scratch`: If set to `true`, the snapshotter will recreate the scratch file if it is missing. If set to `false`, the snapshotter will fail if the scratch file is missing. | ||
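
The rules in the list above can be checked before starting containerd. The following is a minimal sketch only: `preflight` and `demo` are hypothetical helpers written for this document, not part of containerd, and they only mirror the documented behaviour of `root_path`, `scratch_file` and `recreate_scratch`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// preflight is a hypothetical helper (not a containerd API). It checks that
// rootPath is writable and that scratchFile exists, unless recreateScratch
// allows a missing scratch file.
func preflight(rootPath, scratchFile string, recreateScratch bool) error {
	// root_path must be writable: create and remove a probe file in it.
	probe, err := os.CreateTemp(rootPath, ".probe-*")
	if err != nil {
		return fmt.Errorf("root_path %q is not writable: %w", rootPath, err)
	}
	probe.Close()
	os.Remove(probe.Name())

	// scratch_file may only be missing when recreate_scratch is true.
	_, err = os.Stat(scratchFile)
	if errors.Is(err, os.ErrNotExist) {
		if recreateScratch {
			return nil
		}
		return fmt.Errorf("scratch_file %q is missing and recreate_scratch is false", scratchFile)
	}
	return err
}

// demo runs preflight against a temporary root directory that contains no
// scratch file and reports whether each recreate_scratch setting passes.
func demo() (passWithRecreate, passWithoutRecreate bool, err error) {
	root, err := os.MkdirTemp("", "blockfile-root")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root)

	scratch := filepath.Join(root, "missing-scratch")
	return preflight(root, scratch, true) == nil,
		preflight(root, scratch, false) == nil, nil
}

func main() {
	withRecreate, withoutRecreate, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("recreate_scratch=true passes:", withRecreate)
	fmt.Println("recreate_scratch=false passes:", withoutRecreate)
}
```

The sketch uses a temporary directory so that it does not touch a real containerd installation; in practice you would pass the values from your own `config.toml`.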
|
||
### Creating the scratch file | ||
|
||
You can create a scratch file as follows. This example uses a 500 MiB scratch file. | ||
|
||
```bash | ||
$ # make a 500M file | ||
$ dd if=/dev/zero of=/opt/containerd/blockfile bs=1M count=500 | ||
500+0 records in | ||
500+0 records out | ||
524288000 bytes (524 MB, 500 MiB) copied, 1.76253 s, 297 MB/s | ||
|
||
$ # format the file with ext4 | ||
$ sudo mkfs.ext4 /opt/containerd/blockfile | ||
mke2fs 1.47.0 (5-Feb-2023) | ||
Discarding device blocks: done | ||
Creating filesystem with 512000 1k blocks and 128016 inodes | ||
Filesystem UUID: d9947ecc-722d-4627-9cf9-fa2a3b622106 | ||
Superblock backups stored on blocks: | ||
8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409 | ||
|
||
Allocating group tables: done | ||
Writing inode tables: done | ||
Creating journal (8192 blocks): done | ||
Writing superblocks and filesystem accounting information: done | ||
``` | ||
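
If you prefer to create the scratch file from Go, a rough equivalent of the `dd` step might look like the sketch below. `createScratch` and `demo` are hypothetical names, and the sketch writes to a temporary directory rather than `/opt/containerd/blockfile`. Unlike `dd if=/dev/zero`, `Truncate` typically produces a sparse file, and the file still has to be formatted (for example with `mkfs.ext4`) before use.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// createScratch creates (or truncates) the file at path and sets its size to
// sizeMiB mebibytes. It does not format the file.
func createScratch(path string, sizeMiB int64) error {
	f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0o600)
	if err != nil {
		return err
	}
	if err := f.Truncate(sizeMiB * 1024 * 1024); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// demo creates a 500 MiB scratch file in a temporary directory and returns
// its size in bytes as reported by the filesystem.
func demo() (int64, error) {
	dir, err := os.MkdirTemp("", "blockfile-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "blockfile")
	if err := createScratch(path, 500); err != nil {
		return 0, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	size, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("scratch file size: %d bytes\n", size)
}
```

The reported size, 524288000 bytes, matches the byte count in the `dd` output above.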
|
||
### Running a container | ||
|
||
To run a container using the blockfile snapshotter, you need to specify the | ||
snapshotter: | ||
|
||
```bash | ||
$ # ensure that the image we are using exists; it is a regular OCI image | ||
$ ctr image pull docker.io/library/busybox:latest | ||
$ # run the container with the specified snapshotter | ||
$ ctr run --rm -t --snapshotter blockfile docker.io/library/busybox:latest hello sh | ||
``` | ||
|
||
Using it via the Go client API is identical to using any other snapshotter: | ||
|
||
```go | ||
// TODO: will fill in | ||
``` | ||
|
||
## How It Works | ||
|
||
The blockfile snapshotter functions similarly to other snapshotters. | ||
It unpacks each individual layer from a container image, with each layer unpack | ||
building on the content from its parent(s). | ||
|
||
The blockfile snapshotter is unique in two ways: | ||
|
||
1. It applies layers inside a disk image file, rather than on the host filesystem. | ||
1. It creates a block image file for each layer, starting from a copy of the previous layer's file and applying the new layer on top of it. | ||
|
||
Rather than a single directory with the contents, the end result of the blockfile | ||
snapshotter's process is a single file that contains the full filesystem image. | ||
That image file can be loopback-mounted or attached to a virtual machine. | ||
|
||
For every layer the snapshotter creates a new blockfile, starting with a copy of the | ||
blockfile from the previous layer. If there is no previous layer, i.e. for the first | ||
layer, it copies the scratch file. | ||
|
||
For example, for an image with 3 layers - called A, B, C - the process is as follows: | ||
|
||
1. Copy the scratch file to a new blockfile for layer A. | ||
1. Loopback-mount the blockfile for layer A. | ||
1. Apply layer A to the mount. | ||
1. Unmount the blockfile for layer A. | ||
1. Copy the blockfile for layer A to a new blockfile for layer B. | ||
1. Loopback-mount the blockfile for layer B. | ||
1. Apply layer B to the mount. | ||
1. Unmount the blockfile for layer B. | ||
1. Copy the blockfile for layer B to a new blockfile for layer C. | ||
1. Loopback-mount the blockfile for layer C. | ||
1. Apply layer C to the mount. | ||
1. Unmount the blockfile for layer C. | ||
|
||
Each unpack of a layer builds upon the contents of the previous layers into a new | ||
blockfile. | ||
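
The copy, mount, apply, unmount cycle above generalizes to any number of layers. The sketch below only generates the list of steps as text; it does not copy or mount anything, and `unpackPlan` is a hypothetical name chosen for illustration, not a function in the snapshotter.

```go
package main

import "fmt"

// unpackPlan returns the ordered steps for unpacking the given layers. The
// first layer starts from the scratch file; every later layer starts from a
// copy of the previous layer's blockfile.
func unpackPlan(layers []string) []string {
	steps := make([]string, 0, 4*len(layers))
	parent := "scratch file"
	for _, layer := range layers {
		blockfile := "blockfile for layer " + layer
		steps = append(steps,
			fmt.Sprintf("Copy the %s to a new %s.", parent, blockfile),
			fmt.Sprintf("Loopback-mount the %s.", blockfile),
			fmt.Sprintf("Apply layer %s to the mount.", layer),
			fmt.Sprintf("Unmount the %s.", blockfile),
		)
		// The next layer copies from the blockfile that was just populated.
		parent = blockfile
	}
	return steps
}

func main() {
	for i, step := range unpackPlan([]string{"A", "B", "C"}) {
		fmt.Printf("%d. %s\n", i+1, step)
	}
}
```

For layers A, B and C this produces the same twelve steps as the list above, which makes the dependency explicit: each blockfile is derived from exactly one parent.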
|
||
TODO: Does this mean a lot of wasted space? Does it remove previous layer blockfiles after they have been used? E.g. a 500MB scratch image copied for layer A, then for B, then for C, means 1.5GB of space used, even though the final disk image is only 500MB. |