document usage and design of blockfile snapshotter
Signed-off-by: Avi Deitcher <avi@deitcher.net>
deitch committed Apr 26, 2024
1 parent 0426e3c commit 6e58402
Showing 2 changed files with 151 additions and 1 deletion.
2 changes: 1 addition & 1 deletion docs/snapshotters/README.md
@@ -11,7 +11,7 @@ Generic:
- `native`: Native file copying driver. Akin to Docker/Moby's "vfs" driver.

Block-based:
- `blockfile`: A driver using raw block files for each snapshot. Block files are copied from a parent or base empty block file. Mounting requires a virtual machine or support for loopback mounts.
- [`blockfile`](./blockfile.md): A driver using raw block files for each snapshot. Block files are copied from a parent or base empty block file. Mounting requires a virtual machine or support for loopback mounts.
- `devmapper`: ext4/xfs device mapper. See [`devmapper.md`](./devmapper.md).

Filesystem-specific:
150 changes: 150 additions & 0 deletions docs/snapshotters/blockfile.md
@@ -0,0 +1,150 @@
# Blockfile Snapshotter

The blockfile snapshotter uses raw block files for each snapshot. Block files are
copied from a parent or base empty block file. Mounting requires a virtual machine
or support for loopback mounts.

## Use Case

A snapshotter extracts an image from the OCI image store and creates a snapshot
that containers can use. It handles setting up the underlying infrastructure, such
as preparing a directory or other filesystem setup, applying the layers to create a
single mountable directory to serve as the container base, and mounting it into the
container upon start.

The most commonly used snapshotter is the overlayfs snapshotter, which is the default
in containerd. The overlayfs snapshotter provides a directory on the host filesystem,
which then is bind-mounted into the container.

The blockfile snapshotter targets a use case where the container runs inside a VM.
The OCI image is still the filesystem for the container, as with a normal
container, but the container itself runs inside a VM.
Since the VM cannot bind-mount directories from the host, the blockfile snapshotter
creates a block file for the snapshot, which can be attached to the VM as a block
device to get the contents into the guest.

## Alternatives

There are alternatives to the blockfile snapshotter for mounting directories into a
VM. One alternative is a [virtiofs](https://virtio-fs.gitlab.io) driver,
assuming your VMM supports it. Similarly, you can use
[9p](https://www.kernel.org/doc/Documentation/filesystems/9p.txt) to mount a local
directory into the VM, assuming your VMM supports it.

Additionally, the [devicemapper snapshotter](./devmapper.md) can be used to create
snapshots on filesystem images in a devicemapper thin-pool.

## Usage

### Checking if the blockfile snapshotter is available

To check if the blockfile snapshotter is available, run the following command:

```bash
$ ctr plugins ls | grep blockfile
```

### Configuration

To configure the snapshotter, you can use the following configuration options
in your containerd `config.toml`. Don't forget to restart containerd after changing
the configuration.

```toml
[plugins.'io.containerd.snapshotter.v1.blockfile']
scratch_file = "/opt/containerd/blockfile"
root_path = "/somewhere/on/disk"
fs_type = 'ext4'
mount_options = []
recreate_scratch = true
```

- `root_path`: The directory where the block files are stored. This directory must be writable by the containerd process.
- `scratch_file`: The path to the empty file that will be used as the base for the block files. This file should exist before first using the snapshotter.
- `fs_type`: The filesystem type to use for the block files. Currently supported are `ext4` and `xfs`.
- `mount_options`: Additional mount options to use when mounting the block files.
- `recreate_scratch`: If set to `true`, the snapshotter will recreate the scratch file if it is missing. If set to `false`, the snapshotter will fail if the scratch file is missing.
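
As a minimal sketch, the options above can be mirrored in a small Go struct with a check for the one constraint the list states explicitly, the supported `fs_type` values. The `BlockfileConfig` type and its `Validate` method are hypothetical names used for illustration; they are not the snapshotter's actual Go types, and rejecting an empty `fs_type` is an assumption of this sketch.

```go
package main

import "fmt"

// BlockfileConfig is a hypothetical mirror of the TOML options listed above.
// It is NOT the snapshotter's real configuration type.
type BlockfileConfig struct {
	RootPath        string
	ScratchFile     string
	FSType          string
	MountOptions    []string
	RecreateScratch bool
}

// Validate checks only the constraint documented above: fs_type must be
// ext4 or xfs. Treating an empty value as invalid is an assumption here.
func (c BlockfileConfig) Validate() error {
	switch c.FSType {
	case "ext4", "xfs":
		return nil
	default:
		return fmt.Errorf("unsupported fs_type %q: expected ext4 or xfs", c.FSType)
	}
}

func main() {
	cfg := BlockfileConfig{
		RootPath:        "/somewhere/on/disk",
		ScratchFile:     "/opt/containerd/blockfile",
		FSType:          "ext4",
		RecreateScratch: true,
	}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config ok")
}
```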

### Creating the scratch file

You can create a scratch file as follows. This example uses a 500 MiB scratch file.

```bash
$ # make a 500M file
$ dd if=/dev/zero of=/opt/containerd/blockfile bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 1.76253 s, 297 MB/s

$ # format the file with ext4
$ sudo mkfs.ext4 /opt/containerd/blockfile
mke2fs 1.47.0 (5-Feb-2023)
Discarding device blocks: done
Creating filesystem with 512000 1k blocks and 128016 inodes
Filesystem UUID: d9947ecc-722d-4627-9cf9-fa2a3b622106
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
```

### Running a container

To run a container using the blockfile snapshotter, you need to specify the
snapshotter:

```bash
$ # ensure that the image we are using exists; it is a regular OCI image
$ ctr image pull docker.io/library/busybox:latest
$ # run the container with the blockfile snapshotter
$ ctr run --rm -t --snapshotter blockfile docker.io/library/busybox:latest hello sh
```

Using it via the Go client API is identical to using any other snapshotter:

```go
// TODO: will fill in
```

## How It Works

The blockfile snapshotter functions similarly to other snapshotters.
It unpacks each individual layer from a container image, with each layer unpack
building on the content from its parent(s).

The blockfile snapshotter is unique in two ways:

1. It applies layers inside a disk image file, rather than on the host filesystem.
1. It creates a block image file for each layer, starting from a copy of the previous layer's file and applying the new layer on top of it.

Rather than a single directory with the contents, the end result of the blockfile
snapshotter's process is a single file that holds the contents of the full
filesystem image. That image file can be loopback-mounted or attached to a virtual
machine.

For every layer the snapshotter creates a new blockfile, starting with a copy of the
blockfile from the previous layer. If there is no previous layer, i.e. for the first
layer, it copies the scratch file.

For example, for an image with three layers (A, B, and C), the process is as follows:

1. Copy the scratch file to a new blockfile for layer A.
1. Loopback-mount the blockfile for layer A.
1. Apply layer A to the mount.
1. Unmount the blockfile for layer A.
1. Copy the blockfile for layer A to a new blockfile for layer B.
1. Loopback-mount the blockfile for layer B.
1. Apply layer B to the mount.
1. Unmount the blockfile for layer B.
1. Copy the blockfile for layer B to a new blockfile for layer C.
1. Loopback-mount the blockfile for layer C.
1. Apply layer C to the mount.
1. Unmount the blockfile for layer C.

Each unpack of a layer builds upon the contents of the previous layers into a new
blockfile.

TODO: Does this mean a lot of wasted space? Does it remove previous layer blockfiles after they have been used? E.g. a 500MB scratch image copied for layer A, then for B, then for C, means 1.5GB of space used, even though the final disk image is only 500MB.
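
The worst case raised in the TODO above can be put into numbers with a short calculation. This sketch assumes every layer keeps a full, non-sparse copy of the scratch file and that nothing is cleaned up or shared between copies; whether the snapshotter actually behaves that way is exactly the open question, so treat the result as an upper bound under those assumptions. The `worstCaseBytes` helper is a hypothetical name.

```go
package main

import "fmt"

// worstCaseBytes returns the space used if each layer keeps a full copy of
// the scratch file. Assumption: no sparse files, reflinks, or cleanup.
// The scratch file itself is not counted.
func worstCaseBytes(scratchBytes int64, layers int) int64 {
	return scratchBytes * int64(layers)
}

func main() {
	const scratch = 500 << 20 // 500 MiB scratch file
	total := worstCaseBytes(scratch, 3)
	fmt.Printf("%d MiB\n", total>>20) // prints "1500 MiB"
}
```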
