"Snapshot" of writable layers #223

Open
gorbak25 opened this issue Jun 12, 2023 · 9 comments

Labels
enhancement New feature or request

Comments


gorbak25 commented Jun 12, 2023

What is the version of your Overlaybd

No response

What would you like to be added?

I would like the overlaybd-tcmu binary to implement a TCMU reconfigure handler which:

  1. Temporarily pauses IO on the block device
  2. Seals the current writable layer
  3. Creates a new empty writable layer
  4. Reconfigures itself, so the sealed layer becomes a lower layer and the upper layer points to the new empty layer
  5. Resumes IO

Sealing is quick, so the IO pause shouldn't be noticeable to users of the block device.
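From the caller's point of view, the flow might look roughly like the sketch below. overlaybd-commit and overlaybd-create are the tools overlaybd already ships for sealing and creating layers, but the paths and argument layout here are illustrative, and the final overlaybd-reconfig trigger is hypothetical (it is exactly what this issue asks for):

# IO pausing/resuming would happen inside overlaybd-tcmu; externally the flow would be:
overlaybd-commit /layers/upper.data /layers/upper.index /layers/sealed   # seal the current writable layer
overlaybd-create /layers/upper.data /layers/upper.index 64               # create a fresh (e.g. 64 GB) writable layer
# hypothetical trigger: tell the running device to reload its config, with the
# sealed layer appended to the lowers and the upper pointing at the new files
overlaybd-reconfig /dev/sdX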

Why is this needed for Overlaybd?

OverlayBD will be the main storage back-end for https://github.com/hocus-dev/hocus (we're currently finishing the migration from raw sparse files to OverlayBD). We would have two use cases for reconfiguration:

  1. Hocus spawns long-lived VMs (workspaces) which we would like to periodically back up to an OCI registry without recreating the VMs from scratch.
  2. When fetching content from remote repositories, we spawn a dedicated VM for that task which pulls changes from the remote repository into a new layer. If reconfiguring TCMU were supported, we could reuse the same block device and VM between pulls of remote repositories.

Are you willing to submit PRs to contribute to this feature?

  • Yes, I am willing to implement it.
gorbak25 added the enhancement (New feature or request) label Jun 12, 2023

lihuiba commented Jun 12, 2023

2. we could reuse the same block device

This reuse is risky, because it probably incurs an inconsistency between the content on disk and the content cached in the kernel. I would advise against reconfiguring (slightly changing the content of) a block device without detaching it from the kernel, except in the case where you are 100% sure that the inconsistency cannot happen.


gorbak25 commented Jun 12, 2023

incurs an inconsistency between the content on disk and the content cached in the kernel

Would the data still be inconsistent if we first flush the kernel writeback buffer by invoking sync on the guest/host and then reconfigure the device?
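Concretely, the sequence I have in mind would be something like this (a sketch; the comments mark where each command runs):

# inside the guest:
sync    # flush the guest's dirty pages down to the virtual disk
# on the host:
sync    # flush whatever the host itself still buffers
# ...now reconfigure the overlaybd-tcmu device (swap the writable layer)...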


lihuiba commented Jun 13, 2023

Basically yes, as the kernel will not be aware that you have changed (reconfigured) the disk.


gorbak25 commented Jun 13, 2023

Ok, I agree that inconsistencies will arise. I'm wondering whether sync would act as a barrier. Let's consider this scenario:

  1. Data is constantly being written to the block device without a pause
  2. Issue a sync at time T_s
  3. After the sync finishes, reconfigure the device (swap the writable layer); we now have a sealed layer S and a new writable layer W1

Now my question is:
Can I be sure that data written at T < T_s will end up in S and data written after the sync, at T > T_s, will end up in W1? If I proceed to create a new writable layer W2 on top of S, won't fsck during boot fix any inconsistencies which may arise?

Do I need a stronger primitive to ensure this will work (like pausing the VM for a moment)? In Hocus we don't use containerd or the overlaybd containerd snapshotter; we use overlaybd-tcmu directly, so we are free to make this work for our use cases :)
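One stronger primitive that comes to mind is fsfreeze, which flushes the filesystem and then blocks new writes until it is thawed, giving a clean cut point without pausing the whole VM (a sketch; /mnt is a placeholder for the mount point inside the guest):

fsfreeze --freeze /mnt     # flush everything and block new writes to the filesystem
# ...reconfigure the overlaybd-tcmu device (seal + swap the writable layer)...
fsfreeze --unfreeze /mnt   # thaw the filesystem and resume writes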


lihuiba commented Jun 14, 2023

whether sync would act as a barrier

No, sync is only a memory-to-disk synchronization, i.e. writing dirty cache pages back to disk; it does not take care of the opposite direction by loading new data from disk. (And the disk does not even have a way to tell the kernel what has changed!)

Dropping caches with sysctl -w vm.drop_caches=3 before reconfiguration does not seem to ensure consistency. I suspect that it does not fully drop all kinds of caches, but I didn't carry out further investigation.
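A quick way to observe the asymmetry (a sketch; $DISK is a placeholder for the device node):

dd if=$DISK of=/dev/null bs=4k count=1                # the first read populates the page cache
# ...the data underneath changes out-of-band (e.g. a reconfiguration)...
dd if=$DISK of=/dev/null bs=4k count=1                # may still return the stale cached page
dd if=$DISK of=/dev/null bs=4k count=1 iflag=direct   # O_DIRECT bypasses the page cache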

liulanzheng (Member) commented

Reconfiguration seems easy to implement because libtcmu already supports it, but it must be used with caution due to cache consistency problems.
Perhaps this set of commands can solve the cache consistency issue:

sync                                  # write dirty pages back to disk
echo 3 > /proc/sys/vm/drop_caches     # drop the page cache plus dentries and inodes
blockdev --flushbufs $DISK            # flush the kernel's buffer cache for the device
hdparm -F $DISK                       # flush the drive's on-device write cache

from https://stackoverflow.com/questions/9551838/how-to-purge-disk-i-o-caches-on-linux


lihuiba commented Jun 14, 2023

Anyway, unmounting and re-mounting are necessary.
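In other words, something along these lines (a sketch; $DISK and /mnt are placeholders):

umount /mnt                  # detach the filesystem so the kernel keeps no state for it
blockdev --flushbufs $DISK   # drop whatever the kernel still buffers for the device
# ...reconfigure the overlaybd-tcmu device here...
mount $DISK /mnt             # remount; the kernel re-reads everything from the (new) disk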

gorbak25 (Author) commented

I don't see why purging the read cache is required when the only thing that changes is the upper writable layer, so the data on disk remains the same. I see that problems would arise if one changed the disk image entirely, but if only the upper layer changes then the read caches are still OK.


lihuiba commented Jun 15, 2023

I don't see why

Because you "pull changes from the remote repository". This may introduce inconsistency even if you don't "change the disk image entirely".
